-
Notifications
You must be signed in to change notification settings - Fork 328
Existence Checking for Structured Data
When retrieving data from a table, we sometimes need to check whether a member exists according to a specific condition. The condition could come from another data table. For instance, there is a score field in score table and we want to get records from student table where every subject gets a score above 80. So through a series of examples, let’s see how to handle such existence checking tasks conveniently and efficiently. The article provides esProc sample script for each example. Looking ${article} for details.
Find records according to the existence of foreign key mapping based on two associated tables.
【Example 1】 Based on the following Score table and Student table, calculate the average score of each boy in class one.
【Analysis】
When retrieving data from Score table, we check whether the current record is about a boy in class one. Select it out if it is.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | /Connect to database |
2 | =A1.query("select * from Student") | /Query Student table |
3 | =A1.query("select * from Score") | /Query Score table |
4 | =A3.select(Class=="Class 1" && Gender=="Male") | / Select records of boys in class one |
5 | =A2.join@i(Class:StudentID, A4:Class:ID) | / A.join@i() function joins Student table and the selected Score records and performs filtering |
6 | =A5.groups(StudentID; avg(Score):Score) | / Group A5’s records and calculate each boy’s average score |
A6’s result:
StudentID | Score |
---|---|
1 | 76 |
3 | 74 |
… | … |
When the foreign key table is too large to fit into memory, we can use cursor to retrieve data from source tables and then perform order-based merge between cursors.
【Example 2】 Based on the following are Order table and *Detail *table, find the number of orders that don’t use the discount in each month of the year 2014.
【Analysis】
When retrieving data from the Order table, we check whether the discount in the current record is 0 and select it out if it is.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.cursor("select * from Order where year(Date)=2014 order by ID") | / Create cursor of Order table and get records of 2014 |
3 | =A1.cursor("select * from Detail order by ID") | / Create cursor of Detail table |
4 | =A3.select(Discount==0) | / Get Detail records that don’t use the discount |
5 | =joinx(A2:Order,ID;A4:Detail,ID) | / joinx function performs order-based merge between A2 and A4’s cursors |
6 | =A5.groups(month(Order.Date):Month; icount(Order.ID):OrderCount) | / Group merged records by month and count orders in each month |
A6’s result:
Month | OrderCount |
---|---|
1 | 16 |
2 | 25 |
… | … |
We search for data in a table according to whether a specific condition is met through the non-equi join.
【Example 3】 Based on the following order data, we want to find the amount of orders that span over a year. Below is part of the source data:
ID | NUMBER | AMOUNT | DELIVERDATE | ARRIVALDATE |
---|---|---|---|---|
10814 | 1 | 408.0 | 2014/01/05 | 2014/04/18 |
10814 | 2 | 204.0 | 2014/02/21 | 2014/04/05 |
10814 | 3 | 102.0 | 2014/03/14 | 2014/04/06 |
10814 | 4 | 102.0 | 2014/04/09 | 2014/04/27 |
10814 | 5 | 102.0 | 2014/05/04 | 2014/07/04 |
10848 | 1 | 873.0 | 2014/01/06 | 2014/04/21 |
… | … | … | … | … |
【Analysis】
When retrieving data from the Order table, we check whether there are any records whose order span over a year and select them out if there is.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from Detail") | / Query Detail table |
3 | =A2.group(ID) | / Group Detail table by ID |
4 | =A3.select(interval(~.min(DELIVERDATE), ~.max(ARRIVALDATE)) > 365) | / Get records of same order ID where the latest arrival date is at least 365 days after the earlier deliver date |
5 | =A4.new(ID, ~.sum(AMOUNT):Amount) | / Create a new data table and sum amounts of each eligible order |
A5’s result:
ID | Amount |
---|---|
10998 | 6800.0 |
11013 | 4560.0 |
11032 | 20615.0 |
… | … |
Find records according to the non-existence of foreign key mapping based on two associated tables.
【Example 4】Based on the following Student table and Score table, find students who get scores above 80 for all subjects:
【Analysis】
When retrieving data from the Student table, we check whether there is a score below 80 in the current record and, if it doesn’t have any, select it out.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from Student") | / Query Student table |
3 | =A1.query("select * from Score") | / Query Score table |
4 | =A3.select(Score<=80) | / Get Student records where there is a score of or below 80 |
5 | =A4.id(StudentID) | / Perform distinct over the selected records |
6 | =A2.join@d(ID, A5) | / A.join@d() function gets unmatching records from the Student table |
A6’s result:
ID | Class | Name |
---|---|---|
2 | Class 1 | Ashley |
16 | Class 2 | Alexis |
Find matching records through double negaives.
【Example 5】 Based on the following Student table, SelectCourse table and *Course *table, find students who select all courses.
【Analysis】
When retrieving records from the Student table, we check whether there is any course the current student doesn’t select and, if there isn’t one, select it out. We can also think the opposite in the positive way to select the records where the number of selected courses is equal to that of all courses.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from Student") | / Query Student table |
3 | =A1.query("select * from Course") | / Query Course table |
4 | =A1.query("select * from SelectCourse") | / Query SelectCourse table |
5 | =A4.groups(StudentID; icount(CourseID):CourseCount) | / Group the SelectCourse table by student ID and count the courses selected by each student |
6 | =A5.select(CourseCount==A3.len()) | / Get student IDs that select all courses |
7 | =A2.join@i(ID, A6:StudentID) | / A.join@i() function performs filtering join |
A7’s result:
ID | Name | Class |
---|---|---|
4 | Emily Smith | Class 1 |
Find records according to the existence of any two values that meet a specific condition based on two associated tables.
【Example 6】 Based on the following Student table and* Score* table, find Student records where there are any two scores whose difference is over 30.
【Analysis】
When retrieving records from the Student table, we check if there are records where scores of any subjects have a 30 difference between them, and select it out if there are.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from Student") | /Query Student table |
3 | =A1.query("select * from Score") | / Query Score table |
4 | =A3.group(StudentID) | / Group Score table by student ID |
5 | =A4.select( |
/ Get students whose highest score and lowest score have a 30 difference between them |
6 | =A5.id(StudentID) | / Perform distinct by student ID |
7 | =A2.join@i(ID,A6) | / A.join@i() performs a join filtering |
A7’s result:
ID | Name | Class |
---|---|---|
4 | Emily Smith | Class 1 |
8 | Megan | Class 1 |
… | … | … |
Find all records meeting a specific condition from a data table.
【Example 7】Based on the following Employee table, find the employees whose salaries are higher than those of all salespeople. Below is part of the source data:
ID | NAME | DEPT | SALARY |
---|---|---|---|
1 | Rebecca | R&D | 7000 |
2 | Ashley | Finance | 11000 |
3 | Rachel | Sales | 9000 |
4 | Emily | HR | 7000 |
5 | Ashley | R&D | 16000 |
… | … | … | … |
【Analysis】
When retrieving data from the Employee table, we check whether the current employee’s salary is higher than those of all salespeople, and select it out if it is.
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from Employee") | /Query Employee table |
3 | =A2.select(DEPT:"Sales").max(SALARY) | /Get records of sales department and get the highest salary in the department |
4 | =A2.select(SALARY>A3) | / Find records where salaries are higher than the highest salary in sales department |
A4’s result:
ID | NAME | DEPT | SALARY |
---|---|---|---|
5 | Ashley | R&D | 16000 |
20 | Alexis | Administration | 16000 |
22 | Jacob | R&D | 18000 |
47 | Elizabeth | Marketing | 17000 |
Read SPL CookBook to find more related examples.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code