Skip to content

Existence Checking for Structured Data

esProcSPL edited this page Aug 12, 2024 · 1 revision

When retrieving data from a table, we sometimes need to check whether a member exists according to a specific condition. The condition could come from another data table. For instance, there is a score field in score table and we want to get records from student table where every subject gets a score above 80. So through a series of examples, let’s see how to handle such existence checking tasks conveniently and efficiently. The article provides esProc sample script for each example. Looking ${article} for details.

1. Existence checks of foreign key mapping

Find records according to the existence of foreign key mapping based on two associated tables.

【Example 1】 Based on the following Score table and Student table, calculate the average score of each boy in class one.

【Analysis】

When retrieving data from Score table, we check whether the current record is about a boy in class one. Select it out if it is.

【SPL script】

A B
1 =connect("db") /Connect to database
2 =A1.query("select * from Student") /Query Student table
3 =A1.query("select * from Score") /Query Score table
4 =A3.select(Class=="Class 1" && Gender=="Male") / Select records of boys in class one
5 =A2.join@i(Class:StudentID, A4:Class:ID) / A.join@i() function joins Student table and the selected Score records and performs filtering
6 =A5.groups(StudentID; avg(Score):Score) / Group A5’s records and calculate each boy’s average score

A6’s result:

StudentID Score
1 76
3 74

When the foreign key table is too large to fit into memory, we can use cursor to retrieve data from source tables and then perform order-based merge between cursors.

【Example 2】 Based on the following are Order table and *Detail *table, find the number of orders that don’t use the discount in each month of the year 2014.

【Analysis】

When retrieving data from the Order table, we check whether the discount in the current record is 0 and select it out if it is.

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.cursor("select * from Order where year(Date)=2014 order by ID") / Create cursor of Order table and get records of 2014
3 =A1.cursor("select * from Detail order by ID") / Create cursor of Detail table
4 =A3.select(Discount==0) / Get Detail records that don’t use the discount
5 =joinx(A2:Order,ID;A4:Detail,ID) / joinx function performs order-based merge between A2 and A4’s cursors
6 =A5.groups(month(Order.Date):Month; icount(Order.ID):OrderCount) / Group merged records by month and count orders in each month

A6’s result:

Month OrderCount
1 16
2 25

2. Checks through non-equi join

We search for data in a table according to whether a specific condition is met through the non-equi join.

【Example 3】 Based on the following order data, we want to find the amount of orders that span over a year. Below is part of the source data:

ID NUMBER AMOUNT DELIVERDATE ARRIVALDATE
10814 1 408.0 2014/01/05 2014/04/18
10814 2 204.0 2014/02/21 2014/04/05
10814 3 102.0 2014/03/14 2014/04/06
10814 4 102.0 2014/04/09 2014/04/27
10814 5 102.0 2014/05/04 2014/07/04
10848 1 873.0 2014/01/06 2014/04/21

【Analysis】

When retrieving data from the Order table, we check whether there are any records whose order span over a year and select them out if there is.

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from Detail") / Query Detail table
3 =A2.group(ID) / Group Detail table by ID
4 =A3.select(interval(~.min(DELIVERDATE), ~.max(ARRIVALDATE)) > 365) / Get records of same order ID where the latest arrival date is at least 365 days after the earlier deliver date
5 =A4.new(ID, ~.sum(AMOUNT):Amount) / Create a new data table and sum amounts of each eligible order

A5’s result:

ID Amount
10998 6800.0
11013 4560.0
11032 20615.0

3. Non-existence checks of foreign key mapping

Find records according to the non-existence of foreign key mapping based on two associated tables.

【Example 4】Based on the following Student table and Score table, find students who get scores above 80 for all subjects:

【Analysis】

When retrieving data from the Student table, we check whether there is a score below 80 in the current record and, if it doesn’t have any, select it out.

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from Student") / Query Student table
3 =A1.query("select * from Score") / Query Score table
4 =A3.select(Score<=80) / Get Student records where there is a score of or below 80
5 =A4.id(StudentID) / Perform distinct over the selected records
6 =A2.join@d(ID, A5) / A.join@d() function gets unmatching records from the Student table

A6’s result:

ID Class Name
2 Class 1 Ashley
16 Class 2 Alexis

4. Checks through double negatives

Find matching records through double negaives.

【Example 5】 Based on the following Student table, SelectCourse table and *Course *table, find students who select all courses.

【Analysis】

When retrieving records from the Student table, we check whether there is any course the current student doesn’t select and, if there isn’t one, select it out. We can also think the opposite in the positive way to select the records where the number of selected courses is equal to that of all courses.

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from Student") / Query Student table
3 =A1.query("select * from Course") / Query Course table
4 =A1.query("select * from SelectCourse") / Query SelectCourse table
5 =A4.groups(StudentID; icount(CourseID):CourseCount) / Group the SelectCourse table by student ID and count the courses selected by each student
6 =A5.select(CourseCount==A3.len()) / Get student IDs that select all courses
7 =A2.join@i(ID, A6:StudentID) / A.join@i() function performs filtering join

A7’s result:

ID Name Class
4 Emily Smith Class 1

5. Checks of existence of any values meeting a specific condition

Find records according to the existence of any two values that meet a specific condition based on two associated tables.

【Example 6】 Based on the following Student table and* Score* table, find Student records where there are any two scores whose difference is over 30.

【Analysis】

When retrieving records from the Student table, we check if there are records where scores of any subjects have a 30 difference between them, and select it out if there are.

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from Student") /Query Student table
3 =A1.query("select * from Score") / Query Score table
4 =A3.group(StudentID) / Group Score table by student ID
5 =A4.select(.max(Score)-.min(Score)>30) / Get students whose highest score and lowest score have a 30 difference between them
6 =A5.id(StudentID) / Perform distinct by student ID
7 =A2.join@i(ID,A6) / A.join@i() performs a join filtering

A7’s result:

ID Name Class
4 Emily Smith Class 1
8 Megan Class 1

6. Checks of all eligible members meeting a specific condition

Find all records meeting a specific condition from a data table.

【Example 7】Based on the following Employee table, find the employees whose salaries are higher than those of all salespeople. Below is part of the source data:

ID NAME DEPT SALARY
1 Rebecca R&D 7000
2 Ashley Finance 11000
3 Rachel Sales 9000
4 Emily HR 7000
5 Ashley R&D 16000

【Analysis】

When retrieving data from the Employee table, we check whether the current employee’s salary is higher than those of all salespeople, and select it out if it is.

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from Employee") /Query Employee table
3 =A2.select(DEPT:"Sales").max(SALARY) /Get records of sales department and get the highest salary in the department
4 =A2.select(SALARY>A3) / Find records where salaries are higher than the highest salary in sales department

A4’s result:

ID NAME DEPT SALARY
5 Ashley R&D 16000
20 Alexis Administration 16000
22 Jacob R&D 18000
47 Elizabeth Marketing 17000

Read SPL CookBook to find more related examples.

Clone this wiki locally