-
Notifications
You must be signed in to change notification settings - Fork 328
Accessing Members of a Structured Data Set by Sequence Numbers
Sequence numbers (or indexes/subscripts) can be used to access members of an ordered set. There are various requirements in daily analytic scenarios, such as using one sequence number to access one member each time, or using multiple sequence numbers to access a number of members at once, or using one or more sequence numbers to access one or more members from back to front, or even accessing members according to a specified span, etc. This article will illustrate how to implement these requirements efficiently and quickly and offer sample scripts in esProc SPL. Looking ${article} for details.
We can get one record from a data table according to one sequence number.
【Example 1】 Get the information of the first and the last transaction days in Shanghai Stock Exchange in the year of 2019. Below is part of the source data:
Date | Open | Close | Amount |
---|---|---|---|
2019/12/31 | 3036.3858 | 3050.124 | 2.27E11 |
2019/12/30 | 2998.1689 | 3040.0239 | 2.67E11 |
2019/12/27 | 3006.8517 | 3005.0355 | 2.58E11 |
2019/12/26 | 2981.2485 | 3007.3546 | 1.96E11 |
2019/12/25 | 2980.4276 | 2981.8805 | 1.9E11 |
… | … | … | … |
【SPL script】
A | B | |
---|---|---|
1 | =file("000001.csv").import@ct() | / Import the source data file |
2 | =A1.select(year(Date)==2019).sort(Date) | / Select records of 2019 and sort them by date |
3 | =A2(1) | A2.m(-1) |
A3’s result:
Date | Open | Close | Amount |
---|---|---|---|
2019/01/02 | 2497.8805 | 2465.291 | 9.76E10 |
2019/12/31 | 3036.3858 | 3050.124 | 2.27E11 |
At certain occasions we need to get the sequence number of member in the specified position, from back to front sometimes. To get the sequence number of the second to last transaction date from the above records ordered by date, for instance, we can use the parameter -2 to do this.
【Example 2】 Based on the EMPLOYEE table, calculate the average salary for the states of [California, Texas, New York, Florida] and for other states as a whole, which are classified as “Other”. Below is part of the source data:
ID | NAME | STATE | SALARY |
---|---|---|---|
1 | Rebecca | California | 7000 |
2 | Ashley | New York | 11000 |
3 | Rachel | New Mexico | 9000 |
4 | Emily | Texas | 7000 |
5 | Ashley | Texas | 16000 |
… | … | … | … |
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from EMPLOYEE") | / Query EMPLOYEE table |
3 | [California,Texas,New York,Florida] | / Define a sequence of states |
4 | =A2.align@an(A3,STATE) | / Group records of EMPLOYEE table by the specified states; @a option enables returning all matching records for each group, and @n option creates a new group to hold the unmatching records |
5 | =A4.new(if (#>A3.p(-1),"Other",STATE):STATE,~.avg(SALARY):AvgSalary) | / Calculate the average salary in each group and generate a new table sequence; A.p(-1) function gets the sequence number of the last member; change the last group name to “Other” |
A5’s result:
STATE | SALARY |
---|---|
California | 7700.0 |
Texas | 7592.59 |
New York | 7677.77 |
Florida | 7145.16 |
Other | 7308.1 |
At times we need to get a number of records according to multiple sequence numbers. For example, we can use a set of sequence numbers [4,5,6] to access records of the second quarter from a certain year’s sales table ordered by months; or we can access the weekend duty records using a set of sequence numbers [1,7] from a weekly on-duty table.
【Example 3】 The following is part of a table that records daily attendant information:
Per_Code | in_out | Date | Time | Type |
---|---|---|---|---|
1110263 | 1 | 2013-10-11 | 09:17:14 | In |
1110263 | 6 | 2013-10-11 | 11:37:00 | Break |
1110263 | 5 | 2013-10-11 | 11:38:21 | Return |
1110263 | 0 | 2013-10-11 | 11:43:21 | NULL |
1110263 | 6 | 2013-10-11 | 13:21:30 | Break |
1110263 | 5 | 2013-10-11 | 14:25:58 | Return |
1110263 | 2 | 2013-10-11 | 18:28:55 | Out |
We want to group the table every 7 records and then convert it to the following format:
Per_Code | Date | In | Out | Break | Return |
---|---|---|---|---|---|
1110263 | 2013-10-11 | 9:17:14 | 18:28:55 | 11:37:00 | 11:38:21 |
1110263 | 2013-10-11 | 9:17:14 | 18:28:55 | 13:21:30 | 14:25:58 |
【SPL script】
A | B | |
---|---|---|
1 | =connect("db") | / Connect to database |
2 | =A1.query("select * from DailyTime order by Per_Code,Date,Time") | / Query the source table and sort it by code, date and time |
3 | =A2.group(Per_Code,Date) | / Group records by code and date |
4 | =create(Per_Code,Date,In,Out,Break,Return) | / Create an empty table that stores the final result |
5 | =A3.(~([1,7,2,3,1,7,5,6])) | / A([1,7,2,3,1,7,5,6]) function gets records from each group in order to generate the ordered whole record of a date |
6 | =A5.conj([ |
~.(Time).m([1,2,3,4]) |
7 | >A4.record(A6) | / Populate values to A4’s table |
A4’s result:
Per_Code | Date | In | Out | Break | Return |
---|---|---|---|---|---|
1110263 | 2013-10-11 | 9:17:14 | 18:28:55 | 11:37:00 | 11:38:21 |
1110263 | 2013-10-11 | 9:17:14 | 18:28:55 | 13:21:30 | 14:25:58 |
We can also access members from back to front.
【Example 5】 Calculate the growth rate of each closing price in SSE’s last 10 trading dates of the year 2019 compared with the previous closing price. Below is part of the source data:
Date | Open | Close | Amount |
---|---|---|---|
2019/12/31 | 3036.3858 | 3050.124 | 2.27E11 |
2019/12/30 | 2998.1689 | 3040.0239 | 2.67E11 |
2019/12/27 | 3006.8517 | 3005.0355 | 2.58E11 |
2019/12/26 | 2981.2485 | 3007.3546 | 1.96E11 |
2019/12/25 | 2980.4276 | 2981.8805 | 1.9E11 |
… | … | … | … |
【SPL script】
A | B | |
---|---|---|
1 | =file("000001.csv").import@ct() | /Import the source data file |
2 | =A1.select(year(Date)==2019).sort(Date) | / Select records of 2019 and sort them by date |
3 | =A2.p(to(-10,-1)) | / A.p() function returns the sequence numbers of the last records |
4 | =A3.new(A2( |
/ Circularly calculate the growth rate of each closing price compared with the previous one |
A4’s result:
Date | Increase |
---|---|
2019/12/18 | -0.178% |
2019/12/19 | 0.001% |
2019/12/20 | -0.402% |
2019/12/23 | -1.404% |
2019/12/24 | 0.673% |
… | … |
Another scenario is to access a series of members beginning from the specific sequence number according to a specified span. To get one sample from every 10 records in a data table, for instance, we can begin from the first record and get one within every ten. Another instance is to find all multiples of 3 from the natural numbers from 1 to 100. To do that, we can begin from the third number and get one every 3 numbers.
【Example 6】 Find the prime numbers within 100.
【SPL script】
A | B | |
---|---|---|
--- | --- | --- |
1 | =to(100) | / Define a set of numbers from 1 to 100 |
2 | =to(2,10) | / Define a set of numbers from 2 to 10 |
3 | =A2.(A1.step( |
/ A1.step( |
4 | =A1.to(2,)\A3.conj() | Get all prime numbers within 100 by removing 1 and all composite numbers within the same range; A3.conj() finds all composite numbers within 100 |
A4’s result:
Member |
---|
2 |
3 |
5 |
7 |
11 |
13 |
17 |
19 |
Find more examples in SPL CookBook.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code