Skip to content

Accessing Members of a Structured Data Set by Sequence Numbers

esProcSPL edited this page Aug 12, 2024 · 1 revision

Sequence numbers (or indexes/subscripts) can be used to access members of an ordered set. There are various requirements in daily analytic scenarios, such as using one sequence number to access one member each time, or using multiple sequence numbers to access a number of members at once, or using one or more sequence numbers to access one or more members from back to front, or even accessing members according to a specified span, etc. This article will illustrate how to implement these requirements efficiently and quickly and offer sample scripts in esProc SPL. Looking ${article} for details.

1. Access one member by one sequence number each time

We can get one record from a data table according to one sequence number.

【Example 1】 Get the information of the first and the last transaction days in Shanghai Stock Exchange in the year of 2019. Below is part of the source data:

Date Open Close Amount
2019/12/31 3036.3858 3050.124 2.27E11
2019/12/30 2998.1689 3040.0239 2.67E11
2019/12/27 3006.8517 3005.0355 2.58E11
2019/12/26 2981.2485 3007.3546 1.96E11
2019/12/25 2980.4276 2981.8805 1.9E11

【SPL script】

A B
1 =file("000001.csv").import@ct() / Import the source data file
2 =A1.select(year(Date)==2019).sort(Date) / Select records of 2019 and sort them by date
3 =A2(1) A2.m(-1)

A3’s result:

Date Open Close Amount
2019/01/02 2497.8805 2465.291 9.76E10
2019/12/31 3036.3858 3050.124 2.27E11

At certain occasions we need to get the sequence number of member in the specified position, from back to front sometimes. To get the sequence number of the second to last transaction date from the above records ordered by date, for instance, we can use the parameter -2 to do this.

【Example 2】 Based on the EMPLOYEE table, calculate the average salary for the states of [California, Texas, New York, Florida] and for other states as a whole, which are classified as “Other”. Below is part of the source data:

ID NAME STATE SALARY
1 Rebecca California 7000
2 Ashley New York 11000
3 Rachel New Mexico 9000
4 Emily Texas 7000
5 Ashley Texas 16000

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from EMPLOYEE") / Query EMPLOYEE table
3 [California,Texas,New York,Florida] / Define a sequence of states
4 =A2.align@an(A3,STATE) / Group records of EMPLOYEE table by the specified states; @a option enables returning all matching records for each group, and @n option creates a new group to hold the unmatching records
5 =A4.new(if (#>A3.p(-1),"Other",STATE):STATE,~.avg(SALARY):AvgSalary) / Calculate the average salary in each group and generate a new table sequence; A.p(-1) function gets the sequence number of the last member; change the last group name to “Other”

A5’s result:

STATE SALARY
California 7700.0
Texas 7592.59
New York 7677.77
Florida 7145.16
Other 7308.1

2. Access members by multiple sequence numbers each time

At times we need to get a number of records according to multiple sequence numbers. For example, we can use a set of sequence numbers [4,5,6] to access records of the second quarter from a certain year’s sales table ordered by months; or we can access the weekend duty records using a set of sequence numbers [1,7] from a weekly on-duty table.

【Example 3】 The following is part of a table that records daily attendant information:

Per_Code in_out Date Time Type
1110263 1 2013-10-11 09:17:14 In
1110263 6 2013-10-11 11:37:00 Break
1110263 5 2013-10-11 11:38:21 Return
1110263 0 2013-10-11 11:43:21 NULL
1110263 6 2013-10-11 13:21:30 Break
1110263 5 2013-10-11 14:25:58 Return
1110263 2 2013-10-11 18:28:55 Out

We want to group the table every 7 records and then convert it to the following format:

Per_Code Date In Out Break Return
1110263 2013-10-11 9:17:14 18:28:55 11:37:00 11:38:21
1110263 2013-10-11 9:17:14 18:28:55 13:21:30 14:25:58

【SPL script】

A B
1 =connect("db") / Connect to database
2 =A1.query("select * from DailyTime order by Per_Code,Date,Time") / Query the source table and sort it by code, date and time
3 =A2.group(Per_Code,Date) / Group records by code and date
4 =create(Per_Code,Date,In,Out,Break,Return) / Create an empty table that stores the final result
5 =A3.(~([1,7,2,3,1,7,5,6])) / A([1,7,2,3,1,7,5,6]) function gets records from each group in order to generate the ordered whole record of a date
6 =A5.conj([.Per_Code,.Date] ~.(Time).m([1,2,3,4])
7 >A4.record(A6) / Populate values to A4’s table

A4’s result:

Per_Code Date In Out Break Return
1110263 2013-10-11 9:17:14 18:28:55 11:37:00 11:38:21
1110263 2013-10-11 9:17:14 18:28:55 13:21:30 14:25:58

We can also access members from back to front.

【Example 5】 Calculate the growth rate of each closing price in SSE’s last 10 trading dates of the year 2019 compared with the previous closing price. Below is part of the source data:

Date Open Close Amount
2019/12/31 3036.3858 3050.124 2.27E11
2019/12/30 2998.1689 3040.0239 2.67E11
2019/12/27 3006.8517 3005.0355 2.58E11
2019/12/26 2981.2485 3007.3546 1.96E11
2019/12/25 2980.4276 2981.8805 1.9E11

【SPL script】

A B
1 =file("000001.csv").import@ct() /Import the source data file
2 =A1.select(year(Date)==2019).sort(Date) / Select records of 2019 and sort them by date
3 =A2.p(to(-10,-1)) / A.p() function returns the sequence numbers of the last records
4 =A3.new(A2().Date:Date, string(A2().Close/A2(~-1).Close-1, "0.000%" ):Increase) / Circularly calculate the growth rate of each closing price compared with the previous one

A4’s result:

Date Increase
2019/12/18 -0.178%
2019/12/19 0.001%
2019/12/20 -0.402%
2019/12/23 -1.404%
2019/12/24 0.673%

3. Access members by the specified span

Another scenario is to access a series of members beginning from the specific sequence number according to a specified span. To get one sample from every 10 records in a data table, for instance, we can begin from the first record and get one within every ten. Another instance is to find all multiples of 3 from the natural numbers from 1 to 100. To do that, we can begin from the third number and get one every 3 numbers.

【Example 6】 Find the prime numbers within 100.

【SPL script】

A B
--- --- ---
1 =to(100) / Define a set of numbers from 1 to 100
2 =to(2,10) / Define a set of numbers from 2 to 10
3 =A2.(A1.step(,*2)) / A1.step(,*2) calculates the multiples (n times and n>1) of each member in A2 within 100
4 =A1.to(2,)\A3.conj() Get all prime numbers within 100 by removing 1 and all composite numbers within the same range; A3.conj() finds all composite numbers within 100

A4’s result:

Member
2
3
5
7
11
13
17
19

Find more examples in SPL CookBook.

Clone this wiki locally