Accessing Members of a Structured Data Set by Sequence Numbers

Sequence numbers (or indexes/subscripts) can be used to access members of an ordered set. There are various requirements in daily analytic scenarios, such as using one sequence number to access one member each time, or using multiple sequence numbers to access a number of members at once, or using one or more sequence numbers to access one or more members from back to front, or even accessing members according to a specified span, etc. This article will illustrate how to implement these requirements efficiently and quickly and offer sample scripts in esProc SPL. Looking ${article} for details.

1. Access one member by one sequence number each time

We can get one record from a data table according to one sequence number.

【Example 1】 Get the information of the first and the last transaction days in Shanghai Stock Exchange in the year of 2019. Below is part of the source data:

Date	Open	Close	Amount
2019/12/31	3036.3858	3050.124	2.27E11
2019/12/30	2998.1689	3040.0239	2.67E11
2019/12/27	3006.8517	3005.0355	2.58E11
2019/12/26	2981.2485	3007.3546	1.96E11
2019/12/25	2980.4276	2981.8805	1.9E11
…	…	…	…

【SPL script】

	A	B
1	=file("000001.csv").import@ct()	/ Import the source data file
2	=A1.select(year(Date)==2019).sort(Date)	/ Select records of 2019 and sort them by date
3	=A2(1)	A2.m(-1)

A3’s result:

Date	Open	Close	Amount
2019/01/02	2497.8805	2465.291	9.76E10
2019/12/31	3036.3858	3050.124	2.27E11

At certain occasions we need to get the sequence number of member in the specified position, from back to front sometimes. To get the sequence number of the second to last transaction date from the above records ordered by date, for instance, we can use the parameter -2 to do this.

【Example 2】 Based on the EMPLOYEE table, calculate the average salary for the states of [California, Texas, New York, Florida] and for other states as a whole, which are classified as “Other”. Below is part of the source data:

ID	NAME	STATE	SALARY
1	Rebecca	California	7000
2	Ashley	New York	11000
3	Rachel	New Mexico	9000
4	Emily	Texas	7000
5	Ashley	Texas	16000
…	…	…	…

【SPL script】

	A	B
1	=connect("db")	/ Connect to database
2	=A1.query("select * from EMPLOYEE")	/ Query EMPLOYEE table
3	[California,Texas,New York,Florida]	/ Define a sequence of states
4	=A2.align@an(A3,STATE)	/ Group records of EMPLOYEE table by the specified states; @a option enables returning all matching records for each group, and @n option creates a new group to hold the unmatching records
5	=A4.new(if (#>A3.p(-1),"Other",STATE):STATE,~.avg(SALARY):AvgSalary)	/ Calculate the average salary in each group and generate a new table sequence; A.p(-1) function gets the sequence number of the last member; change the last group name to “Other”

A5’s result:

STATE	SALARY
California	7700.0
Texas	7592.59
New York	7677.77
Florida	7145.16
Other	7308.1

2. Access members by multiple sequence numbers each time

At times we need to get a number of records according to multiple sequence numbers. For example, we can use a set of sequence numbers [4,5,6] to access records of the second quarter from a certain year’s sales table ordered by months; or we can access the weekend duty records using a set of sequence numbers [1,7] from a weekly on-duty table.

【Example 3】 The following is part of a table that records daily attendant information:

Per_Code	in_out	Date	Time	Type
1110263	1	2013-10-11	09:17:14	In
1110263	6	2013-10-11	11:37:00	Break
1110263	5	2013-10-11	11:38:21	Return
1110263	0	2013-10-11	11:43:21	NULL
1110263	6	2013-10-11	13:21:30	Break
1110263	5	2013-10-11	14:25:58	Return
1110263	2	2013-10-11	18:28:55	Out

We want to group the table every 7 records and then convert it to the following format:

Per_Code	Date	In	Out	Break	Return
1110263	2013-10-11	9:17:14	18:28:55	11:37:00	11:38:21
1110263	2013-10-11	9:17:14	18:28:55	13:21:30	14:25:58

【SPL script】

	A	B
1	=connect("db")	/ Connect to database
2	=A1.query("select * from DailyTime order by Per_Code,Date,Time")	/ Query the source table and sort it by code, date and time
3	=A2.group(Per_Code,Date)	/ Group records by code and date
4	=create(Per_Code,Date,In,Out,Break,Return)	/ Create an empty table that stores the final result
5	=A3.(~([1,7,2,3,1,7,5,6]))	/ A([1,7,2,3,1,7,5,6]) function gets records from each group in order to generate the ordered whole record of a date
6	=A5.conj([~~.Per_Code,~~.Date]	~.(Time).m([1,2,3,4])
7	>A4.record(A6)	/ Populate values to A4’s table

A4’s result:

Per_Code	Date	In	Out	Break	Return
1110263	2013-10-11	9:17:14	18:28:55	11:37:00	11:38:21
1110263	2013-10-11	9:17:14	18:28:55	13:21:30	14:25:58

We can also access members from back to front.

【Example 5】 Calculate the growth rate of each closing price in SSE’s last 10 trading dates of the year 2019 compared with the previous closing price. Below is part of the source data:

Date	Open	Close	Amount
2019/12/31	3036.3858	3050.124	2.27E11
2019/12/30	2998.1689	3040.0239	2.67E11
2019/12/27	3006.8517	3005.0355	2.58E11
2019/12/26	2981.2485	3007.3546	1.96E11
2019/12/25	2980.4276	2981.8805	1.9E11
…	…	…	…

【SPL script】

	A	B
1	=file("000001.csv").import@ct()	/Import the source data file
2	=A1.select(year(Date)==2019).sort(Date)	/ Select records of 2019 and sort them by date
3	=A2.p(to(-10,-1))	/ A.p() function returns the sequence numbers of the last records
4	=A3.new(A2(~~).Date:Date, string(A2(~~).Close/A2(~-1).Close-1, "0.000%" ):Increase)	/ Circularly calculate the growth rate of each closing price compared with the previous one

A4’s result:

Date	Increase
2019/12/18	-0.178%
2019/12/19	0.001%
2019/12/20	-0.402%
2019/12/23	-1.404%
2019/12/24	0.673%
…	…

3. Access members by the specified span

Another scenario is to access a series of members beginning from the specific sequence number according to a specified span. To get one sample from every 10 records in a data table, for instance, we can begin from the first record and get one within every ten. Another instance is to find all multiples of 3 from the natural numbers from 1 to 100. To do that, we can begin from the third number and get one every 3 numbers.

【Example 6】 Find the prime numbers within 100.

【SPL script】

	A	B
---	---	---
1	=to(100)	/ Define a set of numbers from 1 to 100
2	=to(2,10)	/ Define a set of numbers from 2 to 10
3	=A2.(A1.step(,*2))	/ A1.step(,*2) calculates the multiples (n times and n>1) of each member in A2 within 100
4	=A1.to(2,)\A3.conj()	Get all prime numbers within 100 by removing 1 and all composite numbers within the same range; A3.conj() finds all composite numbers within 100

A4’s result:

Member
2
3
5
7
11
13
17
19

Find more examples in SPL CookBook.

SPL Resource： SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accessing Members of a Structured Data Set by Sequence Numbers

1. Access one member by one sequence number each time

2. Access members by multiple sequence numbers each time

3. Access members by the specified span

Clone this wiki locally