Skip to content

esProc SPL:Equivalent to the Python enhanced DuckDB

esProcSPL edited this page Mar 7, 2025 · 1 revision

For desktop data analysis users, if DuckDB is the handy ‘SQL Swiss Army Knife,’ then esProc SPL is an ‘all-in-one toolbox’ with built-in Python capabilities, maintaining SQL’s ease of use while overcoming its inherent limitations.

Like DuckDB, esProc SPL offers excellent support for file handling. Common files like CSV and Excel can be used directly as databases, allowing you to run SQL queries immediately. For example, query sales data:

$SELECT region, SUM(amount) FROM sales.csv GROUP BY region

Such lightweight operations are a breeze for esProc. Similar to DuckDB, esProc also supports data binarization, but it stores data as files with excellent compression ratios. Loading millions of rows takes just seconds, making it at least three times faster than reading CSV directly.

Currently, esProc’s SQL does not support window functions, making it less comprehensive than DuckDB. However, esProc has a trump card: its native language, SPL, which significantly simplifies complex tasks compared to SQL, so you actually won’t need to write cumbersome window functions. For example, calculating bonuses for the top 3 salespeople in each province requires multiple nested queries in SQL:

WITH ranked_sales AS (
  SELECT province, salesman, amount,
    ROW_NUMBER() OVER(PARTITION BY province ORDER BY amount DESC) as rank
  FROM sales
)
SELECT * FROM ranked_sales WHERE rank <=3

With SPL, it’s much more straightforward:

sales.groups(province;top(-3;amount))

Such tasks can still be implemented in SQL. However, for more complex tasks, like calculating ‘automatic doubling of reward points when a customer’s consecutive purchase days exceed 5’, SQL often falls short. When it comes to flow control operations—such as implementing loops in SQL or dynamically adjusting calculation logic based on conditions, SQL’s IF and LOOP statements are too limited to be practical. The convoluted code you manage to write becomes incomprehensible even to yourself after just three days. That’s why DuckDB often relies on Python.

DuckDB’s Python interface is remarkably smooth, but using them in combination still creates a sense of fragmentation. You query data with SQL, load it into a DataFrame, and often end up writing it back to the database. These are two distinct systems with different development and debugging approaches, requiring constant mental context switching that feels jarring. It’s like ordering a steak in a Chinese restaurant—it works, but it just feels awkward.

In contrast, esProc SPL directly integrates the core capabilities of Python. The above calculation for “a customer’s consecutive purchase days exceed 5” is written in SPL:

A
1 $SELECT * FROM orders.csv ODDER BY client,order_date
2 =A1.group(client).conj(~.run(streak = if(order_date[-1] && order_date==order_date[-1]+1, streak+1, 1)).select(streak>=5))
3 =A2.groups(client;"Doubling":bonus)

This code is actually even more concise than Python.

esProc SPL, with its comprehensive computing capabilities, support for procedural computation, and robust flow control mechanisms, outperforms Python-enhanced DuckDB. It combines SQL’s agility with programming language’s flexibility, while eliminating the need to juggle back and forth between multiple tools. For desktop analysts who frequently handle complex calculations, esProc SPL may be a more elegant solution than ‘SQL + Python’. After all, who wouldn’t want to handle everything in one window?

Clone this wiki locally