
Commit 274b988

duckdb + polars performance improvements (ploomber#725)
1 parent fcf1b33 commit 274b988

27 files changed (+1726 -738 lines)

.github/workflows/scheduled.yaml (+2)

```diff
@@ -9,6 +9,8 @@ on:
 jobs:
   broken-links:
     runs-on: ubuntu-latest
+    if: ${{ !contains(github.event.pull_request.labels.*.name, 'allow-broken-links') }}
+
     steps:
       - uses: actions/checkout@v2
       - name: Set up Python ${{ matrix.python-version }}
```
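The new `if:` guard skips the `broken-links` job when the triggering pull request carries the `allow-broken-links` label. As a rough illustration only, the Python function below models that expression against an event payload represented as a dict. `should_run_broken_links` is a hypothetical name, and this is a model of the expression's intent, not how GitHub Actions evaluates it.

```python
def should_run_broken_links(event: dict) -> bool:
    """Model of `!contains(github.event.pull_request.labels.*.name, 'allow-broken-links')`.

    Hypothetical helper for illustration; GitHub Actions evaluates the real expression.
    """
    # In this model, events without a pull request contribute no label names.
    pull_request = event.get("pull_request") or {}
    # `labels.*.name` collects the `name` field of every label on the pull request.
    names = [label.get("name") for label in pull_request.get("labels", [])]
    # `contains(array, item)` is a membership test; `!` negates it.
    return "allow-broken-links" not in names


print(should_run_broken_links({}))  # True
print(should_run_broken_links({"pull_request": {"labels": [{"name": "allow-broken-links"}]}}))  # False
```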

CHANGELOG.md (+3 -2)

```diff
@@ -8,14 +8,15 @@
 * [Feature] Moved `%sqlrender` feature to `%sqlcmd snippets` (#647)
 * [Feature] Added tables listing stored snippets when `%sqlcmd snippets` is called (#648)
 * [Doc] Modified integrations content to ensure they're all consistent (#523)
-* [Doc] Document --persist-replace in API section (#539)
+* [Doc] Document `--persist-replace` in API section (#539)
 * [Fix] Fixed CI issue by updating `invalid_connection_string_duckdb` in `test_magic.py` (#631)
 * [Fix] Refactored `ResultSet` to lazy loading (#470)
 * [Fix] Removed `WITH` when a snippet does not have a dependency (#657)
 * [Fix] Used display module when generating CTE (#649)
 * [Doc] Re-organized sections. Adds section showing how to share notebooks via Ploomber Cloud
 * [Fix] Adding `--with` back because of issues with sqlglot query parser (#684)
-* [Fix] Improving << parsing logic (#610)
+* [Feature] Better performance when using DuckDB native connection and converting to `pandas.DataFrame` or `polars.DataFrame`
+* [Fix] Improving `<<` parsing logic (#610)
 * [Fix] Migrate user feedback to use display module (#548)
 
 ## 0.7.9 (2023-06-19)
```

benchmarks/duckdb.ipynb (-248)

This file was deleted.

benchmarks/profiling.py (+31)

```diff
@@ -0,0 +1,31 @@
+"""
+Sample script to profile the sql magic.
+"""
+from sql.magic import SqlMagic
+from IPython import InteractiveShell
+import duckdb
+from pandas import DataFrame
+import numpy as np
+
+num_rows = 1000_000
+
+df = DataFrame(np.random.randn(num_rows, 20))
+
+magic = SqlMagic(InteractiveShell())
+
+conn = duckdb.connect()
+magic.execute(line="conn --alias duckdb", local_ns={"conn": conn})
+magic.autopandas = True
+magic.displaycon = False
+
+
+# NOTE: you can put the @profile decorator on any internal function to profile it
+# the @profile decorator is injected by the line_profiler package at runtime, to learn
+# more, see: https://github.com/pyutils/line_profiler
+@profile # noqa
+def run_magic():
+    magic.execute("SELECT * FROM df")
+
+
+if __name__ == "__main__":
+    run_magic()
```
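As the script's comment notes, `@profile` is injected by `line_profiler` at runtime. It is meant to be launched with `kernprof -l -v benchmarks/profiling.py`, and running it with plain `python` raises a `NameError`. A common workaround, sketched below, falls back to a no-op decorator when `profile` is absent. It assumes that `kernprof` exposes the decorator through the `builtins` module, which is how it injects it. `run_query` is a hypothetical stand-in for the profiled function.

```python
import builtins

try:
    # Present only when the script is launched through kernprof.
    profile = builtins.profile
except AttributeError:
    def profile(func):
        """No-op stand-in for line_profiler's @profile decorator."""
        return func


@profile
def run_query():
    # Hypothetical placeholder for the profiled call,
    # e.g. magic.execute("SELECT * FROM df") in the script above.
    return sum(range(10))


print(run_query())  # 45
```

With this fallback the same file works for quick runs and for line-by-line profiling, without editing the decorator in and out.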

doc/_toc.yml (+2)

```diff
@@ -39,6 +39,7 @@ parts:
       - file: integrations/questdb
       - file: integrations/oracle
       - file: integrations/trinodb
+      - file: integrations/duckdb-sqlalchemy
 
   - caption: API Reference
     chapters:
@@ -69,6 +70,7 @@ parts:
       - file: tutorials/etl
       - file: tutorials/excel
       - file: tutorials/product-analytics
+      - file: tutorials/duckdb-native-sqlalchemy
 
   - caption: Community
     chapters:
```

doc/community/developer-guide.md (+4 -4)

````diff
@@ -149,22 +149,22 @@ conn.execute("CREATE TABLE some_table (name, age)")
 ### Non SQLAlchemy supported engines
 
 When working with engines that are not supported by SQLAlchemy, e.g. `QuestDB`, we won't be able to use `sqlalchemy.create_engine`.
-Instead, we should initiate an engine using the native method and use the `CustomConnection` object.
+Instead, we should initiate an engine using the native method and use the `DBAPIConnection` object.
 
 ```python
 import psycopg as pg
-from sql.connection import CustomConnection
+from sql.connection import DBAPIConnection
 
 engine = pg.connect("dbname='qdb' user='admin' host='127.0.0.1' port='8812' password='quest'")
-conn = CustomConnection(engine)
+conn = DBAPIConnection(engine)
 
 plot.histogram("my_table", "column_name", bins=50, conn=conn)
 ```
 
 For a full example on how to use JupySQL with a non SQLAlchemy supported engine please see [QuestDB](./../integrations/questdb).
 
 ```{note}
-Please be advised that there may be some features/functionalities that won't be fully compatible with JupySQL when using `CustomConnection`.
+Please be advised that there may be some features/functionalities that won't be fully compatible with JupySQL when using `DBAPIConnection`.
 ```
 
 ## Unit testing
````
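The guide now points readers to `DBAPIConnection` for engines that SQLAlchemy cannot drive. Its internals are not part of this diff, but the name suggests it wraps a PEP 249 (DB-API 2.0) connection such as the `psycopg` one in the example. The sketch below is a much smaller, hypothetical wrapper. `MinimalDBAPIConnection` is an invented name, not JupySQL's class. It shows the PEP 249 surface such a wrapper can rely on: `cursor()`, `execute()`, `description` and `fetchall()`. It uses the standard-library `sqlite3` driver so it runs without a database server.

```python
import sqlite3


class MinimalDBAPIConnection:
    """Hypothetical wrapper around a PEP 249 (DB-API 2.0) connection.

    Illustration only; this is not JupySQL's DBAPIConnection.
    """

    def __init__(self, connection):
        self._connection = connection

    def execute(self, query, parameters=()):
        """Run a query and return (column_names, rows)."""
        cursor = self._connection.cursor()
        try:
            cursor.execute(query, parameters)
            # `description` is None for statements that return no rows.
            if cursor.description is None:
                return [], []
            columns = [col[0] for col in cursor.description]
            return columns, cursor.fetchall()
        finally:
            cursor.close()


native = sqlite3.connect(":memory:")  # any DB-API driver exposes the same methods
conn = MinimalDBAPIConnection(native)
conn.execute("CREATE TABLE some_table (name TEXT, age INTEGER)")
conn.execute("INSERT INTO some_table VALUES (?, ?)", ("Ada", 36))
columns, rows = conn.execute("SELECT name, age FROM some_table")
print(columns, rows)  # ['name', 'age'] [('Ada', 36)]
```

Because the wrapper only touches PEP 249 methods, swapping `sqlite3.connect` for another compliant driver should not require changes to the class. Placeholder style (`?` here) does vary by driver, though.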
