content/posts/2025-duckdb-pair-with-postgres.md (+11 −7)

…on top of our regular daily driver of PostgreSQL.

At both EthicalAds and at our parent company Read the Docs, we use PostgreSQL heavily.
We use Postgres to store basically all our production data and to be our "source of truth"
for all advertising stats, billing, and payouts to publishers.
Postgres handles everything and is among the most dependable pieces of our infrastructure.
Postgres can handle [ML embeddings]({filename}../posts/2024-niche-ad-targeting.md)
with [pgvector](https://github.com/pgvector/pgvector) for better contextual ad targeting…

…Despite how much we love Postgres at EthicalAds, this specifically has felt like…

## Column-wise storage & DuckDB

Typically, these kinds of expensive aggregation queries are better fits for column databases,
data warehouses, and [OLAP databases](https://en.wikipedia.org/wiki/Online_analytical_processing) generally.
We considered building out a data warehouse or other kinds of column-oriented databases,
but we never found something we really liked and were always hesitant to add a second production system
that could get out of sync with Postgres.

…but these solutions all either didn't work for our use case or
weren't supported on Azure's Managed Postgres, where we are hosted.
This is where using DuckDB came to our rescue.

[DuckDB](https://duckdb.org/) is an in-process, analytical database and toolkit for analytical workloads.
It's sort of like SQLite, but for analytics and for querying data anywhere, in a variety of formats.
Like SQLite, you either run it in your app's process (Python for us)
or you can run its own standalone CLI.
It can read from CSV or Parquet files stored on disk or in blob storage,
or directly from an SQL database like Postgres.
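
As a minimal sketch of what that looks like from DuckDB's Python API — assuming the `azure` and `postgres` extensions, and with a hypothetical container path, table name, and connection strings:

```python
import duckdb

con = duckdb.connect()  # in-memory DuckDB, running inside the app's Python process

# Read Parquet files from Azure blob storage (requires the `azure` extension;
# the container path and connection string here are made up).
con.sql("INSTALL azure")
con.sql("LOAD azure")
con.sql("CREATE SECRET (TYPE azure, CONNECTION_STRING 'redacted')")
daily_views = con.sql("""
    SELECT date_trunc('day', date) AS day, sum(views) AS views
    FROM 'az://analytics/impressions/*.parquet'
    GROUP BY 1
    ORDER BY 1
""").df()

# Or query Postgres directly via the `postgres` extension
# (hypothetical connection string and table).
con.sql("INSTALL postgres")
con.sql("LOAD postgres")
con.sql("ATTACH 'dbname=ethicalads' AS pg (TYPE postgres, READ_ONLY)")
recent = con.sql("SELECT * FROM pg.public.impressions LIMIT 10").df()
```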

Because most of our aggregations are for hourly or daily data and the data virtually never changes…

…This provides a number of advantages including: […]

"But David. Won't it be slow to run a SQL query against a remote file?"
Firstly, these queries are strictly analytical queries, nothing transactional.
Remember that with any of the major clouds, these blob storage files are going to be in or near
the data center where the rest of your servers are running.
Querying them is a lot faster than I expected it to be.
For reports, estimates, and other analytical workloads where folks are used to waiting a few seconds,
it works fairly well.

While DuckDB is pretty smart about [cross database queries](https://duckdb.org/2024/01/26/multi-database-support-in-duckdb.html),
I put "joins" in scare quotes for a reason.

…Expensive, cross-database queries require a bit of extra testing and scrutiny.

Lastly, if anybody from the Azure team happens to be reading this,
we'd love it if you'd add [pg_parquet](https://github.com/CrunchyData/pg_parquet/) to Azure Managed Postgres
now that it [supports Azure storage](https://www.crunchydata.com/blog/pg_parquet-an-extension-to-connect-postgres-and-parquet).
Dumping parquets from Postgres directly would be much more optimized than
doing that from DuckDB. DuckDB is still amazing for reading these files once they're written,
but creating them directly with Postgres would be better still.
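
For contrast, here's a sketch of the two approaches: the first reflects the kind of DuckDB-side dump described in this post, and the second is a hypothetical pg_parquet equivalent based on its documented Parquet `COPY` support (table and paths are made up):

```python
# Today: pull rows out of Postgres through DuckDB, then write Parquet locally
# before uploading it to blob storage (hypothetical table and date range).
con.sql("""
    COPY (
        SELECT * FROM pg.public.impressions
        WHERE date >= '2025-01-01' AND date < '2025-02-01'
    ) TO 'impressions-2025-01.parquet' (FORMAT parquet)
""")

# With pg_parquet on Azure Managed Postgres, the server could write the file
# to blob storage itself, roughly (run inside Postgres, not DuckDB):
#   COPY (SELECT * FROM impressions WHERE ...)
#   TO 'az://analytics/impressions-2025-01.parquet' (FORMAT parquet);
```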