Commit f0a0382

Fix up a few words in the post
1 parent 1856908 commit f0a0382

1 file changed: +11 -7 lines changed


content/posts/2025-duckdb-pair-with-postgres.md

```diff
@@ -18,7 +18,7 @@ on top of our regular daily driver of PostgreSQL.
 
 At both EthicalAds and at our parent company Read the Docs, we use PostgreSQL heavily.
 We use Postgres to store basically all our production data and to be our "source of truth"
-for all advertising stats and expenditures.
+for all advertising stats, billing, and payouts to publishers.
 Postgres handles everything and is among the most dependable pieces of our infrastructure.
 Postgres can handle [ML embeddings]({filename}../posts/2024-niche-ad-targeting.md)
 with [pgvector](https://github.com/pgvector/pgvector) for better contextual ad targeting
@@ -50,7 +50,7 @@ Despite how much we love Postgres at EthicalAds, this specifically has felt like
 
 ## Column-wise storage & DuckDB
 
-These kinds of expensive aggregation queries historically are better fits for column databases, data warehouses and [OLAP databases](https://en.wikipedia.org/wiki/Online_analytical_processing) generally.
+Typically, these kinds of expensive aggregation queries are better fits for column databases, data warehouses and [OLAP databases](https://en.wikipedia.org/wiki/Online_analytical_processing) generally.
 We considered building out a data warehouse or other kinds of column oriented databases
 but never found something we really liked and we were always hesitant to add a second production system
 that could get out of sync with Postgres.
@@ -60,11 +60,11 @@ but these solutions all either didn't work for our use case or
 weren't supported on Azure's Managed Postgres, where we are hosted.
 This is where using DuckDB came to our rescue.
 
-[DuckDB](https://duckdb.org/) is an in-process, analytical database.
+[DuckDB](https://duckdb.org/) is an in-process, analytical database and toolkit for analytical workloads.
 It's sort of like SQLite but for analytical workloads and querying data anywhere in a variety of formats.
 Like SQLite, you either run it in your app's process (Python for us)
 or you can run its own standalone CLI.
-It can read from CSV or Parquet files stored in blob storage
+It can read from CSV or Parquet files stored on disk or in blob storage
 or directly from an SQL database like Postgres.
 
 Because most of our aggregations are for hourly or daily data and then data virtually never changes
```
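To make "in-process" concrete, here is a minimal sketch in Python with the `duckdb` package. The Azure container, paths, connection string, and table names below are hypothetical stand-ins, not the post's actual setup:

```python
import duckdb

# In-process: the "database" is a library import, in-memory by default.
con = duckdb.connect()

# Read Parquet straight out of blob storage (Azure, as in the post).
con.execute("INSTALL azure; LOAD azure;")
con.execute("SET azure_storage_connection_string = '<connection-string>';")
daily_views = con.execute("""
    SELECT date_trunc('day', ts) AS day, count(*) AS views
    FROM 'azure://ad-stats/offers/2025-01-*.parquet'  -- hypothetical container/path
    GROUP BY 1
    ORDER BY 1
""").fetchall()

# ...or query a live Postgres database directly, with no export step.
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("ATTACH 'dbname=adserver host=localhost' AS pg (TYPE postgres, READ_ONLY);")
con.execute("SELECT count(*) FROM pg.public.advertisements;")  # hypothetical table
```
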
```diff
@@ -160,9 +160,11 @@ This provides a number of advantages including:
 
 "But David. Won't it be slow to run a SQL query against a remote file?"
 Firstly, these queries are strictly analytical queries, nothing transactional.
-Remember that any of the major clouds these blob storage files are going to be in or near
+Remember that with any of the major clouds these blob storage files are going to be in or near
 the data center where the rest of your servers are running.
-Querying them is a lot faster than I originally expected it to be.
+Querying them is a lot faster than I expected it to be.
+For reports, estimates and other analytical workloads where folks are used to waiting a few seconds,
+it works fairly well.
 
 While DuckDB is pretty smart about [cross database queries](https://duckdb.org/2024/01/26/multi-database-support-in-duckdb.html),
 I put "joins" in scare quotes for a reason.
```
```diff
@@ -175,7 +177,9 @@ Expensive, cross-database queries require a bit of extra testing and scrutiny.
 Lastly, if anybody from the Azure team happens to be reading this,
 we'd love it if you'd add [pg_parquet](https://github.com/CrunchyData/pg_parquet/) to Azure Managed Postgres
 now that it [supports Azure storage](https://www.crunchydata.com/blog/pg_parquet-an-extension-to-connect-postgres-and-parquet).
-Dumping parquets from Postgres directly would be more optimized than what we're currently doing.
+Dumping parquets from Postgres directly would be much more optimized than
+doing that from DuckDB. DuckDB is still amazing for reading these files once they're written,
+but creating them directly with Postgres would be better still.
 
 
 ## Wrapup
```
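For comparison, a rough sketch of both export paths: writing Parquet from Postgres through DuckDB (roughly the shape of what the post describes doing today) versus Postgres writing the file directly via pg_parquet (paths and table names are hypothetical):

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("ATTACH 'dbname=adserver host=localhost' AS pg (TYPE postgres, READ_ONLY);")

# Today: pull rows out of Postgres through DuckDB and write a Parquet file.
con.execute("""
    COPY (SELECT * FROM pg.public.offers WHERE day = DATE '2025-01-01')  -- hypothetical table
    TO 'offers-2025-01-01.parquet' (FORMAT parquet)
""")

# With pg_parquet, Postgres itself could write the file (run in psql, not DuckDB):
#   COPY (SELECT * FROM offers WHERE day = '2025-01-01')
#   TO 'azure://exports/offers-2025-01-01.parquet' WITH (format 'parquet');
```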
