Postgres: Add support for partitioned tables #2465
Comments
How large of a lift would it be to support partitioned tables? Is that something I could contribute to? I'm guessing there is some involved logic in the create stage, given that you have to declare the partitions manually in PostgreSQL. I have an incremental_by_time model with 1.7B rows, and maintaining the (necessary) indexes has slowed things down dramatically. Partitioning by month, so that each daily insert only has to touch one month's worth of data, would likely help tremendously.
It would be a new concept for SQLMesh. Right now, its concept of partitioning is "tell the underlying engine what columns to partition on and let it handle it transparently". This would be active, manual partitioning, where SQLMesh itself has to create, drop, and keep track of partitions. On the plus side, adding this for Postgres would also benefit Hive/Athena, which have the same problem as Postgres.
Postgres supports Declarative Partitioning, as described in the linked documentation (section 5.11.2, Declarative Partitioning).
Indeed it does, and even with Declarative Partitioning you still have to manually create the partitions (you just don't have to manually attach and detach them). For example:
Today, SQLMesh assumes that the DB will automatically create partitions when you start inserting data, which isn't the case for Postgres. So we would have to track partitions manually and issue the correct commands to create and drop them so that we can successfully insert data into the table.
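To make the "issue the correct commands" part concrete, here is a minimal sketch of generating the `CREATE TABLE ... PARTITION OF` statements for monthly RANGE partitions. It is not SQLMesh code: the function names, the `parent_YYYY_MM` naming scheme, and the choice of month-sized partitions are all assumptions for illustration, and the parent table is assumed to already exist with `PARTITION BY RANGE` on a date column.

```python
from datetime import date
from typing import List


def next_month(d: date) -> date:
    """First day of the month after the one containing d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partition_ddl(parent: str, start: date, end: date) -> List[str]:
    """DDL for every monthly partition overlapping the half-open range [start, end).

    Hypothetical helper: assumes `parent` was created with
    PARTITION BY RANGE (<date column>) and that partitions are
    named <parent>_YYYY_MM.
    """
    statements = []
    lower = start.replace(day=1)  # snap to the start of the month
    while lower < end:
        upper = next_month(lower)
        # Postgres range bounds are inclusive at FROM and exclusive at TO.
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {parent}_{lower:%Y_%m} "
            f"PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
        lower = upper
    return statements


if __name__ == "__main__":
    for stmt in monthly_partition_ddl("events", date(2023, 12, 15), date(2024, 2, 1)):
        print(stmt)
```

Running these statements before the insert would ensure every row in the range has a partition to land in; dropping old partitions would need a similar helper.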
Would it simplify things to tie this back to snapshot intervals? That is, forcing partitions to align with the interval unit (while possibly containing multiple units)? Given that SQLMesh creates all the physical tables first and then evaluates models, making each one inherit from the parent table with pg_partman would be possible, right? Then, as snapshots are evaluated, it first checks whether the physical table has a partition available for the interval. I'm not sure if incremental by time and range-based partitioning have to go together in SQLMesh, but to me it seems simpler to implement if both are used in conjunction.
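The "check whether a partition is available for the interval" step could look roughly like the sketch below. Everything here is hypothetical: it assumes month-sized partitions identified by their first day, and it assumes the set of existing partitions has already been read from somewhere (for instance the `pg_inherits` catalog, or state that SQLMesh tracks itself).

```python
from datetime import date
from typing import Iterable, List


def month_starts(start: date, end: date) -> List[date]:
    """First day of every month overlapping the half-open interval [start, end)."""
    starts = []
    current = start.replace(day=1)
    while current < end:
        starts.append(current)
        # Advance to the first day of the following month.
        current = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
    return starts


def missing_partitions(existing: Iterable[date], start: date, end: date) -> List[date]:
    """Months required by the interval [start, end) that have no partition yet.

    `existing` holds the lower bound (first day of the month) of each
    partition that is already attached to the parent table.
    """
    have = set(existing)
    return [m for m in month_starts(start, end) if m not in have]


if __name__ == "__main__":
    existing = [date(2024, 1, 1)]
    # A daily interval inside January needs nothing new.
    print(missing_partitions(existing, date(2024, 1, 10), date(2024, 1, 11)))
    # An interval crossing into February needs the February partition.
    print(missing_partitions(existing, date(2024, 1, 31), date(2024, 2, 2)))
```

Because a daily interval always falls inside one monthly partition, aligning partitions to a multiple of the interval unit keeps this check cheap: most evaluations find nothing missing.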
Yes, for incremental models partitioned by time, the plan is to use RANGE-based partitioning. I'm debating whether to make it multiples of the interval unit specifically or copy what Iceberg did which is nice and simple (make I'm also looking at how we can add LIST- and HASH-based partitioning because this lines up with partitioning strategies of other engines ( I'm not planning to add a dependency on
Postgres natively supports partitioned tables
https://www.postgresql.org/docs/current/ddl-partitioning.html