Postgres: Add support for partitioned tables #2465

mingfang · 2024-04-17T20:26:59Z

Postgres natively supports partitioned tables

https://www.postgresql.org/docs/current/ddl-partitioning.html

jrhorne · 2024-09-20T14:18:10Z

How large of a lift would it be to support partitioned tables? Is that something I could contribute to? I'm guessing some involved logic in the create stage, given you have to declare the partitions manually in PostgreSQL.

I have an incremental_by_time model with 1.7B rows, and maintaining the (necessary) indexes has caused things to slow dramatically. Partitioning by month, so at least the insert by day only has to reference one month's worth of data, would likely help tremendously.

@izeigerman

erindru · 2024-09-22T20:26:08Z

It would be a new concept for SQLMesh, right now its concept of partitioning is "tell the underlying engine what columns to partition on and let it handle it transparently".

This would be active manual partitioning where SQLMesh itself has to create, drop and keep track of partitions. On the plus side, adding this for Postgres would also benefit Hive/Athena which has the same problem as Postgres

mingfang · 2024-09-23T00:15:51Z

Postgres supports Declarative Partitioning, as described in the link above, 5.11.2. Declarative Partitioning.

erindru · 2024-09-23T01:06:04Z

Indeed it does, and even with Declarative Partitioning you still have to manually create the partitions (you just don't have to manually attach and detach them).

For example:

create table foo (id int, day date) partition by range (day);

insert into foo (id, day) values (1, '2023-01-01'::date);
--FAIL: ERROR: no partition of relation "foo" found for row.  Detail: Partition key of the failing row contains (day) = (2023-01-01).

create table foo_2023 partition of foo for values from ('2023-01-01') to ('2024-01-01');

insert into foo (id, day) values (1, '2023-01-01'::date); --succeeds

insert into foo (id, day) values (1, '2024-01-01'::date);
--FAIL: ERROR: no partition of relation "foo" found for row  Detail: Partition key of the failing row contains (day) = (2024-01-01).

Today, SQLMesh assumes that the DB will automatically create partitions when you start trying to insert data, which isnt the case for Postgres. So we would have to track partitions manually and issue the correct commands to create + drop them so that we can successfully insert data into the table.

jrhorne · 2024-09-25T02:00:18Z

Would it simplify things to tie back to snapshot intervals? So forcing partitions to align (but contain possibly multiple of) with the interval unit?

Given the way SQLMesh also creates all the physical tables, then evaluates models, making it inherit from the parent table with pg_partman would be possible, right?

Then as snapshots are evaluated, it first checks if the physical table has a partition available for the interval.

I’m not sure if incremental by time and range based partitioning have to go together in SQLMesh, but to me, it seems simpler to implement if both are used in conjunction.

erindru · 2024-09-25T04:12:38Z

Yes, for incremental models partitioned by time, the plan is to use RANGE-based partitioning. I'm debating whether to make it multiples of the interval unit specifically or copy what Iceberg did which is nice and simple (make year(time_col), month(time_col), day(time_col) or hour(time_col) the available options).

I'm also looking at how we can add LIST and HASH based partitioning because this lines up with partitioning strategies of other engines (identity and bucket in Iceberg).

I'm not planning to add a dependency on pg_partman at this stage. The SQLMesh Postgres adapter will be responsible for ensuring partitions exist on the target table prior to inserting data and should work out of the box on any Postgres instance without special configuration.

izeigerman added Improvement Improves existing functionality Engine: Postgres labels Apr 23, 2024

erindru self-assigned this Sep 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgres: Add support for partitioned tables #2465

Postgres: Add support for partitioned tables #2465

mingfang commented Apr 17, 2024 •

edited

Loading

jrhorne commented Sep 20, 2024

erindru commented Sep 22, 2024 •

edited

Loading

mingfang commented Sep 23, 2024

erindru commented Sep 23, 2024 •

edited

Loading

jrhorne commented Sep 25, 2024

erindru commented Sep 25, 2024

Postgres: Add support for partitioned tables #2465

Postgres: Add support for partitioned tables #2465

Comments

mingfang commented Apr 17, 2024 • edited Loading

jrhorne commented Sep 20, 2024

erindru commented Sep 22, 2024 • edited Loading

mingfang commented Sep 23, 2024

erindru commented Sep 23, 2024 • edited Loading

jrhorne commented Sep 25, 2024

erindru commented Sep 25, 2024

mingfang commented Apr 17, 2024 •

edited

Loading

erindru commented Sep 22, 2024 •

edited

Loading

erindru commented Sep 23, 2024 •

edited

Loading