Reuse connections between requests with Postgres #3030

Open
kate-goldenring opened this issue Feb 27, 2025 · 9 comments · May be fixed by #3043

Comments

@kate-goldenring
Contributor

The Postgres host component creates a client during calls to open_connection; this client can then be reused across all database interactions within that request.

Ideally, connections should be shared between requests, with clients cached.
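For illustration only, here is a rough sketch of that caching idea, assuming deadpool-postgres and tokio-postgres are used underneath; the PoolCache type and its method are hypothetical, not existing Spin code:

    use std::collections::HashMap;
    use std::sync::Mutex;

    use deadpool_postgres::{Manager, Pool};
    use tokio_postgres::NoTls;

    /// Hypothetical cache: one pool per connection string, shared across requests.
    #[derive(Default)]
    struct PoolCache {
        pools: Mutex<HashMap<String, Pool>>,
    }

    impl PoolCache {
        fn get_or_create(&self, address: &str) -> Result<Pool, Box<dyn std::error::Error>> {
            let mut pools = self.pools.lock().unwrap();
            if let Some(pool) = pools.get(address) {
                return Ok(pool.clone()); // Pool handles are cheap, reference-counted clones
            }
            let pg_config: tokio_postgres::Config = address.parse()?;
            let manager = Manager::new(pg_config, NoTls);
            let pool = Pool::builder(manager).max_size(4).build()?;
            pools.insert(address.to_string(), pool.clone());
            Ok(pool)
        }
    }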

@kate-goldenring kate-goldenring moved this to Triage Needed in Spin Triage Mar 3, 2025
@kate-goldenring kate-goldenring moved this from Triage Needed to Investigating / Open For Comment in Spin Triage Mar 3, 2025
@kate-goldenring
Contributor Author

There may be a concern around trying to use a connection that is in an unknown state. We should investigate whether the Rust PG library manages a connection pool for us.

@lann
Collaborator

lann commented Mar 3, 2025

If we don't find a good postgres pooling library we can probably use a generic pool and just be sure to ROLLBACK; DISCARD ALL before giving a connection back to the pool.
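As a minimal sketch of that reset step against a plain tokio_postgres client (the helper name is illustrative; a pool's recycle hook would be the natural place to run it):

    use tokio_postgres::Client;

    /// Hypothetical helper: put a connection back into a known-clean state
    /// before handing it back to a generic pool.
    async fn reset_for_reuse(client: &Client) -> Result<(), tokio_postgres::Error> {
        // Abort any transaction the previous request may have left open
        // (outside a transaction this only emits a warning, not an error).
        client.batch_execute("ROLLBACK").await?;
        // Reset session state: prepared statements, temp tables, session settings, etc.
        client.batch_execute("DISCARD ALL").await
    }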

@itowlson
Collaborator

Found an Actix sample (https://github.com/actix/examples/blob/master/databases/postgres/src/main.rs) which uses deadpool_postgres (https://docs.rs/deadpool-postgres/latest/deadpool_postgres/). The deadpool project seems active and responsive so I'll have a noodle on that.

@itowlson
Collaborator

@kate-goldenring @lann deadpool offers options for how connections get recycled when they are returned to the pool, from "Fast" (claimed to be a safe choice unless you have special needs) to the slower "Clean" (even safer still). What the different levels mean is described at:

https://docs.rs/deadpool-postgres/latest/deadpool_postgres/enum.RecyclingMethod.html

How comfortable are folks with the different levels? I haven't quantified the performance tradeoffs, so I appreciate that I'm not giving you much to go on as regards the consequences.
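For concreteness, here is roughly what picking a recycling method and pool size looks like with deadpool-postgres. This is a sketch based on its documented config API, not the PR code; the host, credentials, and pool size are placeholders:

    use deadpool_postgres::{Config, ManagerConfig, Pool, PoolConfig, RecyclingMethod, Runtime};
    use tokio_postgres::NoTls;

    fn build_pool() -> Result<Pool, Box<dyn std::error::Error>> {
        let mut cfg = Config::new();
        cfg.host = Some("localhost".to_string());
        cfg.user = Some("postgres".to_string());
        cfg.password = Some("my_password".to_string());
        cfg.dbname = Some("mydb".to_string());
        // How a connection is checked/cleaned when recycled: Fast (only checks the
        // client isn't closed), Verified (runs a test query), or Clean (also resets
        // session state).
        cfg.manager = Some(ManagerConfig { recycling_method: RecyclingMethod::Clean });
        // Maximum number of live connections held by the pool.
        cfg.pool = Some(PoolConfig::new(4));
        Ok(cfg.create_pool(Some(Runtime::Tokio1), NoTls)?)
    }

    // A checked-out object derefs to tokio_postgres::Client, so it is used like a
    // normal client and returned to the pool when dropped.
    async fn ping(pool: &Pool) -> Result<(), Box<dyn std::error::Error>> {
        let client = pool.get().await?;
        client.batch_execute("SELECT 1").await?;
        Ok(())
    }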

@macolso macolso moved this from Investigating / Open For Comment to In Progress in Spin Triage Mar 10, 2025
@macolso macolso moved this from In Progress to Investigating / Open For Comment in Spin Triage Mar 10, 2025
@itowlson
Collaborator

Kicking the tyres with this. Unscientific test with a trivial select from a one-row table on localhost. Pool size 4:

$ ./bombardier-linux-amd64 -c 200 -d 5s localhost:3000

Statistics        Avg      Stdev        Max
  Reqs/sec      3444.76     452.73    4161.56
  Latency       57.76ms    28.87ms   370.36ms
  HTTP codes:
    1xx - 0, 2xx - 17387, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   830.79KB/s

(A pool size of 16 made only a "within one standard deviation" difference in this case. I couldn't increase the number of connections much more because Spin started falling over with component instance limits.)

For comparison, Spin 3.1.2:

$ ./bombardier-linux-amd64 -c 200 -d 5s localhost:3000

Statistics        Avg      Stdev        Max
  Reqs/sec      1493.60     647.61    4660.55
  Latency      132.42ms   114.27ms      0.93s
  HTTP codes:
    1xx - 0, 2xx - 4272, 3xx - 0, 4xx - 0, 5xx - 3318
    others - 0
  Throughput:   402.11KB/s

with the 5xx errors logging Handler returned an error: Error::ConnectionFailed("db error: FATAL: sorry, too many clients already\n\nCaused by:\n FATAL: sorry, too many clients already")

@itowlson itowlson linked a pull request Mar 10, 2025 that will close this issue
@itowlson
Collaborator

The perf figures look dramatically different when I make the database interaction a fraction more complex. Here's one where, with my one-row table, I do an INSERT, a SELECT, and then a DELETE of the inserted row. I did the comparison using 20 bombardier connections (so as not to cause Spin 3.1.2 grief), so these aren't directly comparable to the numbers above.

Spin 3.1.2 (20 conns) (baseline):

Statistics        Avg      Stdev        Max
  Reqs/sec       818.40     313.97    1409.58
  Latency       24.40ms    11.92ms   202.83ms
  HTTP codes:
    1xx - 0, 2xx - 4104, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   749.88KB/s

The PR with a pool size of 4 - much worse than baseline:

Statistics        Avg      Stdev        Max
  Reqs/sec       170.62     134.81     738.38
  Latency      115.65ms    60.46ms   244.73ms
  HTTP codes:
    1xx - 0, 2xx - 872, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    71.23KB/s

The PR with a pool size of 64 - similarish to baseline but not great:

Statistics        Avg      Stdev        Max
  Reqs/sec       606.60     338.90    2525.12
  Latency       32.87ms    13.91ms   128.44ms
  HTTP codes:
    1xx - 0, 2xx - 3052, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   806.49KB/s

(A pool size of 128 was a marginal improvement over 64.)

When I whomped the big pool with 200 bombardier connections I got a lot of "too many clients" errors from the database.

I tried bombarding at 100 connections. Spin 3.1.2 was wobbly and I got the occasional "too many clients".

Spin 3.1.2 (100 conns) (baseline) - this was one of the good runs

Statistics        Avg      Stdev        Max
  Reqs/sec      1391.13     661.84    3767.87
  Latency       71.32ms    15.91ms   157.82ms
  HTTP codes:
    1xx - 0, 2xx - 7023, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     2.59MB/s

PR with pool size 64 - now distinctly better:

Statistics        Avg      Stdev        Max
  Reqs/sec      1706.48     888.22    4314.62
  Latency       58.33ms    34.37ms   377.26ms
  HTTP codes:
    1xx - 0, 2xx - 8598, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     4.91MB/s

A 128 pool was again better but still within a standard deviation. The performance with 64 was pretty variable - that whopping SD is no lie - but the run I showed seems to be around the middle.

I tried using deadpool's Fast recycling method and the throughput and latency were not noticeably better (the runs I tried seemed slightly worse, but the variation is so great that I wouldn't read anything into that).

I don't really know how to interpret any of this in terms of making choices about the pool size, or if it's even meaningful with relatively few tests on a machine with other stuff going on (you know who you are, rust-analyser). Maybe we should punt to the user/admin but that feels a bit lazy, and we still need a decent default. I realise that what data I have is pretty scattershot but I'd welcome guidance from pooling nerds, even if only pointers to better tests to run.

@itowlson
Collaborator

I should probably stress that all these numbers were for a single local database. So there was only one pool in operation. I haven't yet tested with multiple databases, or with off-machine network latency.

@kate-goldenring
Contributor Author

Tying up what I see as the summary from above:

  • Test case A: simple test (one row table, pool size of 4)
    Conclusion: deadpool_postgres (Fast) is 3x faster
  • Test case B: complex operation (3 ops on row, pool size of 4)
    Conclusion: deadpool_postgres (Fast) is 5x slower
  • Test case C: complex operation (3 ops on row, pool size of 64)
    Conclusion: deadpool_postgres (Fast) is 0.5x slower

It seems like with the pool, the same connection may not be being used for the 3 operations. Are they all done in one transaction or in 3 separate ones?

@itowlson
Collaborator

@kate-goldenring I would add that I ran test case C first with 20 connections (deadpool slower) and then with 100 connections (deadpool faster and more stable).

I'm not sure what "Fast" refers to in your comment - if it relates to connection recycling then these were done as Clean (I tried Fast but it wasn't faster).

I didn't use transactions but each operation within a request was done on the same Spin connection. The code I used was:

    use spin_sdk::pg::{Connection, ParameterValue};

    // Open the Spin connection (in the PR this hands back a pooled client).
    let conn = Connection::open("host=localhost user=postgres password=my_password dbname=mydb")?;

    // Three operations on the same connection, no explicit transaction.
    conn.execute("INSERT INTO pets(age, name, is_finicky) VALUES ($1, $2, $3)", &[ParameterValue::Int32(3), ParameterValue::Str("Hobbes".to_owned()), ParameterValue::Boolean(false)])?;
    let sel_res = conn.query("SELECT * FROM pets WHERE age < $1", &[ParameterValue::Int32(20)])?;
    conn.execute("DELETE FROM pets WHERE name = $1", &[ParameterValue::Str("Hobbes".to_owned())])?;

On the Spin side, each Postgres connection string is associated with a pool (so all requests used the same pool). The object created in Connection::open() was reused throughout the life of the Connection resource, but this is a deadpool wrapper object around the tokio_postgres::Client so I'm not sure how it works internally - my impression was that it retained the same Client throughout but I haven't investigated.

I'd be delighted if you have guidance on better ways to evaluate this - I'm flailing a bit I'm afraid.
