Reuse connections between requests with Postgres #3030

Open
kate-goldenring opened this issue Feb 27, 2025 · 9 comments · May be fixed by #3043

Comments

@kate-goldenring
Contributor

The Postgres host component creates a client during calls to open_connection; this client can then be reused across all database interactions within that request.

Ideally, connections should be shared between requests, with clients cached.
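For illustration only, here is a rough sketch of that caching idea, assuming deadpool-postgres and tokio-postgres are used underneath; the PoolCache type and its method are hypothetical, not existing Spin code:

    use std::collections::HashMap;
    use std::sync::Mutex;

    use deadpool_postgres::{Manager, Pool};
    use tokio_postgres::NoTls;

    /// Hypothetical cache: one pool per connection string, shared across requests.
    #[derive(Default)]
    struct PoolCache {
        pools: Mutex<HashMap<String, Pool>>,
    }

    impl PoolCache {
        fn get_or_create(&self, address: &str) -> Result<Pool, Box<dyn std::error::Error>> {
            let mut pools = self.pools.lock().unwrap();
            if let Some(pool) = pools.get(address) {
                return Ok(pool.clone()); // Pool handles are cheap, reference-counted clones
            }
            let pg_config: tokio_postgres::Config = address.parse()?;
            let manager = Manager::new(pg_config, NoTls);
            let pool = Pool::builder(manager).max_size(4).build()?;
            pools.insert(address.to_string(), pool.clone());
            Ok(pool)
        }
    }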

@kate-goldenring kate-goldenring moved this to Triage Needed in Spin Triage Mar 3, 2025
@kate-goldenring kate-goldenring moved this from Triage Needed to Investigating / Open For Comment in Spin Triage Mar 3, 2025
@kate-goldenring
Contributor Author

There may be a concern around trying to use a connection that is in an unknown state. We should investigate whether the Rust PG library manages a connection pool for us.

@lann
Collaborator

lann commented Mar 3, 2025

If we don't find a good postgres pooling library we can probably use a generic pool and just be sure to ROLLBACK; DISCARD ALL before giving a connection back to the pool.
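As a minimal sketch of that reset step against a plain tokio_postgres client (the helper name is illustrative; a pool's recycle hook would be the natural place to run it):

    use tokio_postgres::Client;

    /// Hypothetical helper: put a connection back into a known-clean state
    /// before handing it back to a generic pool.
    async fn reset_for_reuse(client: &Client) -> Result<(), tokio_postgres::Error> {
        // Abort any transaction the previous request may have left open
        // (outside a transaction this only emits a warning, not an error).
        client.batch_execute("ROLLBACK").await?;
        // Reset session state: prepared statements, temp tables, session settings, etc.
        client.batch_execute("DISCARD ALL").await
    }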

@itowlson
Collaborator

Found an Actix sample (https://github.com/actix/examples/blob/master/databases/postgres/src/main.rs) which uses deadpool_postgres (https://docs.rs/deadpool-postgres/latest/deadpool_postgres/). The deadpool project seems active and responsive so I'll have a noodle on that.

@itowlson
Collaborator

@kate-goldenring @lann deadpool offers options for how connections get recycled when they are returned to the pool, from "Fast" (claimed to be a safe choice unless you have special needs) to the slower "Clean" (even safer still). What the different levels mean is described at:

https://docs.rs/deadpool-postgres/latest/deadpool_postgres/enum.RecyclingMethod.html

How comfortable are folks with the different levels? I haven't quantified the performance tradeoffs, so I appreciate that I'm not giving you much to go on as regards the consequences.
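For concreteness, here is roughly what picking a recycling method and pool size looks like with deadpool-postgres. This is a sketch based on its documented config API, not the PR code; the host, credentials, and pool size are placeholders:

    use deadpool_postgres::{Config, ManagerConfig, Pool, PoolConfig, RecyclingMethod, Runtime};
    use tokio_postgres::NoTls;

    fn build_pool() -> Result<Pool, Box<dyn std::error::Error>> {
        let mut cfg = Config::new();
        cfg.host = Some("localhost".to_string());
        cfg.user = Some("postgres".to_string());
        cfg.password = Some("my_password".to_string());
        cfg.dbname = Some("mydb".to_string());
        // How a connection is checked/cleaned when recycled: Fast (only checks the
        // client isn't closed), Verified (runs a test query), or Clean (also resets
        // session state).
        cfg.manager = Some(ManagerConfig { recycling_method: RecyclingMethod::Clean });
        // Maximum number of live connections held by the pool.
        cfg.pool = Some(PoolConfig::new(4));
        Ok(cfg.create_pool(Some(Runtime::Tokio1), NoTls)?)
    }

    // A checked-out object derefs to tokio_postgres::Client, so it is used like a
    // normal client and returned to the pool when dropped.
    async fn ping(pool: &Pool) -> Result<(), Box<dyn std::error::Error>> {
        let client = pool.get().await?;
        client.batch_execute("SELECT 1").await?;
        Ok(())
    }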

@macolso macolso moved this from Investigating / Open For Comment to In Progress in Spin Triage Mar 10, 2025
@macolso macolso moved this from In Progress to Investigating / Open For Comment in Spin Triage Mar 10, 2025
@itowlson
Collaborator

Kicking the tyres with this. Unscientific test with a trivial select from a one-row table on localhost. Pool size 4:

$ ./bombardier-linux-amd64 -c 200 -d 5s localhost:3000

Statistics        Avg      Stdev        Max
  Reqs/sec      3444.76     452.73    4161.56
  Latency       57.76ms    28.87ms   370.36ms
  HTTP codes:
    1xx - 0, 2xx - 17387, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   830.79KB/s

(A pool size of 16 made only a "within one standard deviation" difference in this case. I couldn't increase the number of connections much more because Spin started falling over with component instance limits.)

For comparison, Spin 3.1.2:

$ ./bombardier-linux-amd64 -c 200 -d 5s localhost:3000

Statistics        Avg      Stdev        Max
  Reqs/sec      1493.60     647.61    4660.55
  Latency      132.42ms   114.27ms      0.93s
  HTTP codes:
    1xx - 0, 2xx - 4272, 3xx - 0, 4xx - 0, 5xx - 3318
    others - 0
  Throughput:   402.11KB/s

with the 5xx errors logging Handler returned an error: Error::ConnectionFailed("db error: FATAL: sorry, too many clients already\n\nCaused by:\n FATAL: sorry, too many clients already")

@itowlson itowlson linked a pull request Mar 10, 2025 that will close this issue
@itowlson
Collaborator

The perf figures look dramatically different when I make the database interaction a fraction more complex. Here's one where, with my one-row table, I do an INSERT, a SELECT, and then a DELETE of the inserted row. I did the comparison using 20 bombardier connections (so as not to cause Spin 3.1.2 grief), so these aren't directly comparable to the numbers above.

Spin 3.1.2 (20 conns) (baseline):

Statistics        Avg      Stdev        Max
  Reqs/sec       818.40     313.97    1409.58
  Latency       24.40ms    11.92ms   202.83ms
  HTTP codes:
    1xx - 0, 2xx - 4104, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   749.88KB/s

The PR with a pool size of 4 - much worse than baseline:

Statistics        Avg      Stdev        Max
  Reqs/sec       170.62     134.81     738.38
  Latency      115.65ms    60.46ms   244.73ms
  HTTP codes:
    1xx - 0, 2xx - 872, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    71.23KB/s

The PR with a pool size of 64 - similarish to baseline but not great:

Statistics        Avg      Stdev        Max
  Reqs/sec       606.60     338.90    2525.12
  Latency       32.87ms    13.91ms   128.44ms
  HTTP codes:
    1xx - 0, 2xx - 3052, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:   806.49KB/s

(A pool size of 128 was a marginal improvement over 64.)

When I whomped the big pool with 200 bombardier connections I got a lot of "too many clients" errors from the database.

I tried bombarding at 100 connections. Spin 3.1.2 was wobbly and I got the occasional "too many clients".

Spin 3.1.2 (100 conns) (baseline) - this was one of the good runs

Statistics        Avg      Stdev        Max
  Reqs/sec      1391.13     661.84    3767.87
  Latency       71.32ms    15.91ms   157.82ms
  HTTP codes:
    1xx - 0, 2xx - 7023, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     2.59MB/s

PR with pool size 64 - now distinctly better:

Statistics        Avg      Stdev        Max
  Reqs/sec      1706.48     888.22    4314.62
  Latency       58.33ms    34.37ms   377.26ms
  HTTP codes:
    1xx - 0, 2xx - 8598, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     4.91MB/s

A 128 pool was again better but still within a standard deviation. The performance with 64 was pretty variable - that whopping SD is no lie - but the run I showed seems to be around the middle.

I tried using deadpool's Fast recycling method and the throughput and latency were not noticeably better (the runs I tried seemed slightly worse, but the variation is so great that I wouldn't read anything into that).

I don't really know how to interpret any of this in terms of making choices about the pool size, or if it's even meaningful with relatively few tests on a machine with other stuff going on (you know who you are, rust-analyser). Maybe we should punt to the user/admin but that feels a bit lazy, and we still need a decent default. I realise that what data I have is pretty scattershot but I'd welcome guidance from pooling nerds, even if only pointers to better tests to run.

@itowlson
Collaborator

I should probably stress that all these numbers were for a single local database. So there was only one pool in operation. I haven't yet tested with multiple databases, or with off-machine network latency.

@kate-goldenring
Contributor Author

Tying up what I see as the summary from above:

  • Test case A: simple test (one row table, pool size of 4)
    Conclusion: deadpool_postgres (Fast) is 3x faster
  • Test case B: complex operation (3 ops on row, pool size of 4)
    Conclusion: deadpool_postgres (Fast) is 5x slower
  • Test case C: complex operation (3 ops on row, pool size of 64)
    Conclusion: deadpool_postgres (Fast) is 0.5x slower

It seems like with the pool, the same connection may not be being used for the 3 operations. Are they all done in one transaction or in 3 separate ones?

@itowlson
Collaborator

@kate-goldenring I would add that I ran test case C first with 20 connections (deadpool slower) and then with 100 connections (deadpool faster and more stable).

I'm not sure what "Fast" refers to in your comment - if it relates to connection recycling then these were done as Clean (I tried Fast but it wasn't faster).

I didn't use transactions but each operation within a request was done on the same Spin connection. The code I used was:

    use spin_sdk::pg::{Connection, ParameterValue};

    // Open the Spin connection (in the PR this hands back a pooled client).
    let conn = Connection::open("host=localhost user=postgres password=my_password dbname=mydb")?;

    // Three operations on the same connection, no explicit transaction.
    conn.execute("INSERT INTO pets(age, name, is_finicky) VALUES ($1, $2, $3)", &[ParameterValue::Int32(3), ParameterValue::Str("Hobbes".to_owned()), ParameterValue::Boolean(false)])?;
    let sel_res = conn.query("SELECT * FROM pets WHERE age < $1", &[ParameterValue::Int32(20)])?;
    conn.execute("DELETE FROM pets WHERE name = $1", &[ParameterValue::Str("Hobbes".to_owned())])?;

On the Spin side, each Postgres connection string is associated with a pool (so all requests used the same pool). The object created in Connection::open() was reused throughout the life of the Connection resource, but this is a deadpool wrapper object around the tokio_postgres::Client so I'm not sure how it works internally - my impression was that it retained the same Client throughout but I haven't investigated.

I'd be delighted if you have guidance on better ways to evaluate this - I'm flailing a bit I'm afraid.
