Alembic migration causes: ConnectionRefusedError: [Errno 111] Connect call #16959

Open
adam-brusselback opened this issue Feb 4, 2025 · 4 comments
Labels
bug Something isn't working

@adam-brusselback

Bug summary

I have Prefect deployed in a local k8s cluster along with CNPG in a single namespace (test-client).

I have the database connection set up correctly as far as I can tell. The prefect-server pod starts up fine, I can access the UI, and I can even add a variable that I can subsequently query from the DB and see in the variable table.

But the pod then crashes after some short amount of time and throws the following error:
ConnectionRefusedError: [Errno 111] Connect call failed ('10.43.121.165', 5432) which the stack-trace indicates is caused by running the alembic migrations on startup.
The pod will then restart, and attempt to do the migration again (causing another error after some amount of time).

I've attempted to debug the best I can. I can connect to the database just fine from within the prefect-server pod by running:

python3 -c '
import asyncio
from sqlalchemy.ext.asyncio import create_async_engine

async def test_sqlalchemy():
    try:
        engine = create_async_engine("postgresql+asyncpg://app:mypassword@goacquire-cluster-prefect-rw:5432/app")
        async with engine.connect() as conn:
            print("SQLAlchemy connection successful!")
    except Exception as e:
        print(f"SQLAlchemy Error: {e}")

asyncio.run(test_sqlalchemy())
'

Which prints the "SQLAlchemy connection successful!" message.

Any help would be greatly appreciated, as I have exhausted the ways I know to debug this type of problem.

Version info

18:00:05.090 | DEBUG   | prefect.profiles - Using profile 'ephemeral'
Version:             3.1.15
API version:         0.8.4
Python version:      3.11.11
Git commit:          3ac3d548
Built:               Thu, Jan 30, 2025 11:31 AM
OS/Arch:             linux/x86_64
Profile:             ephemeral
Server type:         server
Pydantic version:    2.10.6
Integrations:
  prefect-redis:     0.2.2

Additional context

Here is the stacktrace for the error:


 ___ ___ ___ ___ ___ ___ _____
| _ \ _ \ __| __| __/ __|_   _|
|  _/   / _|| _|| _| (__  | |
|_| |_|_\___|_| |___\___| |_|

Configure Prefect to communicate with the server with:

    prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api

View the API reference documentation at http://0.0.0.0:4200/docs

Check out the dashboard at http://0.0.0.0:4200



ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/api/server.py", line 684, in lifespan
    await run_migrations()
  File "/usr/local/lib/python3.11/site-packages/prefect/server/api/server.py", line 592, in run_migrations
    await db.create_db()
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/interface.py", line 77, in create_db
    await self.run_migrations_upgrade()
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/interface.py", line 85, in run_migrations_upgrade
    await run_sync_in_worker_thread(alembic_upgrade)
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 235, in run_sync_in_worker_thread
    result = await anyio.to_thread.run_sync(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 962, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 245, in call_with_mark
    return call()
           ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/alembic_commands.py", line 36, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/alembic_commands.py", line 72, in alembic_upgrade
    alembic.command.upgrade(alembic_config(), revision, sql=dry_run)
  File "/usr/local/lib/python3.11/site-packages/alembic/command.py", line 406, in upgrade
    script.run_env()
  File "/usr/local/lib/python3.11/site-packages/alembic/script/base.py", line 586, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/usr/local/lib/python3.11/site-packages/alembic/util/pyfiles.py", line 95, in load_python_file
    module = load_module_py(module_id, path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/alembic/util/pyfiles.py", line 113, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/_migrations/env.py", line 201, in <module>
    run_async_from_worker_thread(apply_migrations)
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 256, in run_async_from_worker_thread
    return anyio.from_thread.run(call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/from_thread.py", line 59, in run
    return async_backend.run_async_from_thread(func, args, token=token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2501, in run_async_from_thread
    return f.result()
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2488, in task_wrapper
    return await func(*args)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/server/database/_migrations/env.py", line 189, in apply_migrations
    async with engine.connect() as connection:
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/base.py", line 121, in __aenter__
    return await self.start(is_ctxmanager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/ext/asyncio/engine.py", line 274, in start
    await greenlet_spawn(self.sync_engine.connect)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 201, in greenlet_spawn
    result = context.throw(*sys.exc_info())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3274, in connect
    return self._connection_cls(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 146, in __init__
    self._dbapi_connection = engine.raw_connection()
                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 3298, in raw_connection
    return self.pool.connect()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 449, in connect
    return _ConnectionFairy._checkout(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
    rec = pool._do_get()
          ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 179, in _do_get
    with util.safe_reraise():
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
    return self._create_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 390, in _create_connection
    return _ConnectionRecord(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
    self.__connect()
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
    with util.safe_reraise():
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/create.py", line 646, in connect
    return dialect.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 622, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 961, in connect
    await_only(creator_fn(*arg, **kw)),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 132, in await_only
    return current.parent.switch(awaitable)  # type: ignore[no-any-return,attr-defined] # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
    value = await result
            ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connection.py", line 2421, in connect
    return await connect_utils._connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 1075, in _connect
    raise last_error or exceptions.TargetServerAttributeNotMatched(
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 1049, in _connect
    conn = await _connect_addr(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 886, in _connect_addr
    return await __connect_addr(params, True, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 931, in __connect_addr
    tr, pr = await connector
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asyncpg/connect_utils.py", line 802, in _create_ssl_connection
    tr, pr = await loop.create_connection(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1086, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1070, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 974, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 638, in sock_connect
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 678, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('10.43.121.165', 5432)

ERROR:    Application startup failed. Exiting.
Server stopped!
@adam-brusselback adam-brusselback added the bug Something isn't working label Feb 4, 2025
@cicdw
Member

cicdw commented Feb 5, 2025

@adam-brusselback sorry you're running into this; that is indeed confusing.

Here's something that might help you isolate the issue:

  1. There is a setting that allows you to turn off the auto-migration on application startup: PREFECT_SERVER_DATABASE_MIGRATE_ON_START=false. Doing this will at least prevent alembic from connecting each time you start the server; however, you will need to make sure you run migrations yourself on any upgrade.
  2. To run migrations yourself, run prefect server database upgrade and optionally pass the -y flag to proceed automatically.

This second command may help you determine if something is going on with a bad alembic configuration or something else since you can decouple these two aspects of startup.
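
A minimal sketch of that decoupling, assuming the two steps are driven from a small Python wrapper used as the container entrypoint. The function name `migrate_then_start` is hypothetical; the commands and the environment variable are the ones mentioned in this thread. It runs the migration as its own step so a failure there is reported separately, then starts the server with auto-migration switched off.

```python
import os
import subprocess

# Commands as quoted in this thread; -y skips the confirmation prompt.
MIGRATE_CMD = ["prefect", "server", "database", "upgrade", "-y"]
START_CMD = ["prefect", "server", "start"]


def migrate_then_start(migrate_cmd=MIGRATE_CMD, start_cmd=START_CMD) -> int:
    """Run migrations as a separate step, then start the server.

    Returns the exit code of the first step that fails, or the exit code
    of the server process once it stops.
    """
    migrate = subprocess.run(migrate_cmd)
    if migrate.returncode != 0:
        # A migration failure now surfaces on its own, before server startup.
        return migrate.returncode

    # Start the server with auto-migration disabled so alembic is not
    # invoked a second time during application startup.
    env = dict(os.environ, PREFECT_SERVER_DATABASE_MIGRATE_ON_START="false")
    return subprocess.run(start_cmd, env=env).returncode
```

If the migration step exits non-zero while a manual `prefect server database upgrade` succeeds, that would point at timing or environment differences rather than at the alembic configuration.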

@adam-brusselback
Author

adam-brusselback commented Feb 6, 2025

Sadly, that doesn't work.
If I do PREFECT_SERVER_DATABASE_MIGRATE_ON_START=false, the pod doesn't stay up long enough for me to do anything at all. So I can't run prefect server database upgrade to check what is happening.

EDIT: Okay, figured out a workaround. First I start the container with migrate_on_start=true, then I change the setting and do a helm upgrade, which creates a new pod with migrate_on_start=false. That lets me get into the pod's shell and run prefect server database upgrade...

But when I do that, it works perfectly.

user@pop-os:/media/user/Data/IdeaProjects/monorepo$ kubectl exec -it -n test-client     prefect-server-85fc8979b6-thzb5 -- /bin/bash
I have no name!@prefect-server-85fc8979b6-thzb5:~$ prefect server database upgrade
Are you sure you want to upgrade the Prefect database at postgresql+asyncpg://app:***@goacquire-cluster-prefect-rw:5432/app? [y/N]: y
Running upgrade migrations ...
Migrations succeeded!
Prefect database at postgresql+asyncpg://app:***@goacquire-cluster-prefect-rw:5432/app upgraded!
I have no name!@prefect-server-85fc8979b6-thzb5:~$ prefect server database upgrade
Are you sure you want to upgrade the Prefect database at postgresql+asyncpg://app:***@goacquire-cluster-prefect-rw:5432/app? [y/N]: y
Running upgrade migrations ...
Migrations succeeded!
Prefect database at postgresql+asyncpg://app:***@goacquire-cluster-prefect-rw:5432/app upgraded!
I have no name!@prefect-server-85fc8979b6-thzb5:~$ 

I ran the database upgrade twice to make sure it wasn't something that worked the first time, and failed the second time.

So I'm at even more of a loss now.

@cicdw
Member

cicdw commented Feb 6, 2025

@adam-brusselback is CNPG a part of the full helm setup and being deployed at the same time as the Prefect server? This all sounds like the database isn't actually ready until some non-trivial amount of time after the prefect server start command has already begun attempting to connect to the database. If they are being deployed at the same time you may need to consider adding an initContainer to wait for the database to come online.

Another quick-to-implement option which is ugly but seems like it would work is to change the pod's command from prefect server start to sleep 10 && prefect server start (or however long of a delay is necessary).
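
As a rough illustration of the "wait for the database" idea, here is a sketch of a polling loop that an initContainer (or a wrapper around prefect server start) could run instead of a fixed sleep. The function name `wait_for_port` and its defaults are assumptions made for this example. It only checks that the TCP port accepts connections, which is a weaker condition than Postgres being ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False if the
    overall timeout (in seconds) expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Raises OSError (including ConnectionRefusedError and DNS
            # failures) while the service is not reachable yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the setup in this issue the call would look like `wait_for_port("goacquire-cluster-prefect-rw", 5432)`, exiting non-zero when it returns False so that Kubernetes retries the init step.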

@adam-brusselback
Author

Yes, they are deployed from the same chart at the same time.

Will give that a shot tomorrow.

I wouldn't have thought that would be an issue, considering the whole pod is restarted until the database eventually comes online. Once the DB is online (after 1+ minutes), the prefect pod stops restarting, the migrations run, and I can access the UI, until that original error occurs again after a couple of minutes.
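
Because the one-off connection test earlier in the thread succeeds while the failure shows up minutes later, a repeated probe may be more telling than a single attempt. The following is a sketch only: the function name `probe` and its parameters are made up for illustration. It tries a plain TCP connect at a fixed interval and records when each failure happened, so the timestamps can be compared with pod restarts or database events.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host: str, port: int, attempts: int = 60, interval: float = 5.0, timeout: float = 3.0):
    """Attempt a TCP connect `attempts` times, `interval` seconds apart.

    Returns a list of (UTC timestamp, error description) tuples, one per
    failed attempt. An empty list means every attempt connected.
    """
    failures = []
    for i in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection accepted; nothing else to do
        except OSError as exc:
            failures.append((datetime.now(timezone.utc).isoformat(), repr(exc)))
        if i < attempts - 1:
            time.sleep(interval)
    return failures
```

Running `probe("goacquire-cluster-prefect-rw", 5432)` from inside the prefect-server pod for a few minutes would show whether the service address refuses connections intermittently, independent of Prefect and alembic.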
