Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unasync'ed HTTPX #722

Closed
wants to merge 10 commits into from
Closed

Unasync'ed HTTPX #722

wants to merge 10 commits into from

Conversation

florimondmanca
Copy link
Member

@florimondmanca florimondmanca commented Jan 5, 2020

I wanted to get an actual feeling for what switching to unasync would look like and require, so…

This PR provides a working^1 implementation for an unasync'ed HTTPX, from concurrency backends to SyncClient… 🎉

Uses a sync backend adapted from #525, and a patched version of unasync waiting for python-trio/unasync#53 to be released. Also adds a test_sync_client.py module.

^1: the sync UDS implementation is still TODO, which is why CI fails.

Below some notes about what I learnt. I'll start with…

So what?

There are no major blockers to unasync'ing, however:

  • Unasync'ing does move a bunch of code around, and that's a bit painful in terms of reviewing especially.
  • Unasync'ing will require us to switch most of our generically named classes such as Dispatcher, HTTPConnection, etc. to the Async-prefixed variant. So: AsyncDispatcher, AsyncHTTPConnection, etc. (Or we could make unasync more clever than it needs to be, but we definitely don't want it to be too clever/configurable.)

In terms of doing this incrementally after adding an urllib3-backed

Achievements

  • The public API hasn't changed at all. This PR is 100% backwards-compatible (except maybe some imports in the docs).
  • Typing and editor support is as good as before, and linters are happy.
  • Development process now involves a scripts/compile step whenever we want to use the sync code (e.g. for tests or local experimentation).

Workflow

My main workflow for this PR was to move code to the _async folder, generate the _sync one, and run mypy to see what's wrong at least from a typing perspective. Eg. we don't want SyncConnectionPool to be importing the current HTTPConnection, because that's an async component. (And that hints us at unasync'ing HTTPConnection too. In general unasync'ing component X requires to unasync all its dependencies, which isn't surprising.)

Typing support

I'm very pleased to say that once the _sync version is compiled, mypy is 100% happy with the output. Editor support is great too, e.g. when importing code from ._sync we get autocompletions on modules and then on the imported components.

Top-level API

The top-level API here is switched to sync (in line with #715). (Not using unasync here since we wouldn't be using the async top-level API.)

Renaming stuff

Besides handling conversion for builtins, the only mechanism that unasync provides for renaming stuff is in the form of AsyncXYZ -> SyncXYZ.

(I think this is nice, because it brings a lot of consistency w.r.t. to "how do I unasync this component.")

But it doesn't handle (1) aread() -> read() / aiter_xxx() -> iter_xxx(), (2a) AsyncIteratorStream -> IteratorStream, (2b) func(aiterator=...) -> func(iterator=...), certainly not (2) ASGIDispatch -> WSGIDispatch.

To get things working, I solved (1) by defining the mappings manually via the AsyncFixes and SyncFixes classes, that e.g. have read_response(), which is async on AsyncFixes, but sync on SyncFixes. Then the _async part of the code imports AsyncFixes, which gets unasync'ed to SyncFixes, and everything works great. Another approach would be to unasync the response, i.e. drop our recent approach of "let's put both sync and async methods on Response", and rename .aiter_xxx() to .iter_xxx() so that we expose .iter_xxx() in all cases. (That doesn't make it clear that you'd need to async for the result though, does it?)

For (2a), (Async)IteratorStream doesn't have to be defined in the _(a)sync part of the code (it's in content_streams.py right now), so I left it there but just renamed IteratorStream to the more standard SyncIteratorStream. This allows unasync to seamlessly rename references to AsyncIteratorStream to SyncIteratorStream.

(2b) is similar: internal API, so I went ahead and standardized the aiterator parameter name to iterator.

For (3), I renamed ASGIDispatch to AsyncAppDispatch, so that it gets unasync'ed to SyncAppDispatch. We can then easily re-export things to ASGIDispatch and WSGIDispatch.

Proxy modules

There are several "proxy modules" left (i.e. modules that import stuff from _async and _sync to re-export them), mostly because I didn't want to break the import paths in the tests. We'd probably need to handle that eventually, but at least this is a technique for breaking things up a bit.

To commit or not to commit the sync code?

This PR git-ignores the _sync part of the code, which means it won't show up in the source tree in version control, and scripts/compile is a prerequisite to even running the tests locally. But the developer experience feels cleaner that way. Eg. it makes it clearer that _sync should never be edited manually, and re-compiles don't trigger git highlighting in the explorer.

httpx/backends/sync.py Outdated Show resolved Hide resolved
httpx/models.py Outdated Show resolved Hide resolved
@tomchristie
Copy link
Member

Amazing stuff! Ace bit of work. Crackin stuff on the SyncBackend too!

  • I’ve been meaning to raise an issue about pushing back on UDS support - I think it was a mistake to bring it in early on, since it adds a 1% use-case, at the expense of slowing us down in getting our 99% use cases to a feature complete 1.0 release.
  • I’d still like to start by pulling in a raw SyncClient & AsyncClient based on Reintroduce sync client #716 to get us to API completeness, so long as any PRs that get us there keep the test coverage all the way through. I probably think that’d actually be an okay approach longer term too, with unasync targeting the dispatch only. It’s possible that the dispatch implementation could even get split off into a really tightly scoped httpcore with a minimal no-models dispatch API, and we can just apply unasync to that package only, while httpcore has some small areas of duplication.

@florimondmanca
Copy link
Member Author

Yes, doing things the raw way for anything above dispatchers still sounds pretty valid. I think it's a viable path, though I'm still worried that some of the renaming hoops I had to go through to get this working result in issues down the line. Surely we'll need (and be able to?) to extend unasync to accomodate our needs.

Anyway, I pushed a last commit demonstrating how we could push the madness even further and have unasync generate the tests for the SyncClient. This is based off another little hack project from today, pytest-unasync. There are several edge cases related to handling imports that pytest-unasync really doesn't handle well, and the better approach would probably be to unasync the entire test modules instead. Well anyway, that was fun!

I don't see this going anywhere near master really, so I'm going to close it proactively. Just happy to see that unasync-ing everything is possible, and that we've now got proof of that if we want to refer to it in the future. Same than with SyncBackend and #525. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants