Add API for Request, SyncRequestData, and AsyncRequestData #151
Conversation
@sethmlarson Thank you for this! It's detailed enough for us to think about actual usage, but also small enough for us to give 30% feedback. And the way you prepared the

I initially assumed that this was the API we were going to expose to users, but in fact it's our internal representation of requests after some preprocessing, right? In #124 @njsmith mentions "low level" and "high level" APIs. I think we should explicitly state whether we're going to have those two APIs or not. I think a single high-level API is enough? I'm asking this because low-level urllib3 uses

Regarding request data, we looked in #135 at all the things requests supports. I think the end result is either bytes, a file-like object that can be rewound, or just an iterator, and this is exactly what you're supporting in

For the async API, I don't know what would be best and will let @njsmith comment. Maybe the situation is similar to that of response data that we explored in #123?

Oh, and by the way: since we've removed all the code in src/urllib3, I think we can remove the Travis and AppVeyor configuration files, as they're currently distracting. (But probably not in this PR.)
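The three shapes of request data mentioned above (bytes, a rewindable file-like object, or a plain iterator) could be normalized by a small helper along these lines. This is only an illustrative sketch; `wrap_data` and the category names are hypothetical, not part of the proposed API:

```python
import io
import typing


def wrap_data(
    data: typing.Union[bytes, str, typing.IO[bytes], typing.Iterable[bytes]]
) -> typing.Tuple[str, typing.Iterable[bytes]]:
    """Classify request data as 'bytes', 'file', or 'iterator'.

    Returns the category plus an iterable of bytes chunks.
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    if isinstance(data, bytes):
        return "bytes", (data,)
    if hasattr(data, "read") and hasattr(data, "seek"):
        # Rewindable file-like object: read in fixed-size chunks until EOF.
        return "file", iter(lambda: data.read(8192), b"")
    # Anything else is treated as a one-shot iterator of bytes.
    return "iterator", iter(data)
```

Bytes and rewindable files can be replayed on retries; a bare iterator can only be consumed once, which is why the distinction matters.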
(force-pushed from 265fc3c to df4a019)
There'll probably be quite a few merge conflicts early on if I'm doing these concurrently, so make sure these are squash-merged :)
```python
    Ref: https://github.com/python-trio/urllib3/issues/135
    """

    def read(self, nbytes: int) -> bytes:
```
Regarding there only being one interface, I think we should have something like `SyncRequestData.sync_read()` and `AsyncRequestData.async_read()`, or maybe we switch to using `async for chunk in request.data`, which would automatically get transformed by unasync but would forfeit the ability to grab a certain amount of data. I guess that interface is easier to implement anyway?

The reason I think the split would be good to have is that you'd only have to subclass `AsyncRequestData` and could technically use that class in both async and synchronous requests. You'd just need good error reporting if you're passed input data that is, for example, an async generator and then use it in a sync session. I'm guessing that wouldn't happen often, and an error message would fix any ambiguity.
Here's an example of my plan in action:

```python
import typing

StrOrInt = typing.Union[str, int]
FormType = typing.Union[
    typing.Sequence[typing.Tuple[str, StrOrInt]],
    typing.Mapping[str, typing.Union[StrOrInt, typing.Sequence[StrOrInt]]],
]


def _int_to_urlenc() -> typing.Dict[int, bytes]:
    values = {}
    special = {0x2A, 0x2D, 0x2E, 0x5F}  # '*', '-', '.', '_'
    for byte in range(256):
        if (
            (0x61 <= byte <= 0x7A)  # a-z
            or (0x41 <= byte <= 0x5A)  # A-Z
            or (0x30 <= byte <= 0x39)  # 0-9
            or (byte in special)
        ):  # Keep the ASCII
            values[byte] = bytes((byte,))
        elif byte == 0x20:  # Space -> '+'
            values[byte] = b"+"
        else:  # Percent-encoded, always two hex digits
            values[byte] = b"%" + format(byte, "02X").encode()
    return values


INT_TO_URLENC = _int_to_urlenc()


class URLEncoded(AsyncRequestData):
    """Implements x-www-form-urlencoded as a RequestData object"""

    def __init__(self, form: FormType):
        self._form = form
        self._data: typing.Optional[bytes] = None

    @property
    def content_type(self) -> str:
        return "application/x-www-form-urlencoded"

    @property
    def content_length(self) -> int:
        return len(self._encode_form())

    def is_rewindable(self) -> bool:
        return True

    def __iter__(self) -> typing.Iterator[bytes]:
        yield self._encode_form()

    async def __aiter__(self) -> typing.AsyncIterator[bytes]:
        yield self._encode_form()

    def _encode_form(self) -> bytes:
        if self._data is None:

            def serialize(x: str) -> bytes:
                return b"".join(
                    [INT_TO_URLENC[byte] for byte in x.encode("utf-8")]
                )

            output: typing.List[bytes] = []
            for k, vs in (
                self._form.items() if hasattr(self._form, "items") else self._form
            ):
                if isinstance(vs, str) or not hasattr(vs, "__iter__"):
                    vs = (vs,)
                for v in vs:
                    output.append(serialize(k) + b"=" + serialize(str(v)))
            self._data = b"&".join(output)
        return self._data
```
Suddenly this one class can be used for both synchronous and asynchronous requests. We keep `is_rewindable` (or maybe rename it to `is_replayable`? or something else), and then we'd signal a "rewind" by calling `__aiter__` or `__iter__` again.
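As a usage sketch of that "rewind by iterating again" idea, here is a self-contained stand-in for the class above (`URLEncodedDemo` is hypothetical, and it delegates encoding to `urllib.parse.urlencode` instead of the `INT_TO_URLENC` table, so only the iteration protocol is being demonstrated):

```python
import asyncio
import typing
from urllib.parse import urlencode


class URLEncodedDemo:
    """One encoded payload, iterable from both sync and async code.

    'Rewinding' is simply iterating again: each __iter__/__aiter__
    call produces a fresh generator over the same body.
    """

    def __init__(self, form: typing.Mapping[str, str]):
        self._form = dict(form)

    def _encode_form(self) -> bytes:
        # Stand-in encoder; the real class builds this from INT_TO_URLENC.
        return urlencode(self._form).encode("ascii")

    def __iter__(self) -> typing.Iterator[bytes]:
        yield self._encode_form()

    async def __aiter__(self) -> typing.AsyncIterator[bytes]:
        yield self._encode_form()
```

Both a sync session (`for chunk in body`) and an async session (`async for chunk in body`) can drain the same instance, and a retry just iterates it a second time.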
```python
    in the case of a timeout/socket error on an idempotent request, a redirect,
    etc so that the new request can be sent. This works for file-like objects
    and bytes / strings (where it's a no-op).
    """
```
Should also add a `content_type` property so that it can be passed to the request as a default if none is set.

What do you think about using something like this as our internal representation for request bodies?

```python
class AsyncRequestData:
    content_length: Optional[int]

    async def data_chunks(self) -> AsyncIterable[bytes]:
        ...
```

So instead of having a stateful object that we rewind, it acts as a factory for iterators. (I'm focusing on the async version because I'm assuming the sync version will be identical except for being unasync'ed.) Random thoughts:
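For the simplest case, a bytes body, that factory-style interface could look like the following. This is a sketch: only `content_length` and `data_chunks` come from the comment above, and `BytesRequestData` is a hypothetical name:

```python
import asyncio
import typing


class BytesRequestData:
    """Factory-style body: every data_chunks() call returns a fresh
    async iterator, so 'replaying' the body is just calling it again."""

    def __init__(self, body: bytes):
        self._body = body
        self.content_length: typing.Optional[int] = len(body)

    async def data_chunks(self) -> typing.AsyncIterable[bytes]:
        # A single chunk; a streaming implementation would yield many.
        yield self._body
```

Because there is no cursor to rewind, retries and redirects don't need any special-case logic for this kind of body.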
I'd rather go with this approach; the only issue I see is how an "unsynced" Session could use a single "Async/SyncRequestData"? I was proposing the interface with

You're on the same track here as me.

Definitely true.

That's true, there's not much we can do here if the underlying data isn't rewindable. We can drop that section of the API.
```python
>>> import requests
>>> f = open("setup.py", "rb")
>>> requests.post("https://httpbin.org/anything", data=f)
<Response [200]>
>>> f.closed
False
```

Looks like no, maybe we don't need a close method either!
Hmm, I was imagining we'd have something like:

```python
class RequestData:
    async def data_chunks(self):
        if self._consumed:
            raise ...
        elif self._bytes is not None:
            yield self._bytes
        elif self._sync_iterable is not None:
            self._consumed = True
            for chunk in self._sync_iterable:
                yield chunk
        elif self._async_iterable is not None:
            self._consumed = True
            async for chunk in self._async_iterable:
                yield chunk
        elif self._sync_file is not None:
            # you get the idea
```

This would live at
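Filling in that sketch so it actually runs (a rough, self-contained version covering the bytes and iterable branches; the constructor and the `RuntimeError` are my guesses, not part of the proposal):

```python
import asyncio
import typing


class RequestData:
    def __init__(
        self,
        data: typing.Union[
            bytes, typing.Iterable[bytes], typing.AsyncIterable[bytes]
        ],
    ):
        self._consumed = False
        self._bytes = data if isinstance(data, bytes) else None
        self._sync_iterable = None
        self._async_iterable = None
        if self._bytes is None:
            if hasattr(data, "__aiter__"):
                self._async_iterable = data
            else:
                self._sync_iterable = data

    async def data_chunks(self) -> typing.AsyncIterable[bytes]:
        if self._consumed:
            raise RuntimeError("request body was already consumed")
        if self._bytes is not None:
            # Bytes are trivially replayable, so don't mark consumed.
            yield self._bytes
        elif self._sync_iterable is not None:
            self._consumed = True
            for chunk in self._sync_iterable:
                yield chunk
        elif self._async_iterable is not None:
            self._consumed = True
            async for chunk in self._async_iterable:
                yield chunk
```

Note that the bytes branch never sets `_consumed`, so `data_chunks()` can be called again on a retry, while a one-shot iterable raises on the second call.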
Well, I'm one of those weirdos who refuses to use subclassing unless there's absolutely no other option :-). But assuming you mean that this might be an abstract interface that has multiple implementations, then yeah, unasync should just take care of that, I think?
Closed in favor of #185.
I started off the API design by removing everything under `src/` and replacing it with only `*.pyi` files. We'll see how far I can push the API design while only using .pyi; if I need to switch to real .py files eventually, that'll be fine, I think?

This is the first commit, which adds the `Request` model along with the `SyncRequestData` and `AsyncRequestData` wrappers: an API that will wrap all the types of data we accept for a request (bytes, str, files, file-like objects, iterators) and will be passed down through the HTTP lifecycle and eventually to the low-level backend stream objects.

I'm tagging down my thoughts within the doc-strings. If you have additional comments I can add them inline as well.