Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making BHTTPSerializer compliant to known-length messages #26

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ArnaudLcm
Copy link

@ArnaudLcm ArnaudLcm commented Dec 23, 2024

Motivation:

As defined in RFC9292, you can choose to use either the known-length format or an indeterminate-length format. The purpose of this PR is to give the possibility for users to use the known-length format.

Modifications:

  • Finite State Machine (FSM): Added an FSM to the serializer to define allowed state transitions. This prevents invalid sequences, such as having a Body HTTP Part followed by a Header HTTP Part.
  • BHTTPSerializer: Introduced a flag within BHTTPSerializer to determine whether the known-length or indeterminate-length format will be used.
  • Serialization of Known-Length Sections: Added methods to BHTTPSerializer specifically for serializing known-length sections.
  • Unit-tests: I updated unit tests to cover scenarios where we have out-of-order HTTP parts but also to verify the known-length implementation.

Design Choices:

  • Buffer Reference: The output buffer is kept as a reference to optimize performance. Copying the entire buffer at the end of the serialization would be inefficient.
  • Inlining serializeContentChunk: I inlined the serializeContentChunk function to minimize overhead from function prologues and epilogues, as it is expected to be used frequently.
  • Buffers for Known-Length Format: For known-length formats, two buffers were introduced: chunkBuffer and fieldSectionBuffer. These buffers are used when the request/response consists of multiple parts. I did some researchs to find what could be the optimal initial capacities for those 2 buffers. I found out that I could set them to 500 and 700, respectively, based on data from the following sources:
  • FSM Transition Definitions: The state transition definitions in the FSM are declared as static, with the intention that these definitions will be placed in the data section of the compiled binary. This should save memory but as I only used Swift for a couple of days now, I prefer to take distance. I tried to verify this by reviewing the assembly output here: https://godbolt.org/z/s88Kaxqn8 but I wasn't able to confirm with certainty due to the noise in the output.

Arnaud added 2 commits December 23, 2024 18:37
Motivation:

As defined in [RFC9292](https://www.rfc-editor.org/rfc/rfc9292),  you can choose to use either the known-length format or an indeterminate-length format.
The purpose of this PR is to give the possibility for users to use the known-length format.

Modifications:

    - Finite State Machine (FSM): Added an FSM to the serializer to define allowed state transitions. This prevents invalid sequences, such as having a Body HTTP Part followed by a Header HTTP Part.
    - BHTTPSerializer: Introduced a flag within BHTTPSerializer to determine whether the known-length or indeterminate-length format will be used.
    - Serialization of Known-Length Sections: Added methods to BHTTPSerializer specifically for serializing known-length sections.
    - Unit-tests: I updated unit tests to cover scenarios where we have out-of-order HTTP parts but also to verify the known-length implementation.

Design Choices:

    - Buffer Reference: The output buffer is kept as a reference to optimize performance. Copying the entire buffer at the end of the serialization would be inefficient.
    - Inlining serializeContentChunk: I inlined the serializeContentChunk function to minimize overhead from function prologues and epilogues, as it is expected to be used frequently.
    - Buffers for Known-Length Format: For known-length formats, two buffers were introduced: chunkBuffer and fieldSectionBuffer. These buffers are used when the request/response consists of multiple parts.
     I did some researchs to find what could be the optimal initial capacities for those 2 buffers.
     I found out that I could set them to 500 and 700, respectively, based on data from the following sources:
        - SPDY Whitepaper for header field sizes: https://www.chromium.org/spdy/spdy-whitepaper
        - Arxiv paper on HTTP body size for body content size: https://arxiv.org/pdf/1405.2330
     (If needed, I can provide more details on how these values were derived.)
     However, as underlined in the ByteBuffer implementation: https://github.com/apple/swift-nio/blob/main/Sources/NIOCore/ByteBuffer-core.swift, the initial capacity is set to: let newCapacity = capacity == 0 ? 0 : capacity.nextPowerOf2ClampedToMax() . Therefore, I decided to not use those values as it would very likely to be way too much. Given this, I decided not to use the predefined values, as they would likely be too large. Instead, I opted to rely on the initial size encountered when initializing the two buffers.
    - FSM Transition Definitions: The state transition definitions in the FSM are declared as static, with the intention that these definitions will be placed in the data section of the compiled binary. This should save memory but as I only used Swift for a couple of days now, I prefer to take distance. I tried to verify this by reviewing the assembly output here: https://godbolt.org/z/s88Kaxqn8 but I wasn't able to confirm with certainty due to the noise in the output.
@ArnaudLcm ArnaudLcm marked this pull request as ready for review December 24, 2024 18:42
Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, thanks for this! I've left a few notes in the diff.

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I think we're doing well here, I've left a few notes in the diff.

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, so far this is looking really nice. I've left a few stylistic notes in the diff, but it really isn't much.

@Lukasa Lukasa added the 🆕 semver/minor Adds new public API. label Jan 17, 2025
@ArnaudLcm
Copy link
Author

Cool, so far this is looking really nice. I've left a few stylistic notes in the diff, but it really isn't much.

Thank @Lukasa ! I truly appreciate it. If you have any other feedback, feel free to share, I’m really looking for ways to get better !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🆕 semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants