Making BHTTPSerializer compliant to known-length messages #26

ArnaudLcm · 2024-12-23T17:39:27Z

Motivation:

As defined in RFC 9292 (https://www.rfc-editor.org/rfc/rfc9292), a Binary HTTP message can use either the known-length format or the indeterminate-length format. The purpose of this PR is to let users opt into the known-length format.
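The choice between the two formats is visible on the wire in the very first field of a message, the framing indicator. Below is a minimal sketch, assuming hypothetical MessageKind and FramingMode types (they are not the PR's API). It maps the choice onto the four framing indicator values defined in RFC 9292, Section 3.3. The indicator is a variable-length integer, and values 0 to 3 fit in a single byte.

```swift
// Hypothetical types for illustration only; not the PR's actual API.
enum MessageKind { case request, response }
enum FramingMode { case knownLength, indeterminateLength }

/// Framing indicator values from RFC 9292, Section 3.3:
/// 0 = known-length request, 1 = known-length response,
/// 2 = indeterminate-length request, 3 = indeterminate-length response.
/// Values below 64 encode as a single-byte variable-length integer.
func framingIndicator(kind: MessageKind, mode: FramingMode) -> UInt8 {
    let base: UInt8 = (mode == .knownLength) ? 0 : 2
    let offset: UInt8 = (kind == .request) ? 0 : 1
    return base + offset
}

print(framingIndicator(kind: .response, mode: .indeterminateLength)) // prints 3
```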

Modifications:

- Finite State Machine (FSM): Added an FSM to the serializer that defines the allowed state transitions. This rejects invalid sequences, such as a body HTTP part followed by a header HTTP part.
- BHTTPSerializer: Introduced a flag in BHTTPSerializer that determines whether the known-length or the indeterminate-length format is used.
- Serialization of known-length sections: Added methods to BHTTPSerializer specifically for serializing known-length sections.
- Unit tests: Updated the unit tests to cover out-of-order HTTP parts and to verify the known-length implementation.
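The FSM from the first item can be pictured as a transition table that the serializer consults before writing each part. This is a minimal sketch, not the PR's implementation. The head/body/end parts mirror the NIOHTTP1 part model, and every type and case name here is hypothetical. The table is declared static, as in the design choice discussed under Design Choices.

```swift
// Hypothetical parts and states; they do not mirror the PR's actual types.
enum Part: Hashable { case head, body, end }
enum State: Hashable { case idle, sentHead, sentBody, finished }

/// Thrown when a part arrives in a state that does not allow it.
struct InvalidTransition: Error, Equatable {
    let from: State
    let part: Part
}

struct SerializerStateMachine {
    // Allowed transitions, declared once as a static table.
    // Any (state, part) pair missing from this table is invalid.
    static let transitions: [State: [Part: State]] = [
        .idle: [.head: .sentHead],
        .sentHead: [.body: .sentBody, .end: .finished],
        .sentBody: [.body: .sentBody, .end: .finished],
    ]

    private(set) var state: State = .idle

    /// Advances the machine, or throws if `part` is not allowed in the current state
    /// (for example, a head part after a body part).
    mutating func receive(_ part: Part) throws {
        guard let next = Self.transitions[state]?[part] else {
            throw InvalidTransition(from: state, part: part)
        }
        state = next
    }
}
```

One caveat on the "data section" goal: a dictionary-backed table like this one is very likely built at run time. Dictionary storage is heap-allocated and hashed with a per-process random seed, and a static let is initialized lazily on first access. A switch over (state, part) expresses the same rules without any table.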

Design Choices:

- Buffer reference: The output buffer is kept as a reference to optimize performance, since copying the entire buffer at the end of serialization would be inefficient.
- Inlining serializeContentChunk: I inlined the serializeContentChunk function to avoid the overhead of function prologues and epilogues, as it is expected to be called frequently.
- Buffers for the known-length format: For the known-length format, two buffers were introduced: chunkBuffer and fieldSectionBuffer. They are used when the request/response consists of multiple parts. I did some research to find the optimal initial capacities for these two buffers and settled on 500 and 700, respectively, based on data from the following sources:
  - SPDY whitepaper for header field sizes: https://www.chromium.org/spdy/spdy-whitepaper
  - arXiv paper on HTTP body size for body content size: https://arxiv.org/pdf/1405.2330
  (If needed, I can provide more details on how these values were derived.)
  However, as noted in the ByteBuffer implementation (https://github.com/apple/swift-nio/blob/main/Sources/NIOCore/ByteBuffer-core.swift), the initial capacity is computed as: let newCapacity = capacity == 0 ? 0 : capacity.nextPowerOf2ClampedToMax(). Any non-zero request is therefore rounded up to the next power of two, so 500 would become 512 and 700 would become 1024. Given this, I decided not to use the predefined values, as they would likely be too large. Instead, I opted to initialize the two buffers based on the first size encountered.
- FSM transition definitions: The state transition definitions in the FSM are declared as static, with the intention that they be placed in the data section of the compiled binary. This should save memory, but since I have only been using Swift for a couple of days, I can't state this with confidence. I tried to verify it by reviewing the assembly output (https://godbolt.org/z/s88Kaxqn8), but the noise in the output kept me from confirming it with certainty.
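The known-length format explains why intermediate buffers are needed at all. A Known-Length Field Section in RFC 9292 is a Length (i) prefix followed by the field lines, so the total size must be known before the first field line is emitted. The indeterminate-length form just writes the field lines and ends with a zero-valued Content Terminator. Below is a minimal sketch under stated assumptions: [UInt8] stands in for NIO's ByteBuffer, and all function names are hypothetical rather than the PR's API. Lengths use the variable-length integer encoding from RFC 9000, Section 16, which RFC 9292 reuses. As I read the description above, the scratch buffer plays roughly the role of fieldSectionBuffer.

```swift
// Sketch only: [UInt8] replaces ByteBuffer; function names are hypothetical.

/// Appends `value` as a variable-length integer (RFC 9000, Section 16):
/// the two high bits of the first byte select a 1-, 2-, 4- or 8-byte encoding.
func appendVarint(_ value: UInt64, to out: inout [UInt8]) {
    switch value {
    case 0...63:
        out.append(UInt8(value))
    case 64...16_383:
        out.append(UInt8(0x40 | (value >> 8)))
        out.append(UInt8(value & 0xFF))
    case 16_384...1_073_741_823:
        let v = UInt32(value) | 0x8000_0000
        for shift in stride(from: 24, through: 0, by: -8) {
            out.append(UInt8(truncatingIfNeeded: v >> shift))
        }
    case 1_073_741_824...4_611_686_018_427_387_903:
        let v = value | 0xC000_0000_0000_0000
        for shift in stride(from: 56, through: 0, by: -8) {
            out.append(UInt8(truncatingIfNeeded: v >> shift))
        }
    default:
        preconditionFailure("value exceeds 2^62 - 1")
    }
}

/// Field Line: Name Length (i), Field Name, Value Length (i), Field Value.
func appendFieldLine(name: String, value: String, to out: inout [UInt8]) {
    let nameBytes = Array(name.utf8)
    let valueBytes = Array(value.utf8)
    appendVarint(UInt64(nameBytes.count), to: &out)
    out.append(contentsOf: nameBytes)
    appendVarint(UInt64(valueBytes.count), to: &out)
    out.append(contentsOf: valueBytes)
}

/// Known-length: field lines go to a scratch buffer first, because their
/// total length must be written before them.
func appendKnownLengthFieldSection(_ fields: [(String, String)], to out: inout [UInt8]) {
    var scratch: [UInt8] = []
    for (name, value) in fields {
        appendFieldLine(name: name, value: value, to: &scratch)
    }
    appendVarint(UInt64(scratch.count), to: &out)
    out.append(contentsOf: scratch)
}

/// Indeterminate-length: field lines are written directly, then a zero
/// Content Terminator, so no scratch buffer is required.
func appendIndeterminateLengthFieldSection(_ fields: [(String, String)], to out: inout [UInt8]) {
    for (name, value) in fields {
        appendFieldLine(name: name, value: value, to: &out)
    }
    appendVarint(0, to: &out)
}
```

Known-length content follows the same pattern: one Length (i) prefix covers the entire body. When a body arrives as several parts, they have to be accumulated (the job described for chunkBuffer) before the prefix can be written.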


Lukasa

Great start, thanks for this! I've left a few notes in the diff.

Sources/ObliviousHTTP/BHTTPSerializer.swift


Lukasa

Thanks for this! I think we're doing well here, I've left a few notes in the diff.

Sources/ObliviousHTTP/BHTTPSerializer.swift

Lukasa

Cool, so far this is looking really nice. I've left a few stylistic notes in the diff, but it really isn't much.

Sources/ObliviousHTTP/BHTTPSerializer.swift

ArnaudLcm · 2025-01-17T18:51:26Z

> Cool, so far this is looking really nice. I've left a few stylistic notes in the diff, but it really isn't much.

Thanks @Lukasa! I truly appreciate it. If you have any other feedback, feel free to share it; I'm always looking for ways to improve!

Sources/ObliviousHTTP/BHTTPSerializer.swift


Arnaud added 2 commits December 23, 2024 18:37

Adding more documentation for the BHTTPSerializer init

0d67526

ArnaudLcm marked this pull request as ready for review December 24, 2024 18:42

Lukasa reviewed Jan 2, 2025

Arnaud added 3 commits January 2, 2025 16:47

Switch enum BHTTPSerializerType to a struct called SerializerType

8e9351f

Refactor BHTTPSerializer: removing the ensureState function on the FSM and better framing indicator computation

c1524f3

Merge serializations and transitions logic into the serializer FSM.

1fa44b7

Lukasa reviewed Jan 17, 2025

Sources/ObliviousHTTP/BHTTPSerializer.swift (outdated, resolved)

Sources/ObliviousHTTP/BHTTPSerializer.swift (outdated, resolved)

Switch BHTTP Serializer type and state to private

8789022

Lukasa reviewed Jan 17, 2025

Lukasa added the 🆕 semver/minor label (adds new public API) Jan 17, 2025

Add a few stylistic fixes for BHTTPSerializer

fc9daf6

Lukasa reviewed Jan 20, 2025

Sources/ObliviousHTTP/BHTTPSerializer.swift (outdated, resolved)

Sources/ObliviousHTTP/BHTTPSerializer.swift (outdated, resolved)

Sources/ObliviousHTTP/BHTTPSerializer.swift (outdated, resolved)

ArnaudLcm added 2 commits January 21, 2025 17:47

Change SerializerType protocol from Equatable to Hashable, Sendable

8dfb9b8

Move serialization process from BHTTPSerializer to a new struct to ensure exclusive access on buffers

2010ef9
