Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 5266: Do not parse the same TLS record fragments twice #240

Open
wants to merge 20 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
350f3a6
Bug 5266: Do not parse the same TLS record fragments twice
rousskov Apr 18, 2023
75f04a1
Bug 5266: Do not parse the same TLS record fragments twice
eduard-bagdasaryan Dec 4, 2023
6ef59b6
Adjusted the solution after testing and removed stale code
eduard-bagdasaryan Dec 6, 2023
f3650cf
Undid an out-of-scope change
eduard-bagdasaryan Dec 6, 2023
095601c
Undone an out-of-scope change
eduard-bagdasaryan Dec 7, 2023
328b704
Simplified, removing try/catch from HandshakeParser::parseMessages()
eduard-bagdasaryan Dec 18, 2023
6932606
Moved tkMessages.commit() into individual parsers
eduard-bagdasaryan Dec 19, 2023
2180c95
Do not skip 'empty' message fragments
eduard-bagdasaryan Dec 21, 2023
32c24e5
Moved tkMessages.commit() back into parseMessages()
eduard-bagdasaryan Dec 21, 2023
c5f70c1
Reworked cycles over tkRecords and tkMessages
eduard-bagdasaryan Dec 26, 2023
9a96bc7
Polished method names and a description.
eduard-bagdasaryan Dec 26, 2023
eba1b8e
Polished with 'auto'.
eduard-bagdasaryan Dec 26, 2023
5d4b9b5
Removed a typo
eduard-bagdasaryan Dec 26, 2023
e304647
A couple of fixes
eduard-bagdasaryan Dec 28, 2023
3a88001
Added a debugs() and simplified parseModernRecord()
eduard-bagdasaryan Dec 29, 2023
71802a6
Call tkRecords.commit() calls from parsers to parseHello()
eduard-bagdasaryan Dec 29, 2023
dbf2481
Refactored HandshakeParser::parseMessages()
eduard-bagdasaryan Jan 4, 2024
6b12750
Simplified, removing ChangeCipherSpec class
eduard-bagdasaryan Jan 5, 2024
47eb627
Do not call tkMessages.atEnd() before rollback()
eduard-bagdasaryan Jan 5, 2024
4b525e4
Do not silently ignore empty messages in parseNonEmptyMessages()
eduard-bagdasaryan Jan 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions src/parser/BinaryTokenizer.h
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,9 @@ class BinaryTokenizer
/// this method avoids append overheads during incremental parsing
void reinput(const SBuf &data, const bool expectMore) { data_ = data; expectMore_ = expectMore; }

/// adds more data bytes to parse
void append(const SBuf &data) { data_.append(data); }

/// make progress: future parsing failures will not rollback beyond this point
void commit();

Expand Down Expand Up @@ -110,6 +113,11 @@ class BinaryTokenizer
/// debugging helper for parsed multi-field structures
void got(uint64_t size, const char *description) const;

/// whether more data bytes may arrive in the future
auto expectingMore() const { return expectMore_; }
/// allow or prohibit arriving more data bytes in the future
void expectingMore(bool val) { expectMore_ = val; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, in my earlier change request, I did not try to imply that this method should be renamed as well: The testing method should be spelled expectingMore() -- we are inspecting what the object is currently doing (the corresponding private data member should have been named the same but we are not going to rename it in this PR). The setting method should be spelled expectMore() -- we are telling the object what to do in the future.

Suggested change
void expectingMore(bool val) { expectMore_ = val; }
void expectMore(const bool em) { expectMore_ = em; }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.


const BinaryTokenizerContext *context; ///< debugging: thing being parsed

protected:
Expand Down
57 changes: 32 additions & 25 deletions src/security/Handshake.cc
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@

#include "squid.h"
#include "base/IoManip.h"
#include "parser/forward.h"
#include "sbuf/Stream.h"
#include "security/Handshake.h"
#if USE_OPENSSL
Expand Down Expand Up @@ -274,24 +275,26 @@ Security::HandshakeParser::parseModernRecord()
Must(record.fragment.length() || record.type == ContentType::ctApplicationData);

if (currentContentType != record.type) {
parseMessages();
Must(tkMessages.atEnd()); // no currentContentType leftovers
fragments = record.fragment;
currentContentType = record.type;
} else {
fragments.append(record.fragment);
}

if (tkRecords.atEnd() && !done)
rousskov marked this conversation as resolved.
Show resolved Hide resolved
parseMessages();
const auto haveUnparsedRecordBytes = !tkRecords.atEnd();
const auto expectMoreRecordLayerBytes = tkRecords.expectingMore();
// TODO: consider adding BinaryTokenizer::exhausted() instead
const auto expectMoreMessageLayerBytes = haveUnparsedRecordBytes || expectMoreRecordLayerBytes;

tkMessages.expectingMore(expectMoreMessageLayerBytes);
tkMessages.append(record.fragment);

parseMessages();
}

/// parses one or more "higher-level protocol" frames of currentContentType
void
Security::HandshakeParser::parseMessages()
{
tkMessages.reset(fragments, false);
for (; !tkMessages.atEnd(); tkMessages.commit()) {
rousskov marked this conversation as resolved.
Show resolved Hide resolved
for (tkMessages.rollback(); !done; tkMessages.commit()) {
switch (currentContentType) {
case ContentType::ctChangeCipherSpec:
parseChangeCipherCpecMessage();
Expand Down Expand Up @@ -631,33 +634,37 @@ Security::HandshakeParser::parseSupportedVersionsExtension(const SBuf &extension
void
Security::HandshakeParser::skipMessage(const char *description)
{
// tkMessages/fragments can only contain messages of the same ContentType.
// skip at least one byte
if (tkMessages.atEnd())
throw Parser::InsufficientInput();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TLS record may have a zero-length fragment (e.g., RFC 8446 says that "Zero-length fragments of Application Data MAY be sent"). Thus, a valid TLS message may have zero length1, and our code must handle zero-length TLS fragments and empty messages (without getting stuck, etc.).

This "at least one byte" check can be reasonably interpreted as effectively using the opposite invariant -- that every message has some bytes. PR's current parseMessages() loop also effectively assumes that successfully parsing any message allows us to make progress which is not technically true for empty messages.

Moreover, if tkMessages is no longer expectingMore() and is atEnd(), then it may be wrong to throw InsufficientInput here because our input was sufficient to parse existing tkMessages -- more tkMessages bytes (with the same record type) are not expected and adding those bytes would lead to a parsing failure). Yet, parseMessages() loop should stop.

To find the right solution, we can try to (re)define what parseMessages() means. Once we have that definition, and assuming we are not going to add more special exceptions, we will through InsufficientInput when tkMessages do not have enough bytes to satisfy the defined parseMessages() goal.

After rejecting many candidates for various reasons, I can suggest only two reasonable parseMessages() definitions:

  • parseMessages() incrementally parses all sequential currentContentType TLS fragments.
  • "Optimization": Same as the above, but the function may stop earlier if done becomes truthy2.

Official code may be seen as trying to implement the first bullet (which ignores done state while we are parsing tkMessages), but official code has several bugs (that this PR is trying to fix). For example, official code does not throw InsufficientInput when there are not enough bytes to parse ChangeCipherSpec. Official code also does not throw InsufficientInput when there are no bytes to parse the second (not yet received) sequential application data fragment after parsing the first such fragment. In summary, official code does not implement the first bullet design.

The current PR parseMessages() quits parsing tkMessages when done becomes truthy. However,
current PR code does not stop parsing after successfully parsing all sequential currentContentType TLS fragments! It keeps going (if done is still nil). This is probably a PR bug -- I failed to find a reasonable definition for parseMessages() that would fit the current PR code.

AFAICT, to implement the above parseMessages() definition(s), we have to do the following:

  1. Remove this check.
  2. Replace skipMessage() call in parseChangeCipherCpecMessage() with proper ChangeCipherSpec parsing code so that we do not skip multiple ChangeCipherSpecs in that method (as if it is one spec) and simplify skipMessage() semantics by focusing skipMessage() on two other use cases/callers.
  3. Refactor parseMessages() to make two different kinds of message parsing explicit:
/// Incrementally parses all sequential currentContentType TLS fragments.
/// Successfully stops parsing earlier if `done` becomes set.
void
Security::HandshakeParser::parseMessages()
{
    switch (currentContentType) {
    case ContentType::ctChangeCipherSpec:
        return parseNonEmptyMessages(parseChangeCipherSpecMessage);
    case ContentType::ctAlert:
        return parseNonEmptyMessages(parseAlertMessage);
    case ContentType::ctHandshake:
        return parseNonEmptyMessages(parseHandshakeMessage);
    case ContentType::ctApplicationData:
        return skipPossiblyEmptyMessages("app data [fragment]");
    }
    return skipPossiblyEmptyMessages("unknown ContentType msg [fragment]");
}

/// Incrementally parses all sequential currentContentType messages using the given TLS message parser.
/// Each message is assumed to be serialized using at least one byte.
/// At least one message is expected per sequence.
/// Successfully stops parsing earlier if `done` becomes set.
void
Security::HandshakeParser::parseNonEmptyMessages(... messageParser)
{
    // XXX: We should not silently ignore empty message sequences
    for (tkMessages.rollback(); !done && !tkMessages.exhausted(); tkMessages.commit())
        messageParser();
}

The existing rollback/commit loop will go inside the new parseNonEmptyMessages() method. AFAICT, to match the new definition, that loop should run until tkMessages are exhausted (i.e. atEnd() and not expectingMore()) unless done becomes truthy earlier (the second bullet "optimization").

In the above sketch, skipPossiblyEmptyMessages() is a skipMessage() replacement that does not need a loop. It should throw InsufficientInput unless tkMessages are exhausted. I am not sure whether it needs rollback() and/or commit().

The above XXX in sketched parseNonEmptyMessages() will need to be addressed as well, which will probably require more refactoring, but it may be easier to do later, as a separate commit/step.

Footnotes

  1. The same conclusion is also true for invalid TLS messages (i.e., messages that violate some TLS MUSTs by being empty) AFAICT. If only invalid messages could have zero length, then we could, perhaps, get away with throwing InsufficientInput excpetions when seeing such messages. Since valid messages may have zero length, we cannot claim insufficient input when parsing them.

  2. This is not just a performance optimization: By stopping earlier, we prevent triggering Must() violations and similar problems in rare cases where TLS input is malformed or unsupported (but sufficient for Squid to see whatever it wanted to see, making done truthy).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement this approach (up to 47eb627).

The above XXX

I added this XXX for now but I am not sure how 'empty message sequences' are possible here.
HandshakeParser::parseModernRecord() guarantees that tkMessages will get at least one byte here (due to the protection 'MUST NOT send zero-length [non-application] fragments') - i.e., exhausted() should be false. Also, 'done' should have been checked earlier in parseHello().

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this XXX for now but I am not sure how 'empty message sequences' are possible here. HandshakeParser::parseModernRecord() guarantees that tkMessages will get at least one byte here (due to the protection 'MUST NOT send zero-length [non-application] fragments')

Do you mean that we are "protected" by a "MUST NOT" text in an RFC? Surely an attacker can violate that MUST -- we cannot rely on that kind of protection or guarantees in a parser... AFAICT, the sketched code will just ignore/skip such invalid messages (i.e. there are no infinite loops or similar dangers here). This is not the end of the world, but to the parseNonEmptyMessages() caller it would look like a non-empty message was successfully parsed and that feels wrong, especially when some future refactoring might unknowingly create an infinite loop when handling such an invalid input (because all the method name suggests that this code should not worry about empty messages).

Also, 'done' should have been checked earlier in parseHello().

Yes, but that messageParser() call inside the loop can make done truthy, so we have to check on every iteration AFAICT.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we are "protected" by a "MUST NOT" text in an RFC?

No, I meant the protection in HandshakeParser::parseModernRecord():

Must(record.fragment.length() || record.type == ContentType::ctApplicationData);

With this protection, only ctApplicationData can be empty, which are handled by skipPossiblyEmptyMessages().

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant [Must(record.fragment.length()...)] in HandshakeParser::parseModernRecord

Ah, I see. I agree that the above Must() protects parseNonEmptyMessages() from empty input.

However, it is difficult to assert that protection using the current code structure. I think we should use that protection to avoid the XXX without introducing assertions. We are expecting at least one byte, so we should parse input before checking whether it is exhausted:

tkMessages.rollback();
do {
    parseMessage();
    tkMessages.commit();
} while (!done && !tkMessages.exhausted());

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, adjusted at 4b525e4.

// tkMessages can only contain messages of the same ContentType.
// To skip a message, we can and should skip everything we have [left]. If
// we have partial messages, debugging will mislead about their boundaries.
// we buffered a partial message, we will need to read/skip multiple times.
tkMessages.skip(tkMessages.leftovers().length(), description);
}

bool
Security::HandshakeParser::parseHello(const SBuf &data)
{
try {
if (!expectingModernRecords.configured())
expectingModernRecords.configure(!isSslv2Record(data));
if (!expectingModernRecords.configured())
expectingModernRecords.configure(!isSslv2Record(data));

// data contains everything read so far, but we may read more later
tkRecords.reinput(data, true);
tkRecords.rollback();
while (!done) {
try {
if (tkRecords.atEnd())
return false;

// data contains everything read so far, but we may read more later
tkRecords.reinput(data, true);
tkRecords.rollback();
while (!done)
parseRecord();
debugs(83, 7, "success; got: " << done);
// we are done; tkRecords may have leftovers we are not interested in
return true;
}
catch (const Parser::BinaryTokenizer::InsufficientInput &) {
debugs(83, 5, "need more data");
return false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "return false" for InsufficientInput case was correct, and the caller relies on it AFAICT: We have to keep it or move InsufficientInput processing to the caller. The added atEnd() check above is another red flag. Overall, I am not sure that this method refactoring improved things, but perhaps I am missing some essential changes in this diff noise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, these changes failed one of my per-byte test - the tkRecords.atEnd() check does not work when tkRecords have some unparsed bytes insufficient for the next record. We need to know somehow who threw InsufficientInput() in the catch() below - and, depending on that, either return false (insufficient tkRecords) or continue to the next record (insufficient tkMessages). I tried to add some synced() method (whether all parsed bytes were comitted) into BinaryTokenizer and calculate this state as (!synced() || atEnd()) for tkMessages. But it looks a bit complicated/unreliable - we should not calculate the state again with the exception at hand. Unfortunately, InsufficientInput() does not have any attribute about its initiator.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either return false (insufficient tkRecords) or continue to the next record (insufficient tkMessages). ... Unfortunately, InsufficientInput() does not have any attribute about its initiator.

To treat InsufficientInput emitted by message layer differently, catch InsufficientInput emitted by the that layer at that layer (i.e. when we parse messages). InsufficientInput caught while making parseRecord() calls will be specific to the records layer. AFAICT, InsufficientInput emitted while parsing messages should be "caught, reported, and ignored" -- the control will then bubble up to the next parseRecord() loop iteration.

Alternatively, we could parse all records we have read before parsing the messages in the accumulated fragments. If message layer always sees all available same-type records, then InsufficientInput thrown by any layer will have the same meaning -- need to read more bytes (record bytes and message bytes). However, it is not clear to me whether such refactoring will make the code better or worse overall. Please try to fix the current (i.e. "parse one record fragment at a time") code before we attempt such a significant refactoring.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InsufficientInput emitted by the that layer at that layer

Ok. I hesitated doing this because recently we rejected tkMessages try/catch for a reason.

}
catch (const Parser::BinaryTokenizer::InsufficientInput &) {
debugs(83, 5, "need more data");
}
}
return false; // unreached
// we are done; tkRecords may have leftovers we are not interested in
return true;
}

/// A helper function to create a set of all supported TLS extensions
Expand Down
3 changes: 0 additions & 3 deletions src/security/Handshake.h
Original file line number Diff line number Diff line change
Expand Up @@ -115,9 +115,6 @@ class HandshakeParser

const char *done; ///< not nil if we got what we were looking for

/// concatenated TLSPlaintext.fragments of TLSPlaintext.type
SBuf fragments;

/// TLS record layer (parsing uninterpreted data)
Parser::BinaryTokenizer tkRecords;

Expand Down
Loading