Preventing (and fixing) parser mismatch vulnerabilities

As I discussed previously, parser mismatches are an underappreciated class of vulnerabilities. In that post I described what they are and showcased a few examples, but today I'd like to talk about what to do about them.

I'll present a few options, with advice on when to use each technique.

Today's example

First, we'll need an example to use for all of the techniques. None of the real examples I had was useful for illustrating all of the techniques, so here's an imaginary (but still realistic) one.

Let's imagine a microservice architecture where an authorization server sits in front of a backend server. The auth server accepts signed requests in the form of a JSON document with an embedded signature and checks whether the signature matches the user and request. If the signature is valid for that request, the auth server sends the JSON on to the backend server so it can act on the request. For instance, if Bob's client needed to load his account data, it would send {"user": "bob", "do": "get-account-data", "sig": "Tm90...IHJlYWw="} where the sig is a signature over the rest of the object, using Bob's key.
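
To make the setup concrete, here's a minimal sketch of what check_sig might look like, assuming (purely for illustration) an HMAC over a canonical serialization of the non-sig fields and a hypothetical per-user key store; the real scheme doesn't matter for the rest of the post. The important detail is that the check only ever sees the auth server's parsed view of the request.

import base64
import hashlib
import hmac
import json

# Hypothetical per-user key store, purely for illustration.
USER_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}

def check_sig(request_data: dict) -> bool:
    """Check that 'sig' matches the remaining fields, using the named user's key."""
    sig = request_data.get("sig", "")
    fields = {k: v for k, v in request_data.items() if k != "sig"}
    key = USER_KEYS.get(fields.get("user", ""))
    if key is None:
        return False
    # Sign a canonical serialization of the fields *as this server parsed them*.
    message = json.dumps(fields, sort_keys=True).encode()
    expected = base64.b64encode(hmac.new(key, message, hashlib.sha256).digest()).decode()
    return hmac.compare_digest(expected, sig)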

And here's what an exploit might look like, with a repeated user key:

{
  "user": "alice",
  "user": "bob",
  "do": "get-account-data",
  "sig": "Tm90...IHJlYWw="
}

The JSON standard explicitly does not indicate how repeated keys should be handled; an implementation may take the first value, take the last value, throw an error, anything! So perhaps the authorization server takes the last value, "bob", and checks the signature using Bob's key. Some internal call like check_sig(user="bob", do="get-account-data", sig="Tm90...IHJlYWw=") returns true, so the JSON is passed along to the backend server. If the backend server has a different JSON parser, it might take the first (or lexicographically first!) value, "alice", and interpret the request as "get Alice's account data". In this way, Bob can illegitimately gain access to Alice's account data.
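
You don't even need two languages to see this disagreement. Here's a quick Python sketch: the standard library's json module keeps the last occurrence of a repeated key, and its object_pairs_hook parameter can be used to simulate a "first one wins" parser like the hypothetical backend's.

import json

payload = '{"user": "alice", "user": "bob", "do": "get-account-data"}'

# Python's json module keeps the *last* value for a repeated key...
print(json.loads(payload)["user"])  # -> bob

# ...but another parser might keep the *first*. We can simulate one with
# object_pairs_hook, which receives every key/value pair in order.
def first_wins(pairs):
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)  # ignore later duplicates
    return result

print(json.loads(payload, object_pairs_hook=first_wins)["user"])  # -> alice

Two parsers, one input, two different users: exactly the disagreement the exploit relies on.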

(The rabbit hole goes much deeper than just repeated keys; you may not even want a signature, and definitely not an embedded signature. See Latacora's How (not) to sign a JSON object for a much more in-depth treatment. This toy example is also missing some other essential security features. Basically, don't do anything like this.)

So, how will we fix this?

Technique 0: "Don't do that, then"

We have to start here: Since all parser mismatch vulnerabilities involve the use of multiple parsers (that disagree about certain inputs)... why not just use the same parser in both places? The authorization and backend servers could use the same JSON parser.

And yes, if you can do this, it should work. But if you're already in the situation of asking "how do I prevent parser mismatch", there's probably some good reason this isn't an option. Maybe the two code locations are in different programming languages, in different applications that are not deployed in lockstep, or simply owned by different people. For instance, one of the parsers might be in your web server and the other in the user's browser. Or multiple peer clients implement a protocol (e.g. BitTorrent), and you have only authored one of them.

Also consider change over time. Even if the two locations use the same parser now, one might be changed to use a different one later, and parser mismatch could be reintroduced.

Don't get me wrong, using the same parser everywhere would be great! And I think it's very much worth using parser-generators like ANTLR to turn formal grammars into executable code, since the same grammar can be reused across multiple languages. (I've used this approach in my URL parsing library to good effect: In goes a grammar lifted straight from the RFCs, and out comes parser code that perfectly implements the spec.) Avoiding handwritten parsers, striving to make unambiguous specs, providing formal grammars for data formats—all of these will reduce parser mismatch. But none of these is a guarantee, and often these things aren't under your control anyhow.

Technique 1: Pass the parse

On to the first technique. The essence of parser mismatch is that the parsers disagree on the parse for certain inputs. So one option is to have the first code location do its parsing, then pass the parsed elements to the second code location.

Here's a strawman solution to the JSON example that nevertheless clearly illustrates the principle: The authorization server parses the input, verifies the signature, and then places the parsed pieces into an HTTP request to the backend server, like so:

POST /api/get-account-data
As-User: bob
Well, barring URL and HTTP header injections, there's certainly no opportunity for disagreement now; parsing of the JSON happens just once, and the extraneous alice field never reaches the backend. But we'd have to dramatically change the format the authorization server uses to talk to the backend! Why can't we just send the parsed pieces as JSON, like we were doing originally? My answer is that it simply would have been confusing from a pedagogical standpoint. ;-) But that's not a good architectural reason, so let's stick with sending JSON: The authorization server explicitly constructs {"user": "bob", "do": "get-account-data"} and sends that to the backend. The code might look like this:

request_data = parse_json(request.body)
if check_sig(request_data):
    # Explicitly construct the message to the backend from the parsed fields.
    send_to_backend(to_json({
        "user": request_data["user"],
        "do": request_data["do"],
    }))

That redundancy is still a little awkward, and we'll see a cleaner option in the next section. Before that, though, some quick notes on when this option is appropriate:

  • Always works... if you can use it.
  • Only possible when one code location is passing data to another. Doesn't work if e.g. two peers in a network are parsing the same piece of data. Works great if both locations are in the same application and a parsed data structure can be passed around, but people are usually already doing that for the sake of performance. (Big exception: URLs are often parsed multiple times in the same application.)
  • Requires that you have control over how the data is conveyed from one location to the next.
  • Prevents extraneous data from being passed around. This isn't just for performance. When used in a guard/actor pair (as defined in the first post) it also helps ensure the actor only receives information that was fully understood by the guard. This can reduce attack surface and make it easier to reason about the application.

Technique 2: Reserialize

Notice that the effectiveness of the code sample in the pass-the-parse section relied entirely on the parse_json/to_json pair of calls. The code happened to drop the sig field, since that's irrelevant to the backend server, but including it probably wouldn't cause any harm. If we didn't mind including it, then this code would be functionally equivalent:

request_data = parse_json(request.body)
if check_sig(request_data):
    # Round-trip: parse the request, then serialize the whole thing back out.
    send_to_backend(to_json(request_data))

The request data gets round-tripped through a parser and then a serializer. The parser removes ambiguities, and the serializer re-encodes the data back into the original format. The assumption here is that even if the parser accepts ambiguous inputs, the serializer is extremely unlikely to produce ambiguous outputs. I refer to this technique as reserialization.

It looks a lot cleaner than pass-the-parse, and the resulting code is completely general. The vulnerability is prevented by code that doesn't know anything about the specifics of the original attack, or even which fields or ambiguities would have enabled the vulnerability.

There's some risk that without the explicit construction that occurs in pass-the-parse, some extraneous information could be sent to the backend and be acted on inappropriately. This can itself result in vulnerabilities, but it's no worse than the original situation and at least the parser mismatch is prevented.
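
To see the round trip in action, here's the malicious payload from earlier pushed through Python's json module as the parser/serializer pair:

import json

malicious = '{"user": "alice", "user": "bob", "do": "get-account-data", "sig": "Tm90...IHJlYWw="}'

request_data = json.loads(malicious)  # the parser resolves the ambiguity (last key wins here)
print(json.dumps(request_data))       # the serializer emits unambiguous JSON:
# {"user": "bob", "do": "get-account-data", "sig": "Tm90...IHJlYWw="}

Whatever the backend's parser does with that, there's only one "user" left for it to find, and it's the one the signature was checked against.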

A few points on how this differs from pass-the-parse, since they can look very similar:

  • Reserialization always uses the same input format for both the authorization and backend servers, while pass-the-parse can use different ones.
  • Pass-the-parse always uses explicit construction, while reserialization does not require it (though it can, optionally).

Sometimes it's not clear which is in use: Code that performed a filtering step on the fields of the parsed request data before passing it to the backend could be described as implementing either technique.

So, when does this make sense to use? Here's a look at the benefits and constraints of this technique:

  • No need to change how the first location speaks to the second location.
  • Only requires changes to the first location.
  • Patch may not require any knowledge of business logic.
  • Allows (and requires) same input format for both code locations.
  • Requires having a full parser and a complete data model. This almost certainly wouldn't work for the HTTP header splitting or HTTP request smuggling examples from my last post. What tool would you reach for that could reserialize an entire HTTP request, down to fixing the ambiguities in those examples? Even HTML would be a little iffy; in the industry, very few tools model HTML as data, preferring to pass it around as text (or templates, at best). But for JSON and URLs? Works great.
  • The extra work required may have performance implications.

Technique 3: Be strict

So far we've looked at two very similar techniques that both rely on parsing (and accepting) ambiguous inputs. But there's another clear option: Reject ambiguous inputs! Use a strict parser and just reject anything that doesn't match the grammar. This will prevent disagreements on malformed inputs by simply not letting those inputs through.

"Strict" can mean different things. If the format in question has a well-defined grammar, such as in RFC 3986 for URLs, then a parser generated from the grammar (e.g. using ANTLR) will reject invalid inputs. In the example with URLs and hostnames, the server could reject a backslash-containing URL outright, so the browser's lax parser never gets a chance to see it.

In the JSON example, the authorization server could simply turn on strict-mode in its JSON parser and reject the malicious input. This would be a case of being stricter than the spec, and is probably the right option.
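
Not every parser ships such a switch. Python's standard json module, for instance, has no built-in strict mode for duplicate keys, but its object_pairs_hook makes it easy to bolt one on; a sketch:

import json

def reject_duplicates(pairs):
    """object_pairs_hook that refuses to build an object containing repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        result[key] = value
    return result

def parse_json_strict(text):
    return json.loads(text, object_pairs_hook=reject_duplicates)

parse_json_strict('{"user": "bob", "do": "get-account-data"}')  # fine
parse_json_strict('{"user": "alice", "user": "bob"}')           # raises ValueError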

No one can reasonably complain about their API call with duplicate JSON keys being rejected, but there are some areas where there's an expectation of laxness. HTML is a classic example, with a deeply ingrained bias towards trying to do something reasonable with malformed inputs (to the point of even expanding the spec to allow things that would otherwise be malformed, and indicating how to process those according to a certain model of user intent). How should a guard/actor pair work with user-submitted HTML that contains syntax errors if rejection is not acceptable? What many applications choose is to convert any malformed syntax into safely escaped characters and properly nested blocks, producing new, valid HTML as a prelude to checking it for acceptability. (This is not the same kind of reserialization as before, as the meaning of the input may be changed significantly in the process of covering over the various syntax errors.) This preprocessing is error-prone and may drop or mangle parts of the input in ways that annoy the user, but on the whole this seems to be more acceptable to the general population than rejection with a syntax error.
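
As a rough illustration of that kind of preprocessing, here's what a parse-and-regenerate step might look like in Python with BeautifulSoup (one error-tolerant parser among many; a real application would follow it with actual sanitization and acceptability checks):

from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

malformed = "<p>Hello <b>world<p>second paragraph, tags never closed"

# Parse with an error-tolerant parser, then serialize the tree it guessed at.
# The output is well-formed, but the structure reflects the parser's guesses
# about what the author meant, not necessarily the author's intent.
repaired = str(BeautifulSoup(malformed, "html.parser"))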

In the other direction, you may have the freedom to be much more strict than the spec, and only accept a narrow subset of what it would allow. If you accept URLs, do you need to accept all protocols? Do you need to accept URLs with a userinfo component? What about a host component with non-ASCII characters? Do you need to allow a hostname to end in a period? It's always good to consider edge cases that represent possible legitimate user behavior, but they don't always need to be supported. Sometimes it's OK to draw a line and say that the caller needs to take on some responsibility for clean inputs. Just bear in mind that it can be very, very difficult to anticipate all of the situations your code might encounter; just ask people with the family name "Null" or "O'Malley" or the first name "Anna Marie" their opinion of software that makes assumptions about reasonable inputs.
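
For instance, if the only thing your application ever expects is a plain https URL, a deliberately narrow check like this one (the specific rules are just examples, and yours will differ) pushes everything questionable back onto the caller:

from urllib.parse import urlsplit

def accept_url(url: str) -> bool:
    """Accept only a deliberately narrow subset of what RFC 3986 allows."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False  # one protocol only
    if parts.username is not None or parts.password is not None:
        return False  # no userinfo component
    host = parts.hostname or ""
    if not host or not host.isascii():
        return False  # ASCII hostnames only
    if host.endswith("."):
        return False  # no trailing dot
    return True

accept_url("https://example.com/page")      # True
accept_url("https://bob@example.com/page")  # False: userinfo present
accept_url("ftp://example.com/file")        # False: wrong scheme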

Besides the issue of how strict you can be, sometimes this technique just isn't available at an architectural level. If you have a guard/actor pair as in today's example (and as described in the previous post) then having either the guard or the actor do the rejection is likely to be acceptable and effective. But sometimes the two locations are parsing "in parallel", not in sequence with one another, and having one reject and the other accept the input may count as a failure. This might be the case for nodes in a cryptocurrency network.

Summing up the characteristics of this technique:

  • May not always have the freedom to just reject questionable inputs, although "strict" isn't just one thing.
  • If code locations are in a sequence, rejecting inputs in either location may work. Unlike the other techniques, this one can sometimes work even when you can't control the first location in a sequence.
  • Not a guaranteed fix. You can always imagine the second location using a parser written so terribly that some input will be ambiguous or parsed incorrectly.

Conclusions

Depending on how you think about it, there are anywhere from two to four options presented here. Despite the similarity of pass-the-parse and reserialization, they're actually on opposite sides of a divide:

  • Technique 0, using the same parser in both places, seemed like a trivial and often-useless suggestion. But pass-the-parse is actually a sneaky way of doing just that: The data is only parsed once, for a given format, and so only one code location needs a parser. Now the "same" parser is used in all (one) of the locations where it is needed.
  • Reserialization and strict parsing, on the other hand, are resigned to the idea that there will be multiple parsers. They instead try to narrow the space of possible inputs that an attacker can make use of. One technique alters ambiguous inputs to be in a safer subset of the spec, while the other simply rejects them.

Beyond such questions of ontology, there are a variety of practical concerns involved in choosing which technique to use. This is also not an exhaustive list. (I'd love to provide a flowchart or table to help with decision-making, but it seems impractical. Let me know if you can come up with one!) More importantly, though, I hope that this has given you a better sense of how to grapple with parser mismatches at a structural level and that you'll be better able to recognize them and work around them in whatever way is most appropriate for your code and architecture.

Updates

  • 2024-05-08: Replaced example of parentheses in URLs, as it was not correct. Sub-delims may be used in URL components unencoded so long as they do not introduce ambiguity. The new text uses error-tolerant HTML parsing and mangling as the example of a place where software chooses to use preprocessing.
