HTTP Message Parsing in Java Web Servers
Parsing the request message sent by a client is the first thing a web server does. A typical GET request, as text, looks something like this:
GET /steventen HTTP/1.1\r\n
Host: github.com
Accept-Language: en-us
Accept: text/json
User-Agent: curl/7.16.3
Cookie: a=123; b=456;\r\n\r\n
This syntax and content follow RFC 7230 and RFC 7231. When a web server receives the message above (or a chunk of it), it needs to parse, or decode, the information in it (path, headers, …) before moving on to the next steps, such as routing, proxying, authentication and so on.
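To make that parsing step concrete, here is a minimal sketch using only the Java standard library. The class name `RawRequestHead` and its fields are hypothetical names made up for illustration; they are not taken from any of the libraries discussed here. It assumes the complete head is already available as a string, and it skips most of the validation the RFCs require.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, illustration-only parser for the request line and header fields. */
final class RawRequestHead {
    final String method;
    final String target;
    final String version;
    // Keys are lower-cased because header field names are case-insensitive.
    final Map<String, String> headers;

    private RawRequestHead(String method, String target, String version,
                           Map<String, String> headers) {
        this.method = method;
        this.target = target;
        this.version = version;
        this.headers = headers;
    }

    static RawRequestHead parse(String raw) {
        // The head ends at the first empty line (CRLF CRLF).
        int end = raw.indexOf("\r\n\r\n");
        if (end < 0) {
            throw new IllegalArgumentException("incomplete head: no blank line found");
        }
        String[] lines = raw.substring(0, end).split("\r\n");

        // Request line: method SP request-target SP HTTP-version
        String[] parts = lines[0].split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + lines[0]);
        }

        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header: " + lines[i]);
            }
            String name = lines[i].substring(0, colon).toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            // Simplification: repeated fields are joined with a comma.
            headers.merge(name, value, (a, b) -> a + ", " + b);
        }
        return new RawRequestHead(parts[0], parts[1], parts[2], headers);
    }
}
```

Calling `RawRequestHead.parse(...)` on the GET request shown earlier would give `method` as `GET`, `target` as `/steventen`, and the `Host` value under the key `host`. A production parser also has to deal with size limits, invalid characters and message bodies, which is part of why the real implementations are so much larger.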
I’ve spent some time looking for a standard Java library that can parse the raw HTTP request message above, from bytes or a string, into a Java object from which I can access headers, cookies, etc., programmatically.
I thought this was so common that there must be a standard library for it. On the contrary, after reading the source code of the most popular web server libraries written in Java, such as Tomcat, Jetty, Netty, Undertow, and so on, I found that each of them implements its own HTTP parser. Furthermore, when I looked into each implementation, I found that none of them can be used standalone. Every library requires additional arguments beyond the raw HTTP message itself.
The following section shows what HTTP message parsing/decoding looks like in those Java web server libraries, and describes how the HTTP parser from each library could be “used”.
What I Found
Jetty
In Jetty, HttpParser is the main class that handles HTTP message parsing. Creating a new HttpParser instance, however, requires an HttpHandler, which has only one concrete implementing class, and that class itself requires a lot of things to create.
An example of using the HTTP parser in Jetty looks like this:
ByteBuffer buffer = BufferUtil.toBuffer(
"GET / HTTP/1.1\r\n" +
"Host: localhost\r\n" +
"Name: value\r\n\r\n");
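// Handler is a test-only implementation of HttpParser.RequestHandler from Jetty's own test suite
HttpParser.RequestHandler handler = new Handler();
HttpParser parser = new HttpParser(handler);
// Feed the buffer to the parser; the handler's callbacks receive the parsed pieces
parser.parseNext(buffer);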
More examples can be found here. Note, however, that a handler class is required, and it is not trivial to create! Take a look at this implementation in the tests, for example.
Undertow
The main logic for parsing HTTP requests in Undertow can be found in the HttpRequestParser class.
Below is an example of how to use it:
byte[] in = "GET /somepath%2fotherPath HTTP/1.1\r\n\r\n".getBytes();
final ParseState context = new ParseState(10);
HttpServerExchange result = new HttpServerExchange(null);
HttpRequestParser.instance(OptionMap.create(UndertowOptions.ALLOW_ENCODED_SLASH, true)).handle(ByteBuffer.wrap(in), context, result);
// get method
result.getRequestMethod();
// get path
result.getRequestPath();
As you can see, instances of ParseState and HttpServerExchange are required. However, the ParseState class needed to handle the incoming HTTP message is not publicly accessible!
Therefore, you can’t really use the HTTP parser from Undertow in your own project.
Netty
Like the other web servers, Netty brings the concept of a channel into the parsing process. The HTTP parser lives inside HttpRequestDecoder.
Below is an example of using the decoder to parse a raw HTTP request:
String crlf = "\r\n";
String request = "GET https://github.com/steventen HTTP/1.1" + crlf +
"Host: github.com" + crlf +
"Accept-Language: en-us" + crlf +
"Accept: text/json" + crlf +
"Accept-Encoding: gzip, deflate, br" + crlf +
"Cookie: a=123; b=456;c=wfhhnchauhd"
+ crlf + crlf;
EmbeddedChannel channel = new EmbeddedChannel(new HttpRequestDecoder());
channel.writeInbound(Unpooled.copiedBuffer(request, CharsetUtil.US_ASCII));
HttpRequest req = channel.readInbound();
req.headers();
req.method();
req.uri();
Set<Cookie> cookies = ServerCookieDecoder.LAX.decode(req.headers().get("Cookie"));
Although it is still not ideal, you can use the parser from Netty in your own project via EmbeddedChannel.
However, the performance needs further evaluation, since every message has to pass through a channel in order to be decoded.
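The cookie step in the Netty example relies on Netty's ServerCookieDecoder. For comparison, the same header value can be split with the standard library alone. The sketch below is a rough approximation under that assumption; `SimpleCookies` is a hypothetical helper name, and it performs none of the validation a real cookie decoder does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Netty: splits a Cookie request header value. */
final class SimpleCookies {
    static Map<String, String> parse(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        // Cookie pairs are separated by ';', optionally followed by whitespace.
        for (String pair : headerValue.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank segments and pairs without a name
            }
            String name = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            cookies.put(name, value);
        }
        return cookies;
    }
}
```

Given the header value from the example, `SimpleCookies.parse("a=123; b=456;c=wfhhnchauhd")` produces three entries keyed `a`, `b` and `c`.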
Tomcat
To be honest, this is the only library whose workings I couldn’t figure out by reading the source code. Even worse, when I tried to find examples in its tests, I found that every test case has to create a simple Tomcat server; you can read the tests here. Maybe I missed something, but this really concerns me, as it means there are NO unit tests around the parser, only integration tests!
Conclusion
After the whole journey, I feel really disappointed. Not only could I not simply use the HTTP parsers from these popular Java web server libraries, but every library also built its own interface that pulls extra things into this basic but core functionality to meet its own needs.
I understand that, for many reasons, the HTTP parsing process may need many different things. At its core, though, parsing requires only one thing: the HTTP message. I think all of these libraries fail to follow the principle of Separation of Concerns, and there has to be a way to avoid that.
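To illustrate the point that the message bytes alone can be enough, even when they arrive in chunks, here is a small sketch of an accumulator whose only input is a ByteBuffer. The class `HeadAccumulator` and its `feed` method are hypothetical names invented for this example, not an API from any library above. It only finds where the request head ends and leaves the actual field parsing to a later step.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical single-use sketch: collects bytes until the blank line ending the head. */
final class HeadAccumulator {
    private final ByteArrayOutputStream seen = new ByteArrayOutputStream();
    // Number of bytes of the terminator CR LF CR LF matched so far (0..4).
    private int matched = 0;

    /**
     * Feeds one chunk. Returns the head text once the terminating blank line
     * has arrived; any bytes after it (the body) are left unread in the chunk.
     */
    Optional<String> feed(ByteBuffer chunk) {
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            seen.write(b);
            byte expected = (matched % 2 == 0) ? (byte) '\r' : (byte) '\n';
            if (b == expected) {
                matched++;
            } else {
                // A stray CR may still be the start of a new terminator.
                matched = (b == '\r') ? 1 : 0;
            }
            if (matched == 4) {
                return Optional.of(new String(seen.toByteArray(), StandardCharsets.US_ASCII));
            }
        }
        return Optional.empty();
    }
}
```

A caller would keep passing chunks to `feed` until it returns a value, with no channel, exchange or handler object involved. Whether an interface like this would satisfy the real needs of those servers is a separate question; the sketch only shows that the dependency is not inherent to the parsing itself.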
Extra
To satisfy my curiosity, I peeked into the Go source code and found its implementation here.
func ReadRequest(b *bufio.Reader) (*Request, error)
Yes! That’s what I really want, as simple and clear as that!
An example of using Go’s ReadRequest to parse a raw HTTP request looks like this:
crlf := "\r\n"
raw := "GET https://github.com/steventen HTTP/1.1" + crlf +
"Host: github.com" + crlf +
"Accept-Language: en-us" + crlf +
"Accept: text/json" + crlf +
"Accept-Encoding: gzip, deflate, br" + crlf +
"Cookie: a=123; b=456;c=wfhhnchauhd" + crlf + crlf
req, _ := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
req.Cookies()
fmt.Println(req.Method)
fmt.Println(req.Header)
fmt.Println(req.URL)
Benchmark
Updated on December 29, 2021
I’ve had a chance to benchmark the HTTP parsers that can be used standalone; the details and the source code can be found on GitHub. The table below shows the final results:
Library Name | Performance (ns/op) |
---|---|
net/http package in Go | 4753 |
netty in Java | 5637.774 |
RawHttp in Java | 7109.697 |