Posted to dev@hc.apache.org by Roland Weber <os...@dubioso.net> on 2007/08/19 10:12:10 UTC

HTTP parsing - analysis

Hi Oleg, all,

I've taken a good look at the HTTP parsing code, with
a focus on the header of the messages. What I found is
that the actual parsing (and formatting) code is spread
across various static methods in o.a.h.message.
The message parsers for both traditional IO and NIO
are simply collecting full lines, feeding them to the
static parser methods afterwards. Header continuation
lines are handled by the message parsers.
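
To illustrate what "handling header continuation lines" means here, the following is a minimal sketch, not the actual HttpCore code. The class name HeaderUnfolder and the method unfold are made up for this example. It assumes the message parser has already collected complete lines and joins any line starting with a space or tab onto the preceding header line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HttpCore.
public class HeaderUnfolder {

    /**
     * Joins continuation lines (lines starting with a space or tab)
     * onto the header line that precedes them.
     */
    public static List<String> unfold(List<String> rawLines) {
        List<String> headers = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && current != null) {
                // Collapse the leading whitespace into a single space.
                current.append(' ').append(line.trim());
            } else {
                // A new header starts; flush the one collected so far.
                if (current != null) {
                    headers.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            headers.add(current.toString());
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "Content-Type: text/html;",
                "\tcharset=UTF-8",
                "Host: example.org");
        for (String header : unfold(raw)) {
            System.out.println(header);
        }
    }
}
```

The only state needed is the header collected so far, which is why this part cannot be a shared stateless object in the way the static parsing methods can.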

The parts I want to make easily replaceable are mostly
in the static methods. The short-term goal is to allow
replacing the parser and writer for the HTTP/1.x protocol
version specification in requests and responses, in
order to allow for SIP/2.0. I've also noticed that we
cannot generate messages with header continuation lines.

The parsing of headers into elements is not done
while a message is received, but on demand at a later
time. There is no corresponding way to format a header
from elements. I'd like to make the parser replaceable
too, but that has a rather low priority.

cheers,
  Roland

---------------------------------------------------------------------
To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org


Re: HTTP parsing - analysis

Posted by Roland Weber <os...@dubioso.net>.
Oleg Kalnichevski wrote:
> With all due respect I do not think this solves any real _practical_
> problem, but merely rearranges things more to your personal liking.

It sure has to do with personal liking to some degree. But currently,
if I want to replace the parser for the protocol version spec, I have
to do the following (rough guess):

- write the new parser replacing the one in BasicHttpVersionFormat
- copy&modify the parser in BasicRequestLine and/or BasicStatusLine,
  to call my new parser instead of the default one
- define a new HttpRequestParser and/or HttpResponseParser
  to call the modified parser from the previous step
- define a new connection class that creates my new parser
- define a new operator that instantiates my new connection class

That is a lot of wrapping code for a little change in functionality.
Static methods are notoriously hard to replace, and that is why
I don't like them.
Supporting SIP may be a pointless exercise or not, but we'll never
be able to tell until we've done it. I consider it a good test case
for the flexibility of the framework. HTTPCLIENT-661 is a real-life
example where replacing a parser would solve a user problem. A bit
further back, we also had a real-life problem where a user needed
to communicate with a broken server that sent a second status line
in the headers.[1] No problem if users can easily plug in tolerant
parsers for the headers.
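
As a rough sketch of what "plugging in a tolerant parser" could look like: the interface and class names below (HeaderLineParser, StrictParser, TolerantParser) are hypothetical and do not exist in HttpCore. The example assumes header lines arrive as complete strings and shows a tolerant variant that skips a stray status line instead of failing on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only, not the HttpCore API.
public class PluggableHeaderParsing {

    public interface HeaderLineParser {
        /** Returns {name, value}, or null if the line should be skipped. */
        String[] parse(String line);
    }

    /** Rejects anything that is not "name: value". */
    public static class StrictParser implements HeaderLineParser {
        @Override
        public String[] parse(String line) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Invalid header: " + line);
            }
            return new String[] {
                line.substring(0, colon).trim(),
                line.substring(colon + 1).trim()
            };
        }
    }

    /** Same as StrictParser, but ignores a status line among the headers. */
    public static class TolerantParser extends StrictParser {
        @Override
        public String[] parse(String line) {
            if (line.startsWith("HTTP/")) {
                return null; // stray second status line from a broken server
            }
            return super.parse(line);
        }
    }

    /** The caller receives the parser as an argument instead of calling a static method. */
    public static List<String[]> parseAll(List<String> lines, HeaderLineParser parser) {
        List<String[]> headers = new ArrayList<>();
        for (String line : lines) {
            String[] header = parser.parse(line);
            if (header != null) {
                headers.add(header);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Server: broken/1.0", "HTTP/1.1 200 OK", "Content-Length: 0");
        for (String[] h : parseAll(lines, new TolerantParser())) {
            System.out.println(h[0] + " = " + h[1]);
        }
    }
}
```

With the parser passed in as an object, the change for the broken server is one small subclass rather than the chain of wrapper classes listed above.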

> I can live with that and will not stand in your way.
> But I want to ask two things of you:
> 
> (1) I invested A LOT of time and effort into ensuring that parsing and
> formatting code generates nearly no intermediate garbage on the heap and
> is quite fast. It would be very regrettable if these code optimizations
> are lost.

ACK.

> 
> (2) Please try to keep parsing and formatting logic in HttpCore base and
> NIO code logically consistent.

I will. The idea is to collect the static methods into classes with
non-static ones, while keeping the actual parsing and formatting logic
unchanged. Everything currently static is stateless, so only one instance
of each class has to be created.
The part where state comes into play is the header continuation parsing,
which currently already requires one object per message to be instantiated.
I don't plan to change that significantly. At best, I can factor out the
currently duplicated code for header continuation parsing, making base
and NIO even more consistent.
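
A minimal sketch of the refactoring idea, under the assumption that the protocol version parser is the piece being replaced. All names (ProtocolVersionParser, SipVersionParser, parseVersionOf) are invented for this example and are not the planned patch. The parsing logic sits in a non-static method, a single shared DEFAULT instance covers the stateless default case, and a subclass swaps the protocol name. The sketch makes no attempt to preserve the garbage-free optimizations of the real code.

```java
// Hypothetical classes for illustration only, not the HttpCore API.
public class VersionParsing {

    /** Stateless, so one shared instance is enough for the default case. */
    public static class ProtocolVersionParser {

        public static final ProtocolVersionParser DEFAULT = new ProtocolVersionParser();

        /** Subclasses override this to accept a different protocol name. */
        protected String protocolName() {
            return "HTTP";
        }

        /** Parses text such as "HTTP/1.1" and returns {major, minor}. */
        public int[] parse(String text) {
            String prefix = protocolName() + "/";
            if (!text.startsWith(prefix)) {
                throw new IllegalArgumentException("Not a " + protocolName() + " version: " + text);
            }
            int dot = text.indexOf('.', prefix.length());
            if (dot < 0) {
                throw new IllegalArgumentException("Missing minor version: " + text);
            }
            int major = Integer.parseInt(text.substring(prefix.length(), dot));
            int minor = Integer.parseInt(text.substring(dot + 1));
            return new int[] { major, minor };
        }
    }

    /** Accepts "SIP/2.0" by changing only the protocol name. */
    public static class SipVersionParser extends ProtocolVersionParser {
        @Override
        protected String protocolName() {
            return "SIP";
        }
    }

    /** Extracts the leading version token of a status line using the given parser. */
    public static int[] parseVersionOf(String statusLine, ProtocolVersionParser parser) {
        int space = statusLine.indexOf(' ');
        String version = space < 0 ? statusLine : statusLine.substring(0, space);
        return parser.parse(version);
    }

    public static void main(String[] args) {
        int[] http = parseVersionOf("HTTP/1.1 200 OK", ProtocolVersionParser.DEFAULT);
        int[] sip = parseVersionOf("SIP/2.0 200 OK", new SipVersionParser());
        System.out.println(http[0] + "." + http[1]);
        System.out.println(sip[0] + "." + sip[1]);
    }
}
```

Callers that are happy with the default behaviour keep using the shared instance, so no additional objects are created per message.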

I'll provide patches for review some time next week.

cheers,
  Roland

[1]
http://mail-archives.apache.org/mod_mbox/jakarta-httpcomponents-dev/200608.mbox/%3C1155093190.1580.267953912@webmail.messagingengine.com%3E



Re: HTTP parsing - analysis

Posted by Oleg Kalnichevski <ol...@apache.org>.
Roland

With all due respect I do not think this solves any real _practical_
problem, but merely rearranges things more to your personal liking. I
can live with that and will not stand in your way. But I want to ask two
things of you:

(1) I invested A LOT of time and effort into ensuring that parsing and
formatting code generates nearly no intermediate garbage on the heap and
is quite fast. It would be very regrettable if these code optimizations
are lost.

(2) Please try to keep parsing and formatting logic in HttpCore base and
NIO code logically consistent.

Oleg 


