Posted to httpclient-users@hc.apache.org by Tony Thompson <To...@stone-ware.com> on 2007/04/04 20:46:42 UTC

Question about MultiThreadedHttpConnectionManager

First a little background on my application.  I have a proxy application
that accepts requests from an HTTP user agent and then uses the Apache
HttpClient to facilitate passing that request to a particular host.  I
had been using the 2.0.2 version of the HttpClient and everything was
working perfectly.  Recently I upgraded my code to the 3.0.1 version of
the HttpClient and I have noticed a weird little issue.  I am using the
MultiThreadedHttpConnectionManager and I think the issue I am having is
in there but I am not sure.

So, a little info on the issue that I am seeing.  With the 3.0.1 client,
my application works fine for a bit.  Seems like it works OK until a
certain amount of data is moved through it or something.  I have been
unable to track down why it works and then stops working.  I have done
some packet traces of the client in a working state and when it is
broken.  The issue seems to be that in the middle of an HTTP 1.1
conversation, the client is opening a second socket to the host and is
splitting the requests across the two sockets.  Even though all of the
data is getting to the server, the server is confused by the fact that
it is now coming in from 2 different connections.  I suppose it could be
an issue with the way I am using the client, but I am confused about why it
worked perfectly with the 2.0.2 client and works for a bit with the
3.0.1 client before it breaks.

Here are the basic steps I go through in my code for one HTTP request:
1. agent submits request
2. proxy gets the request and creates an HttpMethod
3. proxy determines which host to submit the request to and gets a
HostConfiguration for that host
4. proxy does HttpClient.executeMethod( hostConfig, method )
5. response is processed and proxy calls method.releaseConnection()
.....
n. connection is closed between agent and proxy
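Steps 4 and 5 above can be sketched as follows. This is a stdlib-only model, not HttpClient code: the hypothetical `LeaseCounter` stands in for the connection manager and only counts leased connections, and `handle` stands in for `executeMethod` plus response processing. What it illustrates is the usual 3.x idiom of calling `releaseConnection()` in a `finally` block, so that a request that fails halfway does not keep its connection checked out of the pool.

```java
/** Hypothetical stand-in for the connection manager: it only counts leased connections. */
class LeaseCounter {
    private int leased = 0;

    synchronized void acquire() { leased++; }
    synchronized void release() { leased--; }
    synchronized int leased() { return leased; }
}

class ProxyFlow {
    /** Steps 4-5 for one request; the body is a placeholder for executeMethod + processing. */
    static String handle(LeaseCounter pool, String request) {
        pool.acquire();                              // step 4: executeMethod(hostConfig, method) leases a connection
        try {
            if (request.isEmpty()) {
                throw new IllegalArgumentException("empty request"); // simulated failure mid-request
            }
            return "response to " + request;         // step 5: process the response
        } finally {
            pool.release();                          // step 5: method.releaseConnection(), even on failure
        }
    }

    public static void main(String[] args) {
        LeaseCounter pool = new LeaseCounter();
        System.out.println(handle(pool, "GET /a"));
        try {
            handle(pool, "");
        } catch (IllegalArgumentException e) {
            System.out.println("request failed: " + e.getMessage());
        }
        System.out.println("connections still leased: " + pool.leased()); // 0
    }
}
```

Without the `finally`, the failed request would leave one connection leased for good, and with a small per-host limit the pool would eventually run dry.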

So, steps 1 - 5 are executed several times:

AGENT              PROXY                          HOST
------------------------------------------------------
req ---------------> req on port 1700 ----------->
resp <-------------- resp <---------------------- resp
req ---------------> req on port 1700 ----------->
resp <-------------- resp <---------------------- resp
req ---------------> req on port 1700 ----------->
resp <-------------- resp <---------------------- resp
req ---------------> req on port 1700 ----------->
resp <-------------- resp <---------------------- resp
close

When the conversation breaks:

AGENT              PROXY                          HOST
------------------------------------------------------
req ---------------> req on port 1700 ----------->
resp <-------------- resp <---------------------- resp
req ---------------> req on port 1701 ----------->
resp <-------------- resp <---------------------- resp
req ---------------> req on port 1701 ----------->
resp <-------------- resp <---------------------- resp
req ---------------> req on port 1700 ----------->
resp <-------------- resp <---------------------- resp
close

The question I have is about the MultiThreadedHttpConnectionManager.
Did it possibly use to keep track of the connection it would make subsequent
requests on in a ThreadLocal (even though I was calling
releaseConnection() between requests), and now it no longer
does that?  That is speculation on my part but, like I said, I can't
explain why it worked in 2.0.2 and why it now thinks it needs to open
another connection at some point during that conversation.

If anyone was able to make sense of my long email and has any input, I
would appreciate it.
Tony
 
This message (and any associated files) is intended only for the 
use of the individual or entity to which it is addressed and may 
contain information that is confidential, subject to copyright or
constitutes a trade secret. If you are not the intended recipient 
you are hereby notified that any dissemination, copying or 
distribution of this message, or files associated with this message, 
is strictly prohibited. If you have received this message in error, 
please notify us immediately by replying to the message and deleting 
it from your computer. Messages sent to and from Stoneware, Inc.
may be monitored.

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: Question about MultiThreadedHttpConnectionManager

Posted by Roland Weber <os...@dubioso.net>.
Hi Tony,

> If my client opens a persistent HTTP 1.1 connection to the server and
> sends 4 requests and then sends the 5th request with a Connection: close
> header, all 5 of those requests are supposed to be sent over the same
> connection, right?

If your client is single-threaded and releases the connection each time
after processing the response and before sending the next request, yes.
In the absence of other threads, that is. However, re-using the open
connection is only a performance optimization. It does not, or rather
should not, affect the correctness of the application. Servers should
not care whether requests come in over the same connection or another
one.

> What is happening and what I was attempting to
> demonstrate in my primitive picture was the fact that the HttpClient at
> some point decides that it is going to start a new connection and splits
> those 5 requests across 2 connections.  In my application, the same
> thread is sending all 5 of those requests but I do have other threads
> sending requests to the same host at that time.

All your threads share the same pool of connections. If the one
connection in your pool that is already open gets handed out to
a different thread, your first thread gets a new connection to
the same host. Works as designed.
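The behaviour described above can be reproduced with a small deterministic model. The `SharedPool` class below is hypothetical and much simpler than MTHCM: it never blocks, has no idle timeouts, and hands out the most recently released connection first (an assumption of the model, not a statement about MTHCM internals). Connections are identified by made-up local port numbers to match the traces, and the calls are interleaved by hand to mimic two threads, A and B, talking to the same host.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of one host's share of a connection pool used by all threads. */
class SharedPool {
    private final int maxPerHost;
    private final Deque<Integer> idle = new ArrayDeque<>(); // released connections, by local port
    private int open = 0;
    private int nextPort;

    SharedPool(int maxPerHost, int firstPort) {
        this.maxPerHost = maxPerHost;
        this.nextPort = firstPort;
    }

    /** Hands out an idle connection if there is one, otherwise opens a new one. */
    synchronized int acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();                  // most recently released connection first
        }
        if (open >= maxPerHost) {
            // The real manager would block here until a connection is released.
            throw new IllegalStateException("per-host limit reached");
        }
        open++;
        return nextPort++;
    }

    /** Returns the connection to the pool; whichever thread asks next may get it. */
    synchronized void release(int port) {
        idle.push(port);
    }
}

class InterleavingDemo {
    /** Returns the local ports used by thread A's three sequential requests. */
    static List<Integer> portsUsedByA() {
        SharedPool pool = new SharedPool(2, 1700);
        List<Integer> used = new ArrayList<>();

        int a = pool.acquire();   // A, request 1: opens 1700
        used.add(a);
        pool.release(a);          // A calls releaseConnection(); 1700 is idle

        int b = pool.acquire();   // B asks first and is handed the idle 1700
        a = pool.acquire();       // A, request 2: nothing idle, so 1701 is opened
        used.add(a);
        pool.release(a);
        pool.release(b);          // B is done; 1700 is now the most recently released

        a = pool.acquire();       // A, request 3: gets 1700 again
        used.add(a);
        pool.release(a);
        return used;
    }

    public static void main(String[] args) {
        System.out.println("thread A used ports " + portsUsedByA()); // [1700, 1701, 1700]
    }
}
```

Thread A never did anything different between its requests; it simply lost the race for the idle connection once, which is enough to split its requests across two sockets.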

> Once the load starts
> to increase, the MTHCM starts opening more connections as it should but
> those connections start getting crossed up more as things go on.

MTHCM _should_ open at most as many connections as allowed by the
"connections per host" limit, which is 2 by default.
If it opens more, please file a bug report. A test case that
exhibits the problem would be welcome.

> But, I have seen WebDAV requests that are
> expecting all 5 of those requests to come through on the same connection
> blow chunks because the HttpClient is breaking protocol.

HttpClient implements the HTTP protocol, not WebDAV. If WebDAV has
additional constraints that need to be enforced, you should use a
WebDAV client implementation. If your WebDAV server expects all
requests over a single connection although that is not specified
by WebDAV (it sure isn't by HTTP), the server is buggy. Either way,
it is not HttpClient that is "breaking protocol".

Working around this problem will be tricky, since the connection
and connection manager classes are closely tied to each other in 3.x.
The Slide project has a WebDAV client implementation:
http://jakarta.apache.org/slide/index.html

> I am not keeping state in the server since it is really just a
> transparent proxy so client state is maintained in the originating
> client.

Make sure that the backend server does not send cookies.
Because if it does, you are keeping state without knowing it.
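One way to check is to log whenever a backend response carries cookies. The helper below is hypothetical and works on a plain header map, for example the `Map<String, List<String>>` shape returned by `java.net.HttpURLConnection#getHeaderFields()`, which uses a `null` key for the status line. How the proxy obtains such a map from its HttpClient responses is left open here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CookieCheck {
    /** Hypothetical helper: true if the response headers contain Set-Cookie or Set-Cookie2. */
    static boolean setsCookies(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey();           // may be null (status line)
            if (name != null
                    && (name.equalsIgnoreCase("Set-Cookie") || name.equalsIgnoreCase("Set-Cookie2"))
                    && !e.getValue().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = new HashMap<>();
        headers.put(null, Arrays.asList("HTTP/1.1 200 OK"));
        headers.put("Content-Type", Arrays.asList("text/html"));
        System.out.println("sets cookies: " + setsCookies(headers)); // false
        headers.put("set-cookie", Arrays.asList("id=42; Path=/"));   // header names are case-insensitive
        System.out.println("sets cookies: " + setsCookies(headers)); // true
    }
}
```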

cheers,
  Roland




RE: Question about MultiThreadedHttpConnectionManager

Posted by Tony Thompson <To...@stone-ware.com>.
Roland,

>> The issue seems to be that in the middle of an HTTP 1.1 conversation,
>> the client is opening a second socket to the host and is splitting the
>> requests across the two sockets.  Even though all of the data is
>> getting to the server, the server is confused by the fact that it is
>> now coming in from 2 different connections.
>
> It is perfectly normal for a client application (Browser, HttpClient)
> to open more than one connection if requests can be executed
> simultaneously.
> If the server gets confused by that, the server application is broken.
> Nevertheless, you can set the "connections per host" limit of the
> connection manager to 1, then only one connection will be used and no
> requests are executed in parallel.

I understand that but this is where things get a little fuzzy for me.
If my client opens a persistent HTTP 1.1 connection to the server and
sends 4 requests and then sends the 5th request with a Connection: close
header, all 5 of those requests are supposed to be sent over the same
connection, right?  What is happening and what I was attempting to
demonstrate in my primitive picture was the fact that the HttpClient at
some point decides that it is going to start a new connection and splits
those 5 requests across 2 connections.  In my application, the same
thread is sending all 5 of those requests but I do have other threads
sending requests to the same host at that time.  Once the load starts
to increase, the MTHCM starts opening more connections as it should but
those connections start getting crossed up more as things go on.  For
most HTTP conversations having those requests split across multiple
connections may not be a big deal, other than not using the server as
efficiently as it could.  But, I have seen WebDAV requests that are
expecting all 5 of those requests to come through on the same connection
blow chunks because the HttpClient is breaking protocol.  Am I missing
some key piece of information here?


>> 4. proxy does HttpClient.executeMethod( hostConfig, method )
>
> HttpClient has only one default HttpState. A proxy that is serving
> multiple clients should maintain a separate HttpState for each client
> session, and pass that state in the executeMethod call.
> Otherwise, a backend server that uses cookies to trace client sessions
> will indeed get confused, because the wrong cookies are sent
> back.

I am not keeping state in the server since it is really just a
transparent proxy so client state is maintained in the originating
client.

If some of my assumptions above are correct, is there something I can do
to work better with the HttpClient or does the HttpClient need to handle
this differently?

Thanks
Tony
 


Re: Question about MultiThreadedHttpConnectionManager

Posted by Roland Weber <os...@dubioso.net>.
Hello Tony,

> The issue seems to be that in the middle of an HTTP 1.1
> conversation, the client is opening a second socket to the host and is
> splitting the requests across the two sockets.  Even though all of the
> data is getting to the server, the server is confused by the fact that
> it is now coming in from 2 different connections.

It is perfectly normal for a client application (Browser, HttpClient) to
open more than one connection if requests can be executed simultaneously.
If the server gets confused by that, the server application is broken.
Nevertheless, you can set the "connections per host" limit of the
connection manager to 1, then only one connection will be used and no
requests are executed in parallel.
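A rough model of what a limit of 1 buys is sketched below. In HttpClient 3.x the limit itself is, as far as I know, set through the manager's `HttpConnectionManagerParams` (`setDefaultMaxConnectionsPerHost`); the `BlockingHostPool` here is a hypothetical stand-in, not the MTHCM code, and the port numbers are made up to match the traces. Several threads each send sequential requests; with a limit of 1 every request goes over the same connection, while with the default of 2 at most two connections appear.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical blocking pool for a single host, modelling a "connections per host" limit. */
class BlockingHostPool {
    private final int maxPerHost;
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int open = 0;
    private int nextPort = 1700; // made-up local ports, as in the traces

    BlockingHostPool(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    synchronized int acquire() throws InterruptedException {
        while (idle.isEmpty() && open >= maxPerHost) {
            wait(); // at the limit: wait for another thread to release a connection
        }
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        open++;
        return nextPort++;
    }

    synchronized void release(int port) {
        idle.push(port);
        notifyAll(); // wake threads waiting in acquire()
    }
}

class LimitDemo {
    /** Runs `threads` workers that each send `requests` sequential requests; returns all ports seen. */
    static Set<Integer> run(int maxPerHost, int threads, int requests) {
        BlockingHostPool pool = new BlockingHostPool(maxPerHost);
        Set<Integer> portsSeen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int r = 0; r < requests; r++) {
                        int port = pool.acquire(); // executeMethod(...)
                        portsSeen.add(port);
                        pool.release(port);        // method.releaseConnection()
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return portsSeen;
    }

    public static void main(String[] args) {
        System.out.println("limit 1 -> ports " + run(1, 4, 50)); // [1700]
        System.out.println("limit 2 -> ports " + run(2, 4, 50)); // at most two ports
    }
}
```

The price of a limit of 1 is that all threads queue for the single connection. The model also ignores connections closed by the server; in practice a replacement connection would show up with a new local port even with a limit of 1.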

> 4. proxy does HttpClient.executeMethod( hostConfig, method )

HttpClient has only one default HttpState. A proxy that is serving
multiple clients should maintain a separate HttpState for each
client session, and pass that state in the executeMethod call.
Otherwise, a backend server that uses cookies to trace client
sessions will indeed get confused, because the wrong cookies are
sent back.
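The per-client state idea can be modelled without HttpClient as below. `CookieJar` is a hypothetical stand-in for `HttpState`, `SessionStates` is a hypothetical registry keyed by some agent session identifier (how the proxy derives that identifier is an assumption left open), and `JSESSIONID` is just an example cookie name. With one shared jar, the cookie the backend set for agent B overwrites agent A's; with one jar per session, each agent keeps its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for HttpState: a cookie jar (name -> value). */
class CookieJar {
    private final Map<String, String> cookies = new HashMap<>();

    synchronized void store(String name, String value) { cookies.put(name, value); }
    synchronized String get(String name) { return cookies.get(name); }
}

/** Hypothetical registry keeping one state object per agent session. */
class SessionStates {
    private final Map<String, CookieJar> bySession = new ConcurrentHashMap<>();

    CookieJar forSession(String sessionId) {
        return bySession.computeIfAbsent(sessionId, id -> new CookieJar());
    }
}

class StateDemo {
    /** One jar for everybody, like relying on the client's single default state. */
    static String sharedStateCookieForA() {
        CookieJar shared = new CookieJar();
        shared.store("JSESSIONID", "session-of-A"); // backend answers agent A
        shared.store("JSESSIONID", "session-of-B"); // backend answers agent B, overwriting A's cookie
        return shared.get("JSESSIONID");            // what A's next request would carry
    }

    /** One jar per agent session. */
    static String perSessionCookieForA() {
        SessionStates states = new SessionStates();
        states.forSession("agent-A").store("JSESSIONID", "session-of-A");
        states.forSession("agent-B").store("JSESSIONID", "session-of-B");
        // With HttpClient 3.x the per-session state would be passed as
        // client.executeMethod(hostConfig, method, stateForThisAgent)
        return states.forSession("agent-A").get("JSESSIONID");
    }

    public static void main(String[] args) {
        System.out.println("shared state:      A sends " + sharedStateCookieForA()); // session-of-B
        System.out.println("per-session state: A sends " + perSessionCookieForA());  // session-of-A
    }
}
```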

> AGENT              PROXY                          HOST
> ------------------------------------------------------
> req ---------------> req on port 1700 ----------->
> resp <-------------- resp <---------------------- resp
> req ---------------> req on port 1701 ----------->
> resp <-------------- resp <---------------------- resp
> req ---------------> req on port 1701 ----------->
> resp <-------------- resp <---------------------- resp
> req ---------------> req on port 1700 ----------->
> resp <-------------- resp <---------------------- resp
> close

If the port number refers to the local port of your proxy,
I don't see a problem here.

> The question I have is about the MultiThreadedHttpConnectionManager.
> Did it possibly use to keep track of the connection it would make subsequent
> requests on in a ThreadLocal (even though I was calling
> releaseConnection() between requests), and now it no longer
> does that?

No. MTHCM is thread-safe; it does not maintain per-thread state.
It does re-use connections if multiple requests are sent to the
same host. Everything except connection state is tracked by one
or more HttpState objects, as mentioned above.

hope that helps,
  Roland

