You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hc.apache.org by Jeffrey Dever <js...@sympatico.ca> on 2003/02/02 22:08:44 UTC

problem with recycling methods - use case

The idea of recycling methods is good, but at the moment, its not very 
useful.  Consider the following use case:

1) User needs to make a bunch of GETs as efficiently as possible
2) All the resources that they want to get are full string urls
3) In a loop, iterate over the list of resources and do the GET

In code, I'm thinking of somthing like this:

client = new HttpClient()
method = new GetMethod()
for (int i=0; i< resources.length; i++) {
    method.setUri(resources[i])
    client.executeMethod(method)
    body = method.getResponseAsStream()
    //do somthing with body
    method.recycle()
}

Two issues with this:

1) There needs to be some way to set the entire uri from a method call. 
 Right now we can use a full uri in the constructor, but after its 
constructed, we're stuck having to try and peice it together with calls 
to setPath(), setQuery() and such.

Perhaps we should add setUri(String) to HttpMethod.  I think its best to 
have a string, as opposed to a URL or a URI object because its most 
portable, and we have a bunch of parsing code for it already.  The 
constructor that takes a uri can call the setUri method to do its work, 
for code reuse.  This is not hard to do but will break any clients that 
implement the HttpMethod interface.

2) From a connection standpoint, if the new url is to the same 
protocol://host:port then I would expect that the connection be reused 
for performance purposes.  If not, then a new connection has to be 
established and the old one should probablly be closed.  But the call to 
recycle() will release the connection so it won't be re-used.  The point 
of calling recycle is for efficiency but if we are always releasing the 
connection, then all we are saving is object construction time which is 
tiny compared to the time to establish a connection.

We could have a use case where all the GETs are to the same 
protocol://host:port, (there is code in Slide like this) but this just a 
special case that is covered by arbitrairy urls.

Comments?  Anyone want to work on this?

Jandalf.



Re: problem with recycling methods - use case

Posted by Eric Johnson <er...@tibco.com>.
Michael Becke wrote:

[snip]

>
> I've been looking into this a little more and I'm actually not sure if 
> AutoCloseInputStream should close the stream or not.  I vaguely 
> remember when this was first written and the various interactions are 
> quite complex.  In most cases the AutoCloseInputStream is not wrapping 
> the actual socket stream.  Usually there is another stream in the 
> middle, either a ContentLengthInputStream or ChunkedInputStream.  Both 
> of which do not close the socket stream.  The only case a socket input 
> stream will be closed is when there is no chunking or content length.  
> For this case it is difficult to determine when the response content 
> is complete and therefore when it can be reused.  In this case it 
> might actually be reasonable to close the socket stream and force a 
> reconnect.  What does everyone think?

I spent quite a while crawling over this part of the code, and it seems 
to me that if you don't have a content length or chunked encoding on the 
response, the RFC indicates that the only way for the client to detect 
the end of the content on the response is to actually close the socket. 
 In the case where the "AutoCloseInputStream" wraps the raw socket, the 
raw socket should be closed, because the server is going to close it anyway.

I went over this code carefully to try to insure that calling "close" on 
the getResponseBodyAsStream() result would _always_ be safe.  As a 
result, there is a fairly intricate dance between the stream wrapper, 
the method, and the connection manager, so that they all stay in sync 
with each other, and the appropriate amount of data from a persistent 
connection will be read, rather than attempting to "scan" to find the 
next occurrence of "HTTP/1.1" in the byte stream.

-Eric.


Re: problem with recycling methods - use case

Posted by Michael Becke <be...@u.washington.edu>.
>> 2) From a connection standpoint, if the new url is to the same 
>> protocol://host:port then I would expect that the connection be 
>> reused for performance purposes.  If not, then a new connection has 
>> to be established and the old one should probablly be closed.  But 
>> the call to recycle() will release the connection so it won't be 
>> re-used.  The point of calling recycle is for efficiency but if we 
>> are always releasing the connection, then all we are saving is object 
>> construction time which is tiny compared to the time to establish a 
>> connection.
>
> In this case the connection should be reused and I believe it is.  It 
> should work something like the following:
>
> 1) recycle() is called which then calls releaseConnection()
> 2) releaseConnection() closes the response stream which causes the any 
> unread response to be read ( note this is not the actual socket input 
> stream but an instance of AutoCloseInputStream)
> 3) the connection is closed if needed (if connection close was set)
> 4) the connection goes back to the connection manager
> 5) the next request to the connection manager should reuse the same 
> connection if appropriate
>
> Now that I've said all of this is, it appears that 
> AutoCloseInputStreamis not doing quite what it should.  
> AutoCloseInputStream.notifyWatcher() calls super.close() which will 
> close the socket InputStream.  This is not what we want.  Once this is 
> changed this problem should go away.

I've been looking into this a little more and I'm actually not sure if 
AutoCloseInputStream should close the stream or not.  I vaguely 
remember when this was first written and the various interactions are 
quite complex.  In most cases the AutoCloseInputStream is not wrapping 
the actual socket stream.  Usually there is another stream in the 
middle, either a ContentLengthInputStream or ChunkedInputStream.  Both 
of which do not close the socket stream.  The only case a socket input 
stream will be closed is when there is no chunking or content length.  
For this case it is difficult to determine when the response content is 
complete and therefore when it can be reused.  In this case it might 
actually be reasonable to close the socket stream and force a 
reconnect.  What does everyone think?

Mike

Re: problem with recycling methods - use case

Posted by Michael Becke <be...@u.washington.edu>.
On Sunday, February 2, 2003, at 04:08 PM, Jeffrey Dever wrote:

> The idea of recycling methods is good, but at the moment, its not very 
> useful.  Consider the following use case:
>
> 1) User needs to make a bunch of GETs as efficiently as possible
> 2) All the resources that they want to get are full string urls
> 3) In a loop, iterate over the list of resources and do the GET
>
> In code, I'm thinking of somthing like this:
>
> client = new HttpClient()
> method = new GetMethod()
> for (int i=0; i< resources.length; i++) {
>    method.setUri(resources[i])
>    client.executeMethod(method)
>    body = method.getResponseAsStream()
>    //do somthing with body
>    method.recycle()
> }
>
> Two issues with this:
>
> 1) There needs to be some way to set the entire uri from a method 
> call. Right now we can use a full uri in the constructor, but after 
> its constructed, we're stuck having to try and peice it together with 
> calls to setPath(), setQuery() and such.
>
> Perhaps we should add setUri(String) to HttpMethod.  I think its best 
> to have a string, as opposed to a URL or a URI object because its most 
> portable, and we have a bunch of parsing code for it already.  The 
> constructor that takes a uri can call the setUri method to do its 
> work, for code reuse.  This is not hard to do but will break any 
> clients that implement the HttpMethod interface.

Agreed.  There should be a setURI(String).  I think someone brought 
this up a few weeks ago.  I'll take care of it if you like.

> 2) From a connection standpoint, if the new url is to the same 
> protocol://host:port then I would expect that the connection be reused 
> for performance purposes.  If not, then a new connection has to be 
> established and the old one should probablly be closed.  But the call 
> to recycle() will release the connection so it won't be re-used.  The 
> point of calling recycle is for efficiency but if we are always 
> releasing the connection, then all we are saving is object 
> construction time which is tiny compared to the time to establish a 
> connection.

In this case the connection should be reused and I believe it is.  It 
should work something like the following:

1) recycle() is called which then calls releaseConnection()
2) releaseConnection() closes the response stream which causes the any 
unread response to be read ( note this is not the actual socket input 
stream but an instance of AutoCloseInputStream)
3) the connection is closed if needed (if connection close was set)
4) the connection goes back to the connection manager
5) the next request to the connection manager should reuse the same 
connection if appropriate

Now that I've said all of this is, it appears that 
AutoCloseInputStreamis not doing quite what it should.  
AutoCloseInputStream.notifyWatcher() calls super.close() which will 
close the socket InputStream.  This is not what we want.  Once this is 
changed this problem should go away.

Mike