You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hc.apache.org by Jeffrey Dever <js...@sympatico.ca> on 2003/02/02 22:08:44 UTC
problem with recycling methods - use case
The idea of recycling methods is good, but at the moment, its not very
useful. Consider the following use case:
1) User needs to make a bunch of GETs as efficiently as possible
2) All the resources that they want to get are full string urls
3) In a loop, iterate over the list of resources and do the GET
In code, I'm thinking of somthing like this:
client = new HttpClient()
method = new GetMethod()
for (int i=0; i< resources.length; i++) {
method.setUri(resources[i])
client.executeMethod(method)
body = method.getResponseAsStream()
//do somthing with body
method.recycle()
}
Two issues with this:
1) There needs to be some way to set the entire uri from a method call.
Right now we can use a full uri in the constructor, but after its
constructed, we're stuck having to try and peice it together with calls
to setPath(), setQuery() and such.
Perhaps we should add setUri(String) to HttpMethod. I think its best to
have a string, as opposed to a URL or a URI object because its most
portable, and we have a bunch of parsing code for it already. The
constructor that takes a uri can call the setUri method to do its work,
for code reuse. This is not hard to do but will break any clients that
implement the HttpMethod interface.
2) From a connection standpoint, if the new url is to the same
protocol://host:port then I would expect that the connection be reused
for performance purposes. If not, then a new connection has to be
established and the old one should probablly be closed. But the call to
recycle() will release the connection so it won't be re-used. The point
of calling recycle is for efficiency but if we are always releasing the
connection, then all we are saving is object construction time which is
tiny compared to the time to establish a connection.
We could have a use case where all the GETs are to the same
protocol://host:port, (there is code in Slide like this) but this just a
special case that is covered by arbitrairy urls.
Comments? Anyone want to work on this?
Jandalf.
Re: problem with recycling methods - use case
Posted by Eric Johnson <er...@tibco.com>.
Michael Becke wrote:
[snip]
>
> I've been looking into this a little more and I'm actually not sure if
> AutoCloseInputStream should close the stream or not. I vaguely
> remember when this was first written and the various interactions are
> quite complex. In most cases the AutoCloseInputStream is not wrapping
> the actual socket stream. Usually there is another stream in the
> middle, either a ContentLengthInputStream or ChunkedInputStream. Both
> of which do not close the socket stream. The only case a socket input
> stream will be closed is when there is no chunking or content length.
> For this case it is difficult to determine when the response content
> is complete and therefore when it can be reused. In this case it
> might actually be reasonable to close the socket stream and force a
> reconnect. What does everyone think?
I spent quite a while crawling over this part of the code, and it seems
to me that if you don't have a content length or chunked encoding on the
response, the RFC indicates that the only way for the client to detect
the end of the content on the response is to actually close the socket.
In the case where the "AutoCloseInputStream" wraps the raw socket, the
raw socket should be closed, because the server is going to close it anyway.
I went over this code carefully to try to insure that calling "close" on
the getResponseBodyAsStream() result would _always_ be safe. As a
result, there is a fairly intricate dance between the stream wrapper,
the method, and the connection manager, so that they all stay in sync
with each other, and the appropriate amount of data from a persistent
connection will be read, rather than attempting to "scan" to find the
next occurrence of "HTTP/1.1" in the byte stream.
-Eric.
Re: problem with recycling methods - use case
Posted by Michael Becke <be...@u.washington.edu>.
>> 2) From a connection standpoint, if the new url is to the same
>> protocol://host:port then I would expect that the connection be
>> reused for performance purposes. If not, then a new connection has
>> to be established and the old one should probablly be closed. But
>> the call to recycle() will release the connection so it won't be
>> re-used. The point of calling recycle is for efficiency but if we
>> are always releasing the connection, then all we are saving is object
>> construction time which is tiny compared to the time to establish a
>> connection.
>
> In this case the connection should be reused and I believe it is. It
> should work something like the following:
>
> 1) recycle() is called which then calls releaseConnection()
> 2) releaseConnection() closes the response stream which causes the any
> unread response to be read ( note this is not the actual socket input
> stream but an instance of AutoCloseInputStream )
> 3) the connection is closed if needed (if connection close was set)
> 4) the connection goes back to the connection manager
> 5) the next request to the connection manager should reuse the same
> connection if appropriate
>
> Now that I've said all of this is, it appears that
> AutoCloseInputStream is not doing quite what it should.
> AutoCloseInputStream.notifyWatcher() calls super.close() which will
> close the socket InputStream. This is not what we want. Once this is
> changed this problem should go away.
I've been looking into this a little more and I'm actually not sure if
AutoCloseInputStream should close the stream or not. I vaguely
remember when this was first written and the various interactions are
quite complex. In most cases the AutoCloseInputStream is not wrapping
the actual socket stream. Usually there is another stream in the
middle, either a ContentLengthInputStream or ChunkedInputStream. Both
of which do not close the socket stream. The only case a socket input
stream will be closed is when there is no chunking or content length.
For this case it is difficult to determine when the response content is
complete and therefore when it can be reused. In this case it might
actually be reasonable to close the socket stream and force a
reconnect. What does everyone think?
Mike
Re: problem with recycling methods - use case
Posted by Michael Becke <be...@u.washington.edu>.
On Sunday, February 2, 2003, at 04:08 PM, Jeffrey Dever wrote:
> The idea of recycling methods is good, but at the moment, its not very
> useful. Consider the following use case:
>
> 1) User needs to make a bunch of GETs as efficiently as possible
> 2) All the resources that they want to get are full string urls
> 3) In a loop, iterate over the list of resources and do the GET
>
> In code, I'm thinking of somthing like this:
>
> client = new HttpClient()
> method = new GetMethod()
> for (int i=0; i< resources.length; i++) {
> method.setUri(resources[i])
> client.executeMethod(method)
> body = method.getResponseAsStream()
> //do somthing with body
> method.recycle()
> }
>
> Two issues with this:
>
> 1) There needs to be some way to set the entire uri from a method
> call. Right now we can use a full uri in the constructor, but after
> its constructed, we're stuck having to try and peice it together with
> calls to setPath(), setQuery() and such.
>
> Perhaps we should add setUri(String) to HttpMethod. I think its best
> to have a string, as opposed to a URL or a URI object because its most
> portable, and we have a bunch of parsing code for it already. The
> constructor that takes a uri can call the setUri method to do its
> work, for code reuse. This is not hard to do but will break any
> clients that implement the HttpMethod interface.
Agreed. There should be a setURI(String). I think someone brought
this up a few weeks ago. I'll take care of it if you like.
> 2) From a connection standpoint, if the new url is to the same
> protocol://host:port then I would expect that the connection be reused
> for performance purposes. If not, then a new connection has to be
> established and the old one should probablly be closed. But the call
> to recycle() will release the connection so it won't be re-used. The
> point of calling recycle is for efficiency but if we are always
> releasing the connection, then all we are saving is object
> construction time which is tiny compared to the time to establish a
> connection.
In this case the connection should be reused and I believe it is. It
should work something like the following:
1) recycle() is called which then calls releaseConnection()
2) releaseConnection() closes the response stream which causes the any
unread response to be read ( note this is not the actual socket input
stream but an instance of AutoCloseInputStream )
3) the connection is closed if needed (if connection close was set)
4) the connection goes back to the connection manager
5) the next request to the connection manager should reuse the same
connection if appropriate
Now that I've said all of this is, it appears that
AutoCloseInputStream is not doing quite what it should.
AutoCloseInputStream.notifyWatcher() calls super.close() which will
close the socket InputStream. This is not what we want. Once this is
changed this problem should go away.
Mike