Posted to dev@nutch.apache.org by Ken Krugler <kk...@krugle.net> on 2005/10/26 17:43:21 UTC

Long delay in httpclient

Hi all,

We've occasionally run into an odd situation, where a fetcher thread 
will hang for a long time (up to two hours), then suddenly continue 
running just fine.

When it hangs, it's always in the commons-httpclient.jar, at 
ChunkedInputStream.exhaustInputStream(). This is code that tries to 
read any residual bytes from the incoming socket data stream, in 
response to the protocol-httpclient code releasing the connection.

I don't know (yet) whether the hang is inside the call to 
inStream.read(), or whether this call is repeatedly returning a 
result length of 0 (versus -1).

I've searched the commons-httpclient bug database and not found any 
mention of this specific issue. There was one post to the 
commons-httpclient developer mailing list about ways in which 
something like this might happen - for example, there's a multi-chunk 
response, but the server decides that it has sent all the data it 
needs to send, while the client is still waiting for a chunk. But 
those don't seem to match our situation, where we're just trying to 
flush any data.

The change I'm trying out now is that exhaustInputStream() will 
terminate when either inStream.read() returns -1, or some length of 
time (30 seconds, in my case) has gone by with only 0-byte results 
being returned by the read call.
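
A minimal sketch of that termination rule, for illustration only. This is 
not the actual commons-httpclient code: the class name DrainSketch, the 
method name exhaust, the 1024-byte buffer, and the returned byte count are 
all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the patched exhaustInputStream() logic.
class DrainSketch {

    /**
     * Reads and discards whatever is left on the stream.
     * Stops when read() returns -1 (end of stream), or when maxIdleMillis
     * have elapsed during which every read returned 0 bytes.
     * Returns the number of bytes discarded.
     */
    static long exhaust(InputStream in, long maxIdleMillis) throws IOException {
        byte[] buffer = new byte[1024];          // assumed buffer size
        long discarded = 0;
        long lastProgress = System.nanoTime();   // time of the last non-empty read

        while (true) {
            int n = in.read(buffer);
            if (n < 0) {
                return discarded;                // normal end of stream
            }
            if (n > 0) {
                discarded += n;
                lastProgress = System.nanoTime(); // data arrived, reset the idle clock
            } else {
                long idleMillis = (System.nanoTime() - lastProgress) / 1_000_000L;
                if (idleMillis >= maxIdleMillis) {
                    return discarded;            // only 0-byte reads for too long, give up
                }
                // A real version might pause briefly here instead of spinning.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(exhaust(in, 30_000L)); // 30 seconds, as in the change described above
    }
}
```

Note that this only covers the case where read() keeps returning 0. If the 
hang is inside a blocking read() call, the loop never regains control, and 
a socket read timeout (Socket.setSoTimeout) would be the usual way to bound 
that case.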

Does this make sense? Am I missing something else I should be trying?

Thanks,

-- Ken
-- 
Ken Krugler
Krugle, Inc.
+1 530-470-9200