Posted to dev@hc.apache.org by Leo Galambos <le...@centrum.cz> on 2003/08/04 18:31:15 UTC

Connection break

Hi.

I am writing a robot for a search engine. The robot must harvest all 
files that are shorter than some limit (let's say 100 kB). Longer files 
are not important, because they are often archives or long sheets about 
nothing.

I cannot find a robust style in which I could drop a connection (GET 
over HTTP/1.0 and HTTP/1.1) when the incoming data stream exceeds the 
upper limit. I do it by closing the input stream, which is constructed 
by getResponseAsStream, followed by releaseConnection. Is it OK?

My second point is related to the "retrying" you have in your docs 
(http://jakarta.apache.org/commons/httpclient/tutorial.html - catch 
block of HttpRecoverableException). When I did something like this, I 
found that I had to call method.recycle() in the catch block; otherwise 
the connection was not reinitialized and everything failed. Could you 
enlighten me on this? Is it a bug in the guide? (I have tried it on 2.0-b1).

And my last point - when I run the robot under stress conditions, some 
connections seem to be frozen, although I use setConnectionTimeout. Is 
it a known issue? How should I debug it so that you can get a valuable 
log? It happens after 1-2 hours of running, so the log could reach a 
few gigabytes...

Thank you

-g-



Re: Connection break

Posted by Michael Becke <be...@u.washington.edu>.
Hi Leo,

Here are a few additions to Adrian's comments.

> I cannot find a robust style in which I could drop a connection (GET 
> over HTTP/1.0 and HTTP/1.1) when the incoming data stream exceeds the 
> upper limit. I do it by closing the input stream, which is constructed 
> by getResponseAsStream, followed by releaseConnection. Is it OK?

This works.  It would also be okay to just release the connection.

Note that releasing won't actually drop the connection; it will consume 
the rest of the stream instead.  To force a connection close you will 
have to handle your own connection management (not recommended) so that 
you have access to the HttpConnection object.
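
For the size limit itself, a minimal sketch using only java.io might 
look like the following. It assumes the InputStream is the response 
stream obtained from the method; CappedReader and readUpTo are 
hypothetical names, not part of HttpClient. It only decides when the 
limit has been exceeded and does not close or release anything, so the 
caveat above about the rest of the stream being consumed on release 
still applies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of HttpClient.
final class CappedReader {
    /**
     * Reads the stream into memory, giving up as soon as more than
     * maxBytes have arrived. Returns the body, or null if the stream
     * is longer than maxBytes (the caller should then discard it).
     */
    static byte[] readUpTo(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (out.size() + n > maxBytes) {
                return null; // over the limit: stop reading here
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

A robot would call something like CappedReader.readUpTo(stream, 100 * 1024) 
and skip the document when the result is null.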

> And my last point - when I run the robot under stress conditions, some 
> connections seem to be frozen, although I use setConnectionTimeout. Is 
> it a known issue? How should I debug it so that you can get a valuable 
> log? It happens after 1-2 hours of run, so the log could have a few 
> gigas...

There are two timeouts that I can think of that determine whether a 
connection can get frozen.  One of them is the connection timeout and 
the other is SO_TIMEOUT.  SO_TIMEOUT is the timeout when reading from a 
connection, and it can be set via HttpClient.setTimeout().
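
To illustrate the difference between the two timeouts, here is a small 
sketch with plain java.net sockets rather than HttpClient. TimeoutDemo 
and readTimesOut are hypothetical names, and the silent local server 
exists only for the demonstration. The connection succeeds at once, but 
the read would block forever without SO_TIMEOUT.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of HttpClient.
final class TimeoutDemo {
    /**
     * Connects to a local server that never sends any data, then tries
     * to read one byte. Returns true if the read timed out.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            // Connection timeout: bounds only the time needed to connect.
            socket.connect(
                new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort()),
                connectTimeoutMs);
            // SO_TIMEOUT: bounds each blocking read on the open connection.
            socket.setSoTimeout(readTimeoutMs);
            try {
                socket.getInputStream().read();
                return false; // data or end of stream arrived in time
            } catch (SocketTimeoutException e) {
                return true; // the peer stayed silent for readTimeoutMs
            }
        }
    }
}
```

With only the connection timeout set, a server that accepts the 
connection and then goes quiet would leave the reading thread blocked 
indefinitely.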

I hope this helps.

Mike


Re: Connection break

Posted by Oleg Kalnichevski <ol...@apache.org>.
> I cannot find a robust style in which I could drop a connection (GET 
> over HTTP/1.0 and HTTP/1.1) when the incoming data stream exceeds the 
> upper limit. I do it by closing the input stream, which is constructed 
> by getResponseAsStream, followed by releaseConnection. Is it OK?
> 

As far as I know, invoking close on the stream returned by
HttpMethod#getResponseAsStream() will not actually close the underlying
socket stream. At the moment there's simply no reliable way to abort a
request. This problem is known to us and we are planning to address it
in the next release (probably 2.1). It is too late for the 2.0
version, which is already in the final release phase.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20288

Oleg


Re: Connection break

Posted by Adrian Sutton <ad...@intencha.com>.
> My second point is related to "retrying" you have in your docs 
> (http://jakarta.apache.org/commons/httpclient/tutorial.html - catch 
> block of HttpRecoverableException). When I do something like this, I 
> found out that I had to call method.recycle() in the catch block, or 
> the connection was not reinitialized and everything fails. Could you 
> enlighten me on this? Is it a bug in the guide? (I have tried it on 
> 2.0-b1).

Yep, it's a bug in the guide.  The GetMethod should be created inside 
the while loop so that a new one is created for each retry.  Calling 
recycle would also work.  I've added it to my todo list to fix.
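
The shape of such a loop, sketched with the standard library only, 
could be something like this. Retry and withFreshAttempts are 
hypothetical names, and IOException merely stands in for the 
recoverable exception; in the tutorial's case the Callable would 
create a new GetMethod, execute it, and release the connection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HttpClient.
final class Retry {
    /**
     * Calls newAttempt up to maxAttempts times, retrying on IOException.
     * Each call is expected to build its own request object from
     * scratch, so no state leaks from a failed attempt into the next.
     */
    static <T> T withFreshAttempts(int maxAttempts, Callable<T> newAttempt) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return newAttempt.call();
            } catch (IOException e) {
                last = e; // assumed recoverable: try again with a new request
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        throw last; // every attempt failed: report the last failure
    }
}
```

Because the request is built inside the Callable, there is nothing left 
over to reset between attempts.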

I'll leave someone more knowledgeable to answer your other questions as 
I'm not entirely sure.

Regards,

Adrian Sutton.
