You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by ot...@ivillage.com on 2002/01/06 08:32:01 UTC

Re: bug: HTTPClient hangs when the server doesn't close the connection

Hello,

I don't have a fix for this, but I found a bit more information about the cause related to the code.
In the log I found that the problematic method is HttpMethodBase's readStatusLine method, since the log looks like this:

DEBUG [Crawler-5] org.apache.commons.httpclient.HttpMethod - HttpMethodBase.readStatusLine(HttpState,HttpConnection): 06 Jan 2002 00:01:42,952
DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine(): 06 Jan 2002 00:01:42,952
...
...
a bunch of log lines showing HttpConnection.readLine() follow.

The code in the readStatusLine method that I think is the culprit is this:

        String statusLine = conn.readLine();

        while(statusLine != null && !statusLine.startsWith("HTTP/")) {
            statusLine = conn.readLine();
        }


This while loop never ends.
Recall that the problem was that the server responded without any headers and returned the HTML body right away.
So this loop will keep reading the body text desperately looking for a line that starts with "HTTP/", and that line never comes.
I think that's problem #1.

I think there is another problem in HttpConnection's readLine() method because, as you saw in the original message included below, the server stopped sending the body text and started to hang and HttpConnection did not detect that.
Instead of closing the connection after a while it just held onto it.

The relevant portion of readLine() is this:

        for(;;) {
            int ch = _input.read();
//            log.debug("HttpConnection.readLine() read " + ch);
            if(ch < 0) {
                if(buf.length() == 0) {
                    return null;
                } else {
                    break;
                }
            } else if (ch == '\r') {
//                log.debug("HttpConnection.readLine() found \\r, continuing");
                continue;
            } else if (ch == '\n') {
//                log.debug("HttpConnection.readLine() found \\n, breaking");
                break;
            }
            buf.append((char)ch);
        }

_input variable is InputStream gotten from the instance of Socket.
My guess is that _input.read() is where things got stuck.

Hm, I just realized I wasn't setting the socket timeout on HttpConnection/Socket like I thought I was. Uh :(
Nevertheless, things should not get stuck like this, especially if setting the timeout is not enforced.
Although the timeout currently has no effect as onnections are still not timing out after 30 seconds like I'd like them to.

Otis




On Sat, 05 January 2002, otisg@ivillage.com wrote:

> 
> Hello,
> 
> This looks like a bug...
> I noticed that HTTPClient hangs in some cases.  I turned on HTTPClient logging and it looks like it likes to hang when the server does not close the connection.
> Check these logs, they show things pretty clearly.
> 
> First, a healthy interaction with a web server.  httpclient.wire logging turns this up when a normal response is received:
> 
> INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
> ": 06 Jan 2002 00:01:42,505
> INFO  [Crawler-5] httpclient.wire - >> "Host: www.faz.de
> ": 06 Jan 2002 00:01:42,507
> INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
> ": 06 Jan 2002 00:01:42,507
> INFO  [Crawler-5] httpclient.wire - >> "user-agent: None
> ": 06 Jan 2002 00:01:42,507
> INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,508
> INFO  [Crawler-5] httpclient.wire - << "HTTP/1.1 400 Bad Request" [\r\n]: 06 Jan 2002 00:01:42,702
> INFO  [Crawler-5] httpclient.wire - << "Server: Microsoft-IIS/5.0" [\r\n]: 06 Jan 2002 00:01:42,703
> INFO  [Crawler-5] httpclient.wire - << "Date: Sun, 06 Jan 2002 05:01:33 GMT" [\r\n]: 06 Jan 2002 00:01:42,703
> INFO  [Crawler-5] httpclient.wire - << "Set-Cookie: BIGipServerfazde=906039724.20480.0000; path=/" [\r\n]: 06 Jan 2002 00:01:42,704
> INFO  [Crawler-5] httpclient.wire - << "Content-Type: text/html" [\r\n]: 06 Jan 2002 00:01:42,704
> INFO  [Crawler-5] httpclient.wire - << "Content-Length: 87" [\r\n]: 06 Jan 2002 00:01:42,705
> INFO  [Crawler-5] httpclient.wire - << "<html><head><title>Error</title></head><body>The parameter is incorrect. </body></html>": 06 Jan 2002 00:01:42,713
> 
> 
> The above includes a nice GET request, response headers, and the body.
> It happens to be a 400 Bad Request, but that doesn't matter here.
> 
> 
> On the other hand, when HTTPClient hung its interaction with the web server looked something like this:
> 
> INFO  [Crawler-5] httpclient.wire - >> "GET  HTTP/1.1
> ": 06 Jan 2002 00:01:42,950
> INFO  [Crawler-5] httpclient.wire - >> "Host: www.focus.de
> ": 06 Jan 2002 00:01:42,951
> INFO  [Crawler-5] httpclient.wire - >> "Content-Length: 0
> ": 06 Jan 2002 00:01:42,951
> INFO  [Crawler-5] httpclient.wire - >> "user-agent: Librisia {OG}/0.1
> ": 06 Jan 2002 00:01:42,951
> INFO  [Crawler-5] httpclient.wire - >> \r\n: 06 Jan 2002 00:01:42,952
> INFO  [Crawler-5] httpclient.wire - << "<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 3.0//EN"> <HTML>" [\r\n]: 06 Jan 2002 00:01:43,477
> INFO  [Crawler-5] httpclient.wire - << "<HEAD> <TITLE>FOCUS Online | Rubriken | Inhalt | Fehler</TITLE>" [\r\n]: 06 Jan 2002 00:01:43,478
> ...
> more HTML here
> ...
> INFO  [Crawler-5] httpclient.wire - << "Da wir unsere Seiten mehrmals t&auml;glich" [\r\n]: 06 Jan 2002 00:01:43,881
> 
> 
> Notice a few things:
> 1. There is no response header, the first thing that HTTPClient gets back is the HTML body.
> 
> 2. This server encountered some error (see 'Fehler' in the TITLE - Fehler means error auf Deutsch), maybe that's why it didn't respond properly.
> 
> 3. The very last data sent from the server is some text, but note that it looks like it was cut off (there are no </BODY> nor </HTML> tags, even though the response contained a nice, correct HTML up to that point).
> So it looks like the server choked, started hanging or something, and HTTPClient stopped there as well and remained listening for the rest of the server's response, which never came.
> 
> Finally:
> 
> netstat -a|grep focus
> tcp        0      0 77-108-148-98.nyc:34555 focus.de:http           ESTABLISHED
> 
> 
> Obviously, HTTPClient is still holding onto this connection to www.focus.de server!
> 
> 
> I noticed this behaviour with other servers, too.  I don't have their httpclient.wire logs, but see this from netstat -a:
> 
> tcp        0      0 77-108-148-98.nyc:34819 www.ananzi.co.za:http   CLOSE_WAIT
> 
> 
> This looks like a HTTPClient bug to me.
> I don't know enough about the HTTPClient internals yet to fix it and send a diff.
> Notice that this also implies the that connection/socket timeout fix from a few weeks ago doesn't really work, which is what I observed, too.  No matter what timeout I use, if connections do time out they invariably time out after a bit more than 3 minutes which, I presume, is some Java's internal, default socket timeout value.
> 
> Does anyone know how to fix this 'hang bug'?
> 
> Thanks,
> Otis
> P.S.
> If it helps, this is the last log line from HttpConnection, which seems to show that readLine() method is the method that got stuck.
> 
> DEBUG [Crawler-5] org.apache.commons.httpclient.HttpConnection - HttpConnection.readLine(): 06 Jan 2002 00:01:43,881
> 
> _________________________________________________________________
> iVillage.com: Solutions for Your Life 
> Check out the most exciting women's community on the Web   
> http://www.ivillage.com
> 
> --
> To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
> For additional commands, e-mail: <ma...@jakarta.apache.org>

_________________________________________________________________
iVillage.com: Solutions for Your Life 
Check out the most exciting women's community on the Web   
http://www.ivillage.com

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>