Posted to httpclient-users@hc.apache.org by George Ludwig <sf...@yahoo.com> on 2005/12/15 18:17:59 UTC

unable to get connection from MultiThreadedConnectionManager

I have a distributed crawler that I'm debugging, and
it appears that my last issue is with HttpClient.
Currently what happens is, everything runs well for a
while, but then some of the nodes stop working because
they seem to be unable to get a connection from the
connection manager, because they are waiting on a
semaphore notification that never comes.

If anyone has any ideas on this one, please let me
know!

-George (debug info follows)

I enabled logging on the
MultiThreadedHttpConnectionManager, specifically
because that's where the failure seems to occur. Here
are the last few debug lines from a node that has
stopped:

DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Freeing connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Notifying thread waiting on host pool, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Getting free connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.releaseConnection(HttpConnection)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Freeing connection, hostConfig=HostConfiguration[host=http://www.oregonrealtors.org]
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] No-one waiting on host pool, notifying next waiting thread.
DEBUG [org.apache.commons.httpclient.MultiThreadedHttpConnectionManager] Unable to get a connection, waiting..., hostConfig=HostConfiguration[host=http://evangelion.polito.it]



In addition, here is a jrockit thread dump that
indicates a download thread is trying to get a
connection from the connection manager, but is waiting
for notification (there are actually many threads in
this state):



"ThreadPool-Crawler DownloadJob
PooledThread-0-running" id=256 idx=0xa2
tid=-1325696080 prio=5 alive, in native, waiting
    -- Waiting for notification on: org/apache/commons/httpclient/MultiThreadedHttpConnectionManager$ConnectionPool@0x201a5c58[fatlock]
    at jrockit/vm/Threads.waitForSignal()V(Native Method)
    at jrockit/vm/Locks.wait(Ljava/lang/Object;)V(Unknown Source)[optimized]
    at jrockit/vm/Locks.wait(Ljava/lang/Object;J)V(Unknown Source)
    at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.doGetConnection(Lorg/apache/commons/httpclient/HostConfiguration;J)Lorg/apache/commons/httpclient/HttpConnection;(MultiThreadedHttpConnectionManager.java:509)
    ^-- Lock released while waiting: org/apache/commons/httpclient/MultiThreadedHttpConnectionManager$ConnectionPool@0x201a5c58[fatlock]
    at org/apache/commons/httpclient/MultiThreadedHttpConnectionManager.getConnectionWithTimeout(Lorg/apache/commons/httpclient/HostConfiguration;J)Lorg/apache/commons/httpclient/HttpConnection;(MultiThreadedHttpConnectionManager.java:394)
    at org/apache/commons/httpclient/HttpMethodDirector.executeMethod(Lorg/apache/commons/httpclient/HttpMethod;)V(HttpMethodDirector.java:152)
    at org/apache/commons/httpclient/HttpClient.executeMethod(Lorg/apache/commons/httpclient/HostConfiguration;Lorg/apache/commons/httpclient/HttpMethod;Lorg/apache/commons/httpclient/HttpState;)I(HttpClient.java:396)
    at org/apache/commons/httpclient/HttpClient.executeMethod(Lorg/apache/commons/httpclient/HttpMethod;)I(HttpClient.java:324)[inlined]
    at crawler/util/FetcherUtil.getContentAsString(Ljava/lang/String;)Ljava/lang/String;(FetcherUtil.java:68)[optimized]
    at crawler/fetch/Downloadable.run()V(Downloadable.java:44)[optimized]
    at services/threadpool/ThreadPoolThread.run()V(ThreadPoolThread.java:83)
    at jrockit/vm/RNI.c2java()V(Native Method)
    -- end of trace


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: unable to get connection from MultiThreadedConnectionManager

Posted by Roland Weber <ht...@dubioso.net>.
Hi George,

if you use the IGNORE_COOKIES policy, you don't have
to create a new state for each request. Just create one
for each thread. This will still be an improvement over
using a shared state, where the threads will lock out
each other while looking for cookies that aren't there.
The HostConfiguration is at least not heavyweight:

http://svn.apache.org/repos/asf/jakarta/commons/proper/httpclient/trunk/src/java/org/apache/commons/httpclient/HostConfiguration.java

DNS lookups should not happen until the connection to the
server is opened.

> Here is a new code fragment that leads to the execute
> call:
> 
> (url is the full string of the object I want to
> download, e.g. http://www.adobe.com/reader.zip)
> 
> HttpURL hurl = new HttpURL(url);
> HostConfiguration config = new HostConfiguration();
> config.setHost(hurl);
> int statusCode =
>   httpClient.executeMethod(config, method, new HttpState());

You don't have to create the HostConfiguration object
yourself, unless you want to set request specific
things like a proxy. HttpClient will clone a default
host configuration and call setHost() if null is
passed as the host config. It's not in the JavaDocs,
but have a look at the code:

http://svn.apache.org/repos/asf/jakarta/commons/proper/httpclient/trunk/src/java/org/apache/commons/httpclient/HttpClient.java

> I understand now why I seemed to be collecting too
> many cookies, although I am still confused why I was
> collecting any of them at all, given that I set the
> cookie policy to ignore before executing every
> request, i.e.:
> method.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);

Strange indeed. Consider changing the default cookie spec:
http://jakarta.apache.org/commons/httpclient/cookies.html

Maybe there were some requests for which the cookie policy
was not set by mistake?

hope that helps,
  Roland



Re: unable to get connection from MultiThreadedConnectionManager

Posted by George Ludwig <sf...@yahoo.com>.
--- George Ludwig <sf...@yahoo.com> wrote:

> Roland,
> 
> Thanks for the info. I set the conn count to 2000
> just
> to make sure there was enough connection overhead.
> I'm
> actually using only about 4 dedicated download

Oops, I meant *40*!





Re: unable to get connection from MultiThreadedConnectionManager

Posted by George Ludwig <sf...@yahoo.com>.
Roland,

Thanks for the info. I set the conn count to 2000 just
to make sure there was enough connection overhead. I'm
actually using only about 4 dedicated download threads
per jvm, although I may increase that number
significantly after I get these issues worked out.

I just changed the code to use a new HostConfiguration
as well as HttpState per request. I assume the
HttpState object is lightweight to construct, but is
the HostConfiguration similarly lightweight? I use an
HttpURL object created from the download url string as
a constructor parameter to the HostConfiguration. Does
this do any inherent DNS resolution, the way the standard
java URL class does?

Here is a new code fragment that leads to the execute
call:

(url is the full string of the object I want to
download, e.g. http://www.adobe.com/reader.zip)

HttpURL hurl = new HttpURL(url);
HostConfiguration config = new HostConfiguration();
config.setHost(hurl);
int statusCode =
  httpClient.executeMethod(config, method, new HttpState());

I understand now why I seemed to be collecting too
many cookies, although I am still confused why I was
collecting any of them at all, given that I set the
cookie policy to ignore before executing every
request, i.e.:
method.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);

Anyway, thanks again...I just restarted the system,
we'll see how long it survives this time ;)

-George

--- Roland Weber <ht...@dubioso.net> wrote:

> The connection manager is configured for 2000
> connections, is this
> also the number of threads you are using
> simultaneously?
> 
> If performance is an issue for your application, you
> should call
> httpClient.executeMethod(HostConfiguration, HttpMethod, HttpState)
> with a dedicated state for each thread or a new
> state for each
> request, depending on your usage scenario.
> Otherwise, all of your
> threads will collect the cookies they receive in the
> same default
> state. Access to the default state is synchronized,
> and searching
> for matching cookies in the whole collection could
> eventually
> serialize all of your requests. That's on top of the
> delay for
> searching through all the cookies by string-matching
> host names.
> 
> hope that helps,
>   Roland


Re: unable to get connection from MultiThreadedConnectionManager

Posted by Roland Weber <ht...@dubioso.net>.
Hi George,

>     if (!headerOK) {
>       method.releaseConnection();
>       throw new InvalidMIMETypeException(offendingHeader);
>     }

You don't have to release the connection here, since that will be
done in the finally{} block. You could call method.abort() to make
*really* sure the connection is not re-used. But I doubt that will
make a difference, since you already set the Connection: close
header before.
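
To make the point about the finally{} block concrete, here is a minimal
sketch. FakeMethod and fetch are hypothetical stand-ins, not HttpClient
classes; the only thing illustrated is that a finally block also runs on
the exception path, so a single release call there covers both exits.

```java
class ReleaseInFinally {
    /** Hypothetical stand-in for a pooled HTTP method; not the HttpClient API. */
    static class FakeMethod {
        int releaseCalls = 0;
        boolean released = false;

        void releaseConnection() {
            releaseCalls++;
            released = true;
        }
    }

    /** Throws when headerOk is false, mirroring the early exit in the quoted fragment. */
    static String fetch(FakeMethod method, boolean headerOk) {
        try {
            if (!headerOk) {
                // No explicit release here: the finally block below still runs.
                throw new IllegalStateException("invalid MIME type");
            }
            return "body";
        } finally {
            // Runs on both the normal and the exceptional path.
            method.releaseConnection();
        }
    }

    public static void main(String[] args) {
        FakeMethod ok = new FakeMethod();
        System.out.println(fetch(ok, true) + ", released=" + ok.released);

        FakeMethod bad = new FakeMethod();
        try {
            fetch(bad, false);
        } catch (IllegalStateException e) {
            System.out.println("failed, released=" + bad.released
                + ", release calls=" + bad.releaseCalls);
        }
    }
}
```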

The connection manager is configured for 2000 connections, is this
also the number of threads you are using simultaneously?

If performance is an issue for your application, you should call
httpClient.executeMethod(HostConfiguration, HttpMethod, HttpState)
with a dedicated state for each thread or a new state for each
request, depending on your usage scenario. Otherwise, all of your
threads will collect the cookies they receive in the same default
state. Access to the default state is synchronized, and searching
for matching cookies in the whole collection could eventually
serialize all of your requests. That's on top of the delay for
searching through all the cookies by string-matching host names.
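
As a rough illustration of the difference, the sketch below models a
shared, synchronized state next to a per-thread one held in a ThreadLocal.
FakeState is a hypothetical stand-in and not HttpState; with the real
library, the per-thread object would be the one passed as the third
argument to executeMethod.

```java
import java.util.ArrayList;
import java.util.List;

class PerThreadState {
    /** Hypothetical stand-in for a cookie-holding state object; not HttpState. */
    static class FakeState {
        private final List<String> cookies = new ArrayList<>();

        // Synchronized: when one instance is shared, every thread queues up here.
        synchronized void addCookie(String cookie) { cookies.add(cookie); }

        synchronized int size() { return cookies.size(); }
    }

    /** Runs the workers and returns the size of each thread's private state. */
    static int[] run(FakeState shared, int threads, int requestsPerThread) {
        // One state per thread, created lazily on first use in that thread.
        ThreadLocal<FakeState> perThread = ThreadLocal.withInitial(FakeState::new);
        int[] perThreadSizes = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int r = 0; r < requestsPerThread; r++) {
                    String cookie = "host" + id + "-" + r;
                    shared.addCookie(cookie);          // contended lock, ever-growing list
                    perThread.get().addCookie(cookie); // uncontended, stays small
                }
                perThreadSizes[id] = perThread.get().size();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return perThreadSizes;
    }

    public static void main(String[] args) {
        FakeState shared = new FakeState();
        int[] sizes = run(shared, 4, 100);
        System.out.println("shared state holds " + shared.size() + " cookies");
        System.out.println("each per-thread state holds " + sizes[0] + " cookies");
    }
}
```

The shared instance ends up holding every cookie from every thread, and
each lookup or insert has to take the same lock; the per-thread instances
only ever see their own thread's cookies.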

hope that helps,
  Roland



Re: unable to get connection from MultiThreadedConnectionManager

Posted by George Ludwig <sf...@yahoo.com>.
Oleg,

I'm *pretty* sure I'm always calling close connection.
I'm attaching the code for the fetch util, please tell
me if I'm missing anything (but I'm pretty sure I'm
not).

For background, the crawler is designed to only
download from unique sets of ip addresses at any given
time (unique across all distributed crawlers). For
this reason, we want to close connections pretty
quickly after downloading because it's unlikely we
will need the same connection again any time soon.

Any comments on making this code better are much
appreciated.

-George 

static { 
  HttpConnectionManagerParams connMgrParams = 
    new HttpConnectionManagerParams();
  connMgrParams.setConnectionTimeout(4000);
  connMgrParams.setSoTimeout(4000);
  connMgrParams.setLinger(4000);
  // set to one so we don't hammer a web site by accident
  connMgrParams.setMaxConnectionsPerHost(
    HostConfiguration.ANY_HOST_CONFIGURATION,1);
  connMgrParams.setMaxTotalConnections(2000);
  MultiThreadedHttpConnectionManager connMgr = 
    new MultiThreadedHttpConnectionManager();
  connMgr.setParams(connMgrParams);
  // static instance of HttpClient used by all threads
  httpClient= new HttpClient(connMgr); 
}

public String getContentAsString(String url) throws Exception {
  GetMethod method = new GetMethod(url);
  StringBuffer buffer = new StringBuffer();
  try {
    method.getParams().setCookiePolicy(
      CookiePolicy.IGNORE_COOKIES);
    method.addRequestHeader( "Connection", "close");
    int statusCode = httpClient.executeMethod(method);
    Header header = method.getRequestHeader("Content-Type");
    Header[] headers = method.getRequestHeaders();
    boolean headerOK=true;
    String offendingHeader=null;
    for (int i=0;i<headers.length && headerOK;i++) {
      if (!(isValidContentType(headers[i].toString()))) {
        offendingHeader=headers[i].toString();
        headerOK=false;
        break;
      }
    }
    if (!headerOK) {
      method.releaseConnection();
      throw new InvalidMIMETypeException(offendingHeader);
    }
    BufferedReader reader = new BufferedReader(
      new InputStreamReader(
        method.getResponseBodyAsStream(),
        method.getResponseCharSet())); 
    // consume the response entity
    int ch =0;
    while(((ch=reader.read())!=-1) && 
      buffer.length()<MAX_DOC_SIZE)
      buffer.append((char)ch);
  } finally {
    method.releaseConnection();
  }
  return buffer.toString();
}
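
One observation on the header check in the code above: it walks the
request headers (getRequestHeader / getRequestHeaders), which only contain
what the client sent. The MIME type of the download is carried by the
response's Content-Type header, which in HttpClient 3.x should be
reachable via method.getResponseHeader("Content-Type"). Below is a minimal
sketch of checking a single Content-Type value. The allow-list is an
assumption, and this isValidContentType is a hypothetical stand-in for the
helper of the same name, whose real implementation is not shown in this
thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ContentTypeCheck {
    // Assumed allow-list; the thread does not say which types the crawler accepts.
    static final Set<String> ALLOWED =
        new HashSet<>(Arrays.asList("text/html", "text/plain"));

    /** Accepts a raw Content-Type value such as "text/html; charset=ISO-8859-1". */
    static boolean isValidContentType(String headerValue) {
        if (headerValue == null) {
            return false; // no Content-Type header at all
        }
        // Drop parameters such as "; charset=..." and compare case-insensitively.
        int semi = headerValue.indexOf(';');
        String mime = (semi >= 0 ? headerValue.substring(0, semi) : headerValue)
            .trim()
            .toLowerCase(Locale.ROOT);
        return ALLOWED.contains(mime);
    }

    public static void main(String[] args) {
        System.out.println(isValidContentType("text/html; charset=ISO-8859-1"));
        System.out.println(isValidContentType("application/zip"));
    }
}
```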





Re: unable to get connection from MultiThreadedConnectionManager

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, Dec 15, 2005 at 09:17:59AM -0800, George Ludwig wrote:
> I have a distributed crawler that I'm debugging, and
> it appears that my last issue is with HttpClient.
> Currently what happens is, everything runs well for a
> while, but then some of the nodes stop working because
> they seem to be unable to get a connection from the
> connection manager, because they are waiting on a
> semaphore notification that never comes.
> 
> If anyone has any ideas on this one, please let me
> know!
> 
> -George (debug info follows)
> 

George

Are you sure you _always_ call HttpMethod#releaseConnection() when you
are done processing an HTTP response?

Oleg
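
A toy model of why a single missed release matters when the per-host
limit is 1. HostPool is a hypothetical stand-in built on
java.util.concurrent.Semaphore, not the real connection manager, and it
uses a bounded wait so the demo terminates. With an unbounded wait, the
last acquire would simply hang, much like the threads in the dump from
the original post.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class LeakDemo {
    /** Hypothetical per-host pool with one slot per host; a model only. */
    static class HostPool {
        private final ConcurrentHashMap<String, Semaphore> slots =
            new ConcurrentHashMap<>();

        private Semaphore slot(String host) {
            return slots.computeIfAbsent(host, h -> new Semaphore(1));
        }

        /** Waits up to timeoutMs for the host's slot; returns false rather than blocking forever. */
        boolean acquire(String host, long timeoutMs) {
            try {
                return slot(host).tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        void release(String host) {
            slot(host).release();
        }
    }

    public static void main(String[] args) {
        HostPool pool = new HostPool();
        String host = "example.org";

        // Correct use: release in finally, so the slot is free again afterwards.
        if (pool.acquire(host, 100)) {
            try {
                // the download would happen here
            } finally {
                pool.release(host);
            }
        }
        System.out.println("acquire after a proper release: " + pool.acquire(host, 100));

        // The acquire above is never released, i.e. a leak: the next caller times out.
        System.out.println("acquire after a leak: " + pool.acquire(host, 100));
    }
}
```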


