You are viewing a plain text version of this content. The canonical link for it is here.
Posted to httpclient-users@hc.apache.org by Martin Grotzke <ma...@javakaffee.de> on 2008/02/05 19:21:31 UTC

MultiThreadedHttpConnectionManager waiting in doGetConnection

Hi,

we have problems with our application: we use the solrj client that uses
internally commons httpclient / MultiThreadedHttpConnectionManager, the
httpclient is release 3.1.

From time to time (once/twice a week, with 150.000 requests per day) the
requests are blocking, then we have lots of threads (e.g. 300, 400)
waiting in MTHCM.doGetConnection (l. 518):

"TP-Processor149" daemon prio=10 tid=0x00002aab431b2c00 nid=0x4bc8 in Object.wait() [0x0000000043250000..0x0000000043253bc0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaabf751468> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
        - locked <0x00002aaabf751468> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)

Additionally I see the MultiThreadedHttpConnectionManager cleanup thread
waiting at ReferenceQueue.remove:

"MultiThreadedHttpConnectionManager cleanup" daemon prio=10 tid=0x00002aab439de800 nid=0x11d1 in Object.wait() [0x0000000
04241b000..0x000000004241bd40]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaabfca80e8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x00002aaabfca80e8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)

Now I wonder, why so many threads are waiting in doGetConnection. They
should be interrupted when MTHCM.notifyWaitingThread is invoked, so it
seems as if this is not called.
Though, this should be the case, asuming that there are/were connections
in use that should be released by solrj otherwise no thread should reach
the wait.

One thread dump containing 302 MTHCM.doGetConnection in state waiting
has 6 threads where the solrj client is processing the response, and
these are not blocking or anything.

AFAICS the solrj client uses httpclient correctly, especially in that it
releases the connection within a finally block (see also [1]).

Some additional facts:
- DefaultMaxConnectionsPerHost is set to 32
- MaxTotalConnections is set to 128
- ConnectionManagerTimeout is not set / 0

Has someone an idea what might be wrong here, or experienced anything
similar?

Thanx in advance,
cheers,
Martin


[1]
http://svn.apache.org/viewvc/lucene/solr/trunk/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java?view=markup




Re: MultiThreadedHttpConnectionManager waiting in doGetConnection

Posted by Martin Grotzke <ma...@javakaffee.de>.
Hi Roland,

I just want to give feedback concerning the issue we had: now the new
version of solrj is running for ~10 days on our production server and we
didn't encounter the problem again. So it seems that this caused the
problem - and MultiThreadedHttpConnectionManager is fine :)

Thanx again for your help,
cheers,
Martin


On Thu, 2008-02-07 at 11:23 +0100, Martin Grotzke wrote:
> Hi Roland,
> 
> sorry for providing such bad input/information :S
> 
> > The numbers are wrong, but the situation is actually worse.
> So the problem is that we have only 2 running threads (checked out
> connections) but an increasing number of waiting threads, right?
> 
> > Well, it looks like connections are leaking. So my next guess is
> > that there is some kind of error situation in which the connection
> > is not correctly released. The solrj code looks clean though.
> > Is there any other application or daemon thread that uses the
> > same connection manager?
> This should not be the case, as we're creating the CommonsHttpSolrServer
> simply with a URL, so that it creates a new HttpClient with a new MTHCM.
> 
> Whoo, I just realized that the DefaultMaxConnectionsPerHost are NOT
> increased in the version of CommonsHttpSolrServer we're running in
> production, and therefore is still 2.
> 
> So having two threads processing a response is correct and the problem
> might be related to the response processing...
> 
> Btw. we right now have a new production release in the pipeline with the
> new version of solrj (CommonsHttpSolrServer with
> DefaultMaxConnectionsPerHost set to 32). This new version of solrj
> contains also changes to the XMLResponseParser, one contains threading
> related changes (https://issues.apache.org/jira/browse/SOLR-360).
> 
> So we could wait for the release and see if we encounter the problem
> again or if it's already fixed - hopefully :)
> 
> Thanx a lot for your help,
> cheers,
> Martin
> 
> 
> 
> On Wed, 2008-02-06 at 21:41 +0100, Roland Weber wrote:
> > Hello Martin,
> > 
> > >> Hm, I didn't get the last part about not reaching the wait. You have
> > >> more than 300 threads and just 128 connections, so I don't see a
> > >> problem with 200 threads waiting at the same time if the machine
> > >> is busy.
> > > Then I wasn't clear enough (sorry for my bad english :)), we do not have
> > > 200 but 300 threads waiting at the same time...
> > 
> > I was talking in orders of magnitude, not in specific numbers.
> > 
> > > The threaddump (http://senduit.com/93f7d2) shows 302 waiting threads and
> > > 6 that are running.
> > > 
> > > I count 302 by doing this:
> > > 
> > > grep "doGetConnection(MultiThreadedHttpConnectionManager.java:518)" threaddump.txt | wc -l
> > 
> > > I count 6 with this:
> > > 
> > > grep CommonsHttpSolrServer threaddump.txt | grep -v CommonsHttpSolrServer.java:222 | wc -l
> > 
> > The 6 matches are for the following threads:
> > 
> > "http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
> > [0x0000000041725000..0x0000000041728dc0]
> > "TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
> > [0x0000000042210000..0x0000000042213d40]
> > "http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
> > [0x0000000041725000..0x0000000041728dc0]
> > "TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
> > [0x0000000042210000..0x0000000042213d40]
> > "http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
> > [0x0000000041725000..0x0000000041728dc0]
> > "TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
> > [0x0000000042210000..0x0000000042213d40]
> > 
> > Looks to me as if you're counting duplicates.
> > Same is true for the waiting threads:
> > 
> > grep -A 10 -B 2 "Thread.State: WAITING" threaddump.txt | grep Processor149
> > 
> > returns three matches for TP-Processor149.
> > Maybe that is because there are three thread dumps in your file?
> > 
> > grep -B 1 "Full thread dump" threaddump.txt
> > 
> > 2008-01-29 11:00:02
> > Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
> > --
> > 2008-01-29 11:12:01
> > Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
> > --
> > 2008-01-29 11:22:01
> > Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
> > 
> > 
> > > Asuming that not, we do not have 128 checked out connections but only 6.
> > 
> > The numbers are wrong, but the situation is actually worse.
> > 
> > > The effect is also, that these 302 thread are blocking "forever" and we
> > > have to restart the server, as no new requests are being served...
> > 
> > Well, it looks like connections are leaking. So my next guess is
> > that there is some kind of error situation in which the connection
> > is not correctly released. The solrj code looks clean though.
> > Is there any other application or daemon thread that uses the
> > same connection manager?
> > 
> > You probably cannot easily switch to a 1.5 JVM to see whether that
> > makes a difference?
> > 
> > cheers,
> >   Roland
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
> > For additional commands, e-mail: httpclient-users-help@hc.apache.org
> > 

Re: MultiThreadedHttpConnectionManager waiting in doGetConnection

Posted by Martin Grotzke <ma...@javakaffee.de>.
Hi Roland,

sorry for providing such bad input/information :S

> The numbers are wrong, but the situation is actually worse.
So the problem is that we have only 2 running threads (checked out
connections) but an increasing number of waiting threads, right?

> Well, it looks like connections are leaking. So my next guess is
> that there is some kind of error situation in which the connection
> is not correctly released. The solrj code looks clean though.
> Is there any other application or daemon thread that uses the
> same connection manager?
This should not be the case, as we're creating the CommonsHttpSolrServer
simply with a URL, so that it creates a new HttpClient with a new MTHCM.

Whoo, I just realized that the DefaultMaxConnectionsPerHost are NOT
increased in the version of CommonsHttpSolrServer we're running in
production, and therefore is still 2.

So having two threads processing a response is correct and the problem
might be related to the response processing...

Btw. we right now have a new production release in the pipeline with the
new version of solrj (CommonsHttpSolrServer with
DefaultMaxConnectionsPerHost set to 32). This new version of solrj
contains also changes to the XMLResponseParser, one contains threading
related changes (https://issues.apache.org/jira/browse/SOLR-360).

So we could wait for the release and see if we encounter the problem
again or if it's already fixed - hopefully :)

Thanx a lot for your help,
cheers,
Martin



On Wed, 2008-02-06 at 21:41 +0100, Roland Weber wrote:
> Hello Martin,
> 
> >> Hm, I didn't get the last part about not reaching the wait. You have
> >> more than 300 threads and just 128 connections, so I don't see a
> >> problem with 200 threads waiting at the same time if the machine
> >> is busy.
> > Then I wasn't clear enough (sorry for my bad english :)), we do not have
> > 200 but 300 threads waiting at the same time...
> 
> I was talking in orders of magnitude, not in specific numbers.
> 
> > The threaddump (http://senduit.com/93f7d2) shows 302 waiting threads and
> > 6 that are running.
> > 
> > I count 302 by doing this:
> > 
> > grep "doGetConnection(MultiThreadedHttpConnectionManager.java:518)" threaddump.txt | wc -l
> 
> > I count 6 with this:
> > 
> > grep CommonsHttpSolrServer threaddump.txt | grep -v CommonsHttpSolrServer.java:222 | wc -l
> 
> The 6 matches are for the following threads:
> 
> "http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
> [0x0000000041725000..0x0000000041728dc0]
> "TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
> [0x0000000042210000..0x0000000042213d40]
> "http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
> [0x0000000041725000..0x0000000041728dc0]
> "TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
> [0x0000000042210000..0x0000000042213d40]
> "http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
> [0x0000000041725000..0x0000000041728dc0]
> "TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
> [0x0000000042210000..0x0000000042213d40]
> 
> Looks to me as if you're counting duplicates.
> Same is true for the waiting threads:
> 
> grep -A 10 -B 2 "Thread.State: WAITING" threaddump.txt | grep Processor149
> 
> returns three matches for TP-Processor149.
> Maybe that is because there are three thread dumps in your file?
> 
> grep -B 1 "Full thread dump" threaddump.txt
> 
> 2008-01-29 11:00:02
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
> --
> 2008-01-29 11:12:01
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
> --
> 2008-01-29 11:22:01
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
> 
> 
> > Asuming that not, we do not have 128 checked out connections but only 6.
> 
> The numbers are wrong, but the situation is actually worse.
> 
> > The effect is also, that these 302 thread are blocking "forever" and we
> > have to restart the server, as no new requests are being served...
> 
> Well, it looks like connections are leaking. So my next guess is
> that there is some kind of error situation in which the connection
> is not correctly released. The solrj code looks clean though.
> Is there any other application or daemon thread that uses the
> same connection manager?
> 
> You probably cannot easily switch to a 1.5 JVM to see whether that
> makes a difference?
> 
> cheers,
>   Roland
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
> For additional commands, e-mail: httpclient-users-help@hc.apache.org
> 

Re: MultiThreadedHttpConnectionManager waiting in doGetConnection

Posted by Roland Weber <os...@dubioso.net>.
Hello Martin,

>> Hm, I didn't get the last part about not reaching the wait. You have
>> more than 300 threads and just 128 connections, so I don't see a
>> problem with 200 threads waiting at the same time if the machine
>> is busy.
> Then I wasn't clear enough (sorry for my bad english :)), we do not have
> 200 but 300 threads waiting at the same time...

I was talking in orders of magnitude, not in specific numbers.

> The threaddump (http://senduit.com/93f7d2) shows 302 waiting threads and
> 6 that are running.
> 
> I count 302 by doing this:
> 
> grep "doGetConnection(MultiThreadedHttpConnectionManager.java:518)" threaddump.txt | wc -l

> I count 6 with this:
> 
> grep CommonsHttpSolrServer threaddump.txt | grep -v CommonsHttpSolrServer.java:222 | wc -l

The 6 matches are for the following threads:

"http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
[0x0000000041725000..0x0000000041728dc0]
"TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
[0x0000000042210000..0x0000000042213d40]
"http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
[0x0000000041725000..0x0000000041728dc0]
"TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
[0x0000000042210000..0x0000000042213d40]
"http-8080-2" daemon prio=10 tid=0x00002aab40860800 nid=0x1210 runnable
[0x0000000041725000..0x0000000041728dc0]
"TP-Processor45" daemon prio=10 tid=0x00002aab4103cc00 nid=0x11c9 runnable
[0x0000000042210000..0x0000000042213d40]

Looks to me as if you're counting duplicates.
Same is true for the waiting threads:

grep -A 10 -B 2 "Thread.State: WAITING" threaddump.txt | grep Processor149

returns three matches for TP-Processor149.
Maybe that is because there are three thread dumps in your file?

grep -B 1 "Full thread dump" threaddump.txt

2008-01-29 11:00:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
--
2008-01-29 11:12:01
Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):
--
2008-01-29 11:22:01
Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.6.0_02-b05 mixed mode):


> Asuming that not, we do not have 128 checked out connections but only 6.

The numbers are wrong, but the situation is actually worse.

> The effect is also, that these 302 thread are blocking "forever" and we
> have to restart the server, as no new requests are being served...

Well, it looks like connections are leaking. So my next guess is
that there is some kind of error situation in which the connection
is not correctly released. The solrj code looks clean though.
Is there any other application or daemon thread that uses the
same connection manager?

You probably cannot easily switch to a 1.5 JVM to see whether that
makes a difference?

cheers,
  Roland


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: MultiThreadedHttpConnectionManager waiting in doGetConnection

Posted by Martin Grotzke <ma...@javakaffee.de>.
Hi,

On Wed, 2008-02-06 at 16:36 +0100, Roland Weber wrote:
> Hm, I didn't get the last part about not reaching the wait. You have
> more than 300 threads and just 128 connections, so I don't see a
> problem with 200 threads waiting at the same time if the machine
> is busy.
Then I wasn't clear enough (sorry for my bad english :)), we do not have
200 but 300 threads waiting at the same time...
The threaddump (http://senduit.com/93f7d2) shows 302 waiting threads and
6 that are running.

I count 302 by doing this:

grep "doGetConnection(MultiThreadedHttpConnectionManager.java:518)" threaddump.txt | wc -l

These threads have "CommonsHttpSolrServer.java:222" in the stack.

I count 6 with this:

grep CommonsHttpSolrServer threaddump.txt | grep -v CommonsHttpSolrServer.java:222 | wc -l


Is anything wrong with this?


Asuming that not, we do not have 128 checked out connections but only 6.

The effect is also, that these 302 thread are blocking "forever" and we
have to restart the server, as no new requests are being served...

> I'm not saying that MTHCM is bug-free, but let us first consider
> if it is operating correctly. 
Hopefully - I would prefer that it's simply a wrong usage of
httpclient :)

> Is it possible that you have several
> hundred threads trying to access the same target host?
Yes, I would say. All the requests access a loadbalancer that fronts two
search servers.

> These would end up waiting on the same pool, and only 32 connections are given
> out for each pool. 6 threads not blocked, 26 blocked on reading or
> writing, and the rest waiting for a connection to be released?
Ok, so we should have 32 instead of 6 running solrj clients.

Thanx a lot for your help,
cheers,
Martin



Re: MultiThreadedHttpConnectionManager waiting in doGetConnection

Posted by Roland Weber <os...@dubioso.net>.
Hello Martin,

> Now I wonder, why so many threads are waiting in doGetConnection. They
> should be interrupted when MTHCM.notifyWaitingThread is invoked, so it
> seems as if this is not called.

It is called only if a connection is returned and given to the specific
pool on which a thread is waiting.

> Though, this should be the case, asuming that there are/were connections
> in use that should be released by solrj otherwise no thread should reach
> the wait.

Hm, I didn't get the last part about not reaching the wait. You have
more than 300 threads and just 128 connections, so I don't see a
problem with 200 threads waiting at the same time if the machine
is busy.

> One thread dump containing 302 MTHCM.doGetConnection in state waiting
> has 6 threads where the solrj client is processing the response, and
> these are not blocking or anything.

6 threads not blocking at all? How many are blocked on reading from
or writing to a connection? If there are 122, you know what the
problem is.

> - DefaultMaxConnectionsPerHost is set to 32
> - MaxTotalConnections is set to 128
> - ConnectionManagerTimeout is not set / 0
> 
> Has someone an idea what might be wrong here

I'm not saying that MTHCM is bug-free, but let us first consider
if it is operating correctly. Is it possible that you have several
hundred threads trying to access the same target host? These would
end up waiting on the same pool, and only 32 connections are given
out for each pool. 6 threads not blocked, 26 blocked on reading or
writing, and the rest waiting for a connection to be released?

In your thread dump, can you please check on how many different
connection pools the 300+ threads are waiting?

cheers,
  Roland


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org