Posted to solr-user@lucene.apache.org by Frederik Kraus <fr...@gmail.com> on 2011/09/28 12:58:23 UTC

strange performance issue with many shards on one server

 Hi, 


I am experiencing a strange issue doing some load tests. Our setup:

- 2 servers, each with 24 CPU cores and 130 GB of RAM
- 10 shards per server (needed for response times) running in a single Tomcat instance
- each query queries all 20 shards (distributed search)

- each shard holds about 1.5 million documents (small shards are needed due to rather complex queries)
- all caches are warmed / high cache hit rates (99%) etc.


Now, for some reason, we cannot seem to fully utilize all CPU power (there is no disk I/O): beyond a certain point, increasing the number of concurrent users no longer increases CPU load, but it does decrease throughput and increase the response times of individual queries.

Also, 1-2% of the queries take significantly longer: the average is around 100 ms, while 1-2% take 1.5 s or longer.
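An average of about 100 ms can hide a slow tail, so load-test results are easier to reason about when percentiles are reported next to the mean. A minimal Java sketch, assuming latencies are collected in milliseconds; the class LatencySummary and its nearest-rank percentile definition are illustrative choices, not part of Solr or any load-testing tool:

```java
import java.util.Arrays;

// Hypothetical helper for load-test results: summarizes latencies so that a
// small slow tail is not hidden behind a healthy-looking average.
final class LatencySummary {
    private LatencySummary() {}

    static double mean(long[] millis) {
        requireSamples(millis);
        long sum = 0;
        for (long m : millis) sum += m;
        return (double) sum / millis.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it (0 < p <= 100).
    static long percentile(long[] millis, double p) {
        requireSamples(millis);
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    private static void requireSamples(long[] millis) {
        if (millis.length == 0) throw new IllegalArgumentException("no samples");
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        Arrays.fill(samples, 100L); // 98 "normal" queries at 100 ms ...
        samples[98] = 1500L;        // ... plus a 2% tail at 1.5 s
        samples[99] = 1500L;
        System.out.println("mean=" + mean(samples) + " ms, p50=" + percentile(samples, 50)
                + " ms, p99=" + percentile(samples, 99) + " ms");
    }
}
```

With 98 samples at 100 ms and 2 at 1500 ms, the mean is 128 ms and the median 100 ms, while the 99th percentile is 1500 ms: the tail is almost invisible in the average.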

Any ideas are greatly appreciated :)

Fred.


Re: strange performance issue with many shards on one server

Posted by Lance Norskog <go...@gmail.com>.
Some cache hit problems can be fixed with the Large Pages feature.

http://www.google.com/search?q=large+pages

On Wed, Sep 28, 2011 at 3:30 PM, Federico Fissore <fe...@fissore.org> wrote:

> Frederik Kraus wrote on 28/09/2011 23:16:
>
>   Yep, I'm not getting more than 50-60% CPU during those load tests.
>>
>>
> I would try reducing the number of shards. Apart from the memory
> discussion, this really seems to me a concurrency issue: too many threads
> waiting for other threads to complete, too many context switches...
>
> Recently, on a lots-of-cores database server, we INCREASED speed by
> REDUCING the number of cores/threads each query was allowed to use (making
> sense of our customer investment).
> Maybe you can get a similar effect by reducing the number of pieces your
> distributed search has to merge.
>
> my 2 eurocents
>
> federico
>



-- 
Lance Norskog
goksron@gmail.com

Re: strange performance issue with many shards on one server

Posted by Federico Fissore <fe...@fissore.org>.
Frederik Kraus wrote on 28/09/2011 23:16:
>   Yep, I'm not getting more than 50-60% CPU during those load tests.
>

I would try reducing the number of shards. Apart from the memory
discussion, this really seems to me a concurrency issue: too many
threads waiting for other threads to complete, too many context switches...

Recently, on a lots-of-cores database server, we INCREASED speed by
REDUCING the number of cores/threads each query was allowed to use
(making sense of our customer investment).
Maybe you can get a similar effect by reducing the number of pieces your
distributed search has to merge.
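A related way to cap parallelism, rather than letting every query fan out freely, is to bound how many shard sub-requests may run at the same time with a java.util.concurrent.Semaphore. This is a hypothetical sketch: ShardThrottle is not a Solr class, nothing here implies Solr 3.x exposes such a setting, and the shard request is simulated with a short sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical throttle (not a Solr class): caps how many shard sub-requests
// may run at the same time, across all queries sharing this instance.
final class ShardThrottle {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ShardThrottle(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight, true); // fair: waiters proceed in FIFO order
    }

    <T> T call(Callable<T> shardRequest) throws Exception {
        permits.acquire(); // wait here instead of piling another busy thread onto the CPUs
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the high-water mark
            return shardRequest.call();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int peakInFlight() {
        return peak.get();
    }

    // Submits `tasks` simulated shard requests from as many threads and
    // returns the largest number that actually ran concurrently.
    static int demo(int maxInFlight, int tasks) {
        ShardThrottle throttle = new ShardThrottle(maxInFlight);
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int shard = i;
                Callable<Integer> request = () -> {
                    Thread.sleep(5); // stand-in for the real shard request
                    return shard;
                };
                results.add(pool.submit(() -> throttle.call(request)));
            }
            for (Future<Integer> f : results) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return throttle.peakInFlight();
    }
}
```

For example, ShardThrottle.demo(4, 20) starts 20 threads but never lets more than 4 simulated shard requests run at once. Whether a lower cap actually helps on a 24-core box would have to be measured.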

my 2 eurocents

federico

Re: strange performance issue with many shards on one server

Posted by Frederik Kraus <fr...@gmail.com>.
 Yep, I'm not getting more than 50-60% CPU during those load tests. 


On Wednesday, 28 September 2011 at 23:01, Jaeger, Jay - DOT wrote:

> Yes, that thread waits (in the sense that nothing useful gets done), but during that time, from the perspective of the applications and OS, that CPU is busy: it is not "waiting" in such a way that you can dispatch a different process.
> 
> The point is, that if this was actually the problem, it would show up in a higher CPU utilization than the correspondent reported.
> 



RE: strange performance issue with many shards on one server

Posted by "Jaeger, Jay - DOT" <Ja...@dot.wi.gov>.
Yes, that thread waits (in the sense that nothing useful gets done), but during that time, from the perspective of the applications and OS, that CPU is busy: it is not "waiting" in such a way that you can dispatch a different process.

The point is, that if this was actually the problem, it would show up in a higher CPU utilization than the correspondent reported.
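Jay's argument can be checked from inside the JVM: a thread that keeps a core occupied is charged CPU time, while a thread that is blocked or sleeping accrues wall-clock time but (almost) no CPU time. A minimal sketch using the standard java.lang.management.ThreadMXBean; the class CpuVsWall and its two workloads are illustrative assumptions. By Jay's reasoning, a cache-miss stall would look like the spinning case, not the sleeping one.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Compares the CPU time charged to the current thread with the wall-clock time that passed.
final class CpuVsWall {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    private CpuVsWall() {}

    // Returns cpuTime / wallTime for running `work` on the calling thread,
    // or -1 if this JVM cannot measure per-thread CPU time.
    static double cpuShare(Runnable work) {
        if (!THREADS.isCurrentThreadCpuTimeSupported() || !THREADS.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long cpu0 = THREADS.getCurrentThreadCpuTime();
        long wall0 = System.nanoTime();
        work.run();
        long cpu = THREADS.getCurrentThreadCpuTime() - cpu0;
        long wall = System.nanoTime() - wall0;
        return (double) cpu / wall;
    }

    // Blocked/sleeping: wall time passes, but the OS does not charge CPU time.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Busy: the thread keeps its core occupied for the whole interval.
    static long spin(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        long iterations = 0;
        while (System.nanoTime() < end) iterations++;
        return iterations;
    }

    public static void main(String[] args) {
        System.out.printf("sleeping: cpu/wall = %.2f%n", cpuShare(() -> sleepQuietly(200)));
        System.out.printf("spinning: cpu/wall = %.2f%n", cpuShare(() -> spin(200)));
    }
}
```

The same distinction shows up in the thread dumps posted in this thread: the BLOCKED pool-31 shard threads report only about 20 ms of total CPU time each, which fits threads waiting on a lock better than threads stalled on memory.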

-----Original Message-----
From: Federico Fissore [mailto:federico@fissore.org] 
Sent: Wednesday, September 28, 2011 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: strange performance issue with many shards on one server

Jaeger, Jay - DOT wrote on 28/09/2011 18:40:
> That would still show up as the CPU being busy.
>

I don't know how the program (top, htop, whatever) displays the value,
but when the CPU has a cache miss, that thread definitely sits and waits
for a number of clock cycles.
With 130GB of RAM (per server?) I suspect cache misses are the rule.

Just a suspicion, however; nothing I'd bet on.

Re: strange performance issue with many shards on one server

Posted by Federico Fissore <fe...@fissore.org>.
Jaeger, Jay - DOT wrote on 28/09/2011 18:40:
> That would still show up as the CPU being busy.
>

I don't know how the program (top, htop, whatever) displays the value,
but when the CPU has a cache miss, that thread definitely sits and waits
for a number of clock cycles.
With 130GB of RAM (per server?) I suspect cache misses are the rule.

Just a suspicion, however; nothing I'd bet on.

RE: strange performance issue with many shards on one server

Posted by "Jaeger, Jay - DOT" <Ja...@dot.wi.gov>.
That would still show up as the CPU being busy.

-----Original Message-----
From: Federico Fissore [mailto:federico@fissore.org] 
Sent: Wednesday, September 28, 2011 6:12 AM
To: solr-user@lucene.apache.org
Subject: Re: strange performance issue with many shards on one server

Frederik Kraus wrote on 28/09/2011 12:58:
>   Hi,
>
>
> I am experiencing a strange issue doing some load tests. Our setup:
>

Just because I've heard my JUG mates talking about this at the last
meeting: could it be that your CPUs are spending their time moving
data from RAM into the CPU cache?

Maybe, say, 10% of CPU power is spent on the bus.

federico

Re: strange performance issue with many shards on one server

Posted by Federico Fissore <fe...@fissore.org>.
Frederik Kraus wrote on 28/09/2011 12:58:
>   Hi,
>
>
> I am experiencing a strange issue doing some load tests. Our setup:
>

Just because I've heard my JUG mates talking about this at the last
meeting: could it be that your CPUs are spending their time moving
data from RAM into the CPU cache?

Maybe, say, 10% of CPU power is spent on the bus.
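The cost of moving data from RAM into the caches can be illustrated with a naive micro-benchmark that performs the same additions over a large array twice: once in memory order, and once in a shuffled order that defeats caching and prefetching. This is a rough sketch under stated assumptions: the class AccessPattern is hypothetical, 64 MB of ints is assumed to exceed the CPU caches, and there is no JIT warm-up, so a harness such as JMH would be needed for trustworthy numbers.

```java
import java.util.Random;

// Hypothetical micro-demo: identical arithmetic, different memory access order.
final class AccessPattern {
    private AccessPattern() {}

    // Fisher-Yates shuffle of 0..n-1 with a fixed seed for repeatability.
    static int[] shuffledIndices(int n, long seed) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = idx[i];
            idx[i] = idx[j];
            idx[j] = t;
        }
        return idx;
    }

    // Walks the array in memory order, which hardware prefetchers handle well.
    static long sumSequential(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    // Visits every element exactly once, but in shuffled order: for a large
    // array, most of these reads miss the CPU caches.
    static long sumShuffled(int[] data, int[] order) {
        long sum = 0;
        for (int i : order) sum += data[i];
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 24; // 16M ints = 64 MB, assumed larger than the CPU caches
        int[] data = new int[n];
        for (int i = 0; i < n; i++) data[i] = i & 0xFF;
        int[] order = shuffledIndices(n, 42L);

        long t0 = System.nanoTime();
        long a = sumSequential(data);
        long t1 = System.nanoTime();
        long b = sumShuffled(data, order);
        long t2 = System.nanoTime();

        System.out.printf("sequential: sum=%d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("shuffled:   sum=%d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Both loops compute the same sum, so any timing difference comes from memory access alone. As Jay points out in this thread, though, such stalls would still be counted as busy CPU.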

federico

Re: strange performance issue with many shards on one server

Posted by Frederik Kraus <fr...@gmail.com>.

On Wednesday, 28 September 2011 at 16:40, Toke Eskildsen wrote:

> On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
> > - 10 shards per server (needed for response times) running in a single tomcat instance
> 
> Have you tested that sharding actually decreases response times in your
> case? I see the idea in decreasing response times with sharding at the
> cost of decreasing throughput, but the added overhead of merging is
> non-trivial.
Yes, unfortunately: the queries contain huge boolean filter queries for ACLs etc., which simply take too long to compute in a single thread.

> 
> > - each query queries all 20 shards (distributed search)
> > 
> > - each shard holds about 1.5 million documents (small shards are needed due to rather complex queries)
> > - all caches are warmed / high cache hit rates (99%) etc.
> 
> > Now, for some reason, we cannot seem to fully utilize all CPU power (there is no disk I/O): beyond a certain point, increasing the number of concurrent users no longer increases CPU load, but it does decrease throughput and increase the response times of individual queries.
> 
> It sounds as if there's a hard limit on the number of concurrent users
> somewhere. I am no expert in httpclient, but the blocked threads in your
> thread dump seem to indicate that they wait for connections to be
> established rather than for results to be produced.
> 
> I seem to remember that Tomcat has a default limit of 200 concurrent
> connections and with 10 shards/search, that is just 200 / (10
> shard_connections + 1 incoming_connection) = 18 concurrent searches.
> 

I have gradually bumped all of this up to (almost) infinity with no effect ;)


> > Also, 1-2% of the queries take significantly longer: the average is around 100 ms, while 1-2% take 1.5 s or longer.
> 
> Could be garbage collection, especially since it shows under high load,
> which might result in more old objects and thereby trigger a full GC.
GC only spends about 50-100 ms in total during a 10-minute load test.




Re: strange performance issue with many shards on one server

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
> - 10 shards per server (needed for response times) running in a single tomcat instance

Have you tested that sharding actually decreases response times in your
case? I see the idea in decreasing response times with sharding at the
cost of decreasing throughput, but the added overhead of merging is
non-trivial.
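To make the merge overhead concrete: the coordinating node has to combine each shard's top hits into one ranked list. A minimal sketch of a k-way merge with a java.util.PriorityQueue, assuming every shard already returns its hits sorted by descending score; Hit and mergeTopK are hypothetical names, and Solr's actual merge logic (ties, sort fields, later request stages) is more involved and not reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical coordinator-side merge: combine per-shard top hits into one global top k.
final class ShardMerge {
    private ShardMerge() {}

    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Assumes every shard list is already sorted by descending score.
    static List<Hit> mergeTopK(List<List<Hit>> shards, int k) {
        // Heap entries are {shardIndex, position}; the highest-scoring head comes out first.
        Comparator<int[]> byScore =
                Comparator.comparingDouble((int[] e) -> shards.get(e[0]).get(e[1]).score);
        PriorityQueue<int[]> heap = new PriorityQueue<>(byScore.reversed());
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<Hit> merged = new ArrayList<>(k);
        while (merged.size() < k && !heap.isEmpty()) {
            int[] head = heap.poll();
            List<Hit> shard = shards.get(head[0]);
            merged.add(shard.get(head[1]));
            if (head[1] + 1 < shard.size()) heap.add(new int[] {head[0], head[1] + 1});
        }
        return merged;
    }
}
```

In this sketch, N shards each send up to k hits, so the coordinator receives up to N × k candidates to keep k of them, at O(k log N) heap work once all responses have arrived. That merge cost comes on top of N round trips and response parsing, and it grows with the number of shards.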

> - each query queries all 20 shards (distributed search)
> 
> - each shard holds about 1.5 million documents (small shards are needed due to rather complex queries)
> - all caches are warmed / high cache hit rates (99%) etc.

> Now, for some reason, we cannot seem to fully utilize all CPU power (there is no disk I/O): beyond a certain point, increasing the number of concurrent users no longer increases CPU load, but it does decrease throughput and increase the response times of individual queries.

It sounds as if there's a hard limit on the number of concurrent users
somewhere. I am no expert in httpclient, but the blocked threads in your
thread dump seem to indicate that they wait for connections to be
established rather than for results to be produced.

I seem to remember that Tomcat has a default limit of 200 concurrent
connections and with 10 shards/search, that is just 200 / (10
shard_connections + 1 incoming_connection) = 18 concurrent searches.
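The estimate above, written out. The assumptions are the ones in the message: a single servlet container with a fixed number of request-processing slots (200 is Tomcat's default maxThreads for its HTTP connector), and each search holding one slot for the incoming request plus one slot per local shard sub-request. ConnectionBudget is a hypothetical helper, not a Tomcat or Solr API.

```java
// Hypothetical helper mirroring the back-of-the-envelope estimate; not a Tomcat or Solr API.
final class ConnectionBudget {
    private ConnectionBudget() {}

    // Each search occupies one slot for the incoming request plus one slot per
    // local shard sub-request; integer division rounds down to whole searches.
    static int maxConcurrentSearches(int containerSlots, int localShardsPerSearch) {
        if (containerSlots < 0 || localShardsPerSearch < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return containerSlots / (localShardsPerSearch + 1);
    }

    public static void main(String[] args) {
        // Values from this thread: 200 slots, 10 local shards per search.
        System.out.println(maxConcurrentSearches(200, 10)); // 200 / 11 = 18
    }
}
```

In this model, additional searches simply queue for a slot, which would cap throughput without raising CPU load. Frederik reports that raising these limits had no effect, so the model may not be the whole story.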

> Also, 1-2% of the queries take significantly longer: the average is around 100 ms, while 1-2% take 1.5 s or longer.

Could be garbage collection, especially since it shows under high load,
which might result in more old objects and thereby trigger a full GC.
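One way to quantify the GC hypothesis is to read the accumulated collection time from the standard GarbageCollectorMXBeans before and after a load test. A minimal sketch; GcTime is a hypothetical helper, what counts as "collection time" depends on the collector, and a collector that cannot report it returns -1, which is skipped here. It only sees the JVM it runs in, so it would have to run inside (or connect via JMX to) the Tomcat process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads accumulated GC time from the platform MXBeans.
final class GcTime {
    private GcTime() {}

    // Accumulated collection time in ms, summed over all collectors of this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    // GC time (ms) that accrued while `work` ran, e.g. one load-test phase.
    static long gcMillisDuring(Runnable work) {
        long before = totalGcMillis();
        work.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```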


Re: strange performance issue with many shards on one server

Posted by Ken Krugler <kk...@transpac.com>.
Hi Frederik,

Did you figure out a solution to this problem?

I'm asking because I recently ran into a similar problem, with a similar setup (8 shards on one server).

Occasionally a query will take a very long time. Occasionally I see timeout exceptions with the HTTP requests. E.g.

> 348914 [pool-19-thread-14] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The
>  server localhost failed to respond
> 348915 [pool-19-thread-14] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request


Restarting Jetty seems to clear up the problem temporarily.

I've been looking at the code in Solr that handles distributed requests - and it's got some interesting smells, so I wouldn't be surprised if there's an issue related to how it's using HttpClient.
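The stack traces in this thread show the shape of that code: HttpCommComponent submits one Callable per shard request to an executor, and the request thread then blocks in ExecutorCompletionService.take(), called from HttpCommComponent.takeCompletedOrError in SearchHandler.handleRequestBody. A minimal sketch of that scatter/gather pattern, with the HTTP call replaced by a hypothetical fakeShardRequest; the pool size and class name are assumptions, not Solr's configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of scatter/gather over shards; shard requests are simulated.
final class ScatterGather {
    private ScatterGather() {}

    // Hypothetical stand-in for the per-shard HTTP request.
    static String fakeShardRequest(String shard) {
        return "response from " + shard;
    }

    // Scatter: one task per shard. Gather: take() each result as it completes.
    static List<String> query(ExecutorService pool, List<String> shards) {
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        for (String shard : shards) {
            completion.submit(() -> fakeShardRequest(shard));
        }
        List<String> responses = new ArrayList<>(shards.size());
        try {
            for (int i = 0; i < shards.size(); i++) {
                // Blocks the request thread until the next shard finishes (in any order).
                responses.add(completion.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering shard responses", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard request failed", e.getCause());
        }
        return responses;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(query(pool, Arrays.asList("shard1", "shard2", "shard3")));
        } finally {
            pool.shutdown();
        }
    }
}
```

In this pattern each concurrent query parks one request thread in take() while its shard tasks run on the shared pool, so with 20 shards every query multiplies the traffic on anything those tasks share, such as a single HTTP connection pool.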

Regards,

-- Ken


On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
> 
> 
> 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643) 
> at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423) 
> at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) 
> at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422) 
> at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892) 
> at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198) 
> at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158) 
> at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662) 
> 
> 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416) 
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153) 
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662) 
> and 
> 
> 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
> 
> at sun.misc.Unsafe.park(Native Method) 
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) 
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) 
> at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164) 
> at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469) 
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271) 
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) 
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) 
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) 
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) 
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) 
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) 
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) 
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) 
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554) 
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) 
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) 
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) 
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) 
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) 
> at java.lang.Thread.run(Thread.java:662) 
> 
> 
> 
> 
> 
> 
> On Wednesday, 28 September 2011 at 13:53, Frederik Kraus wrote:
> 
>> 
>> 
>> On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:
>> 
>>> Hi Fred,
>>> 
>>> ok, it's strange behavior with the same queries.
>>> Some more questions:
>>> - which Solr version?
>> 
>> 3.3 (might the NIOFSDirectory from 3.4 help?)
>> 
>>> - are you indexing during your load test? (because of index rebuilds)
>> nope
>> 
>>> - do you replicate your index?
>> 
>> nope 
>>> 
>>> Regards
>>> Vadim
>>> 
>>> 
>>> 
>>> 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
>>> 
>>>> Hi Vadim,
>>>> 
>>>> the thing is that the exact same queries that take longer during a load
>>>> test perform just fine when executed at a slower request rate; the slow
>>>> ones are also random, i.e. there is no pattern in the bad/slow queries.
>>>> 
>>>> My first thought was some kind of contention and/or connection starvation
>>>> for the internal shard communication?
>>>> 
>>>> Fred.
>>>> 
>>>> 
>>>> On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
>>>> 
>>>>> Hi Fred,
>>>>> analyze the queries that take longer.
>>>>> We monitor our queries and see q-time problems with queries that
>>>>> are complex: phrase queries, or queries that contain numbers or
>>>>> special characters.
>>>>> In case you don't know it:
>>>>> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
>>>>> Regards
>>>>> Vadim
>>>>> 
>>>>> 
>>>>> 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> 
>>>>>> I am experiencing a strange issue doing some load tests. Our setup:
>>>>>> 
>>>>>> - 2 servers, each with 24 CPU cores and 130 GB of RAM
>>>>>> - 10 shards per server (needed for response times) running in a single
>>>>>> Tomcat instance
>>>>>> - each query queries all 20 shards (distributed search)
>>>>>> 
>>>>>> - each shard holds about 1.5 million documents (small shards are needed
>>>>>> due to rather complex queries)
>>>>>> - all caches are warmed / high cache hit rates (99%) etc.
>>>>>> 
>>>>>> 
>>>>>> Now, for some reason, we cannot seem to fully utilize all CPU power (no
>>>>>> disk I/O): beyond a certain point, increasing the number of concurrent
>>>>>> users no longer increases CPU load, but it does decrease throughput and
>>>>>> increase the response times of individual queries.
>>>>>> 
>>>>>> Also, 1-2% of the queries take significantly longer: the average is
>>>>>> around 100 ms, while 1-2% take 1.5 s or longer.
>>>>>> 
>>>>>> Any ideas are greatly appreciated :)
>>>>>> 
>>>>>> Fred.
> 

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr





Re: strange performance issue with many shards on one server

Posted by Frederik Kraus <fr...@gmail.com>.
 Hi Ken,  

the HttpConnectionManager was actually the first thing I looked at: I bumped the Solr default of 20 up to 50, 100, 400 and 10000 (which should be more or less unlimited ;) ). Unfortunately, that didn't really solve anything. I don't know if the "static" HttpClient is a problem here, as it will be the same HttpConnectionManager for all shards …

Obviously, one way of validating this would be to spawn 20 Tomcat (or Jetty) instances, one for each shard and 10 per server; hopefully there is an easier way ;)

By the way: Ubuntu / GC / etc. are all tuned and shouldn't be a bottleneck here. The GC only spends about 50-100ms during a 10min load test, and never a full-GC.  

Just going through a jstack dump again, it looks like the HttpConnectionManager is actually waiting for a lock …

"pool-31-thread-15776" prio=10 tid=0x00007ef544249000 nid=0x50be waiting for monitor entry [0x00007ef4d38fc000]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
 - waiting to lock <0x00007f07dd6bfa70> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
 at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
 at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
 at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
….
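The jstack output above shows threads BLOCKED waiting for the monitor of a single MultiThreadedHttpConnectionManager$ConnectionPool object, and the earlier dump in this thread shows the same monitor being taken on release (freeConnection) as well as on acquire (doGetConnection). If that lock is the bottleneck, raising the maximum number of connections adds capacity but does not remove the serialization on it. The sketch below is a hypothetical illustration, not the commons-httpclient implementation: it contrasts a pool guarded by one global monitor with one pool per shard URL, roughly what giving each shard its own connection manager would look like. The shard URLs are made up, and whether this would help here is untested.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; NOT how commons-httpclient is implemented.
// A pooled "connection" is represented by a label such as "shard1#0".
final class ConnectionPools {
    private ConnectionPools() {}

    interface Pool {
        String acquire(String shardUrl);
        void release(String shardUrl, String connection);
    }

    // One monitor for everything: every acquire/release for every shard,
    // from every query thread, queues on the same lock.
    static final class GlobalLockPool implements Pool {
        private final Map<String, Deque<String>> idle = new HashMap<>();
        private int created = 0;

        @Override
        public synchronized String acquire(String shardUrl) {
            String c = idle.computeIfAbsent(shardUrl, k -> new ArrayDeque<>()).pollFirst();
            return c != null ? c : shardUrl + "#" + created++;
        }

        @Override
        public synchronized void release(String shardUrl, String connection) {
            idle.computeIfAbsent(shardUrl, k -> new ArrayDeque<>()).addFirst(connection);
        }
    }

    // One small pool (and lock) per shard URL: threads talking to different
    // shards no longer contend with each other.
    static final class PerShardPool implements Pool {
        private final ConcurrentMap<String, ShardPool> pools = new ConcurrentHashMap<>();

        @Override
        public String acquire(String shardUrl) {
            return pools.computeIfAbsent(shardUrl, ShardPool::new).acquire();
        }

        @Override
        public void release(String shardUrl, String connection) {
            pools.computeIfAbsent(shardUrl, ShardPool::new).release(connection);
        }

        private static final class ShardPool {
            private final String shardUrl;
            private final Deque<String> idle = new ArrayDeque<>();
            private int created = 0;

            ShardPool(String shardUrl) {
                this.shardUrl = shardUrl;
            }

            synchronized String acquire() {
                String c = idle.pollFirst();
                return c != null ? c : shardUrl + "#" + created++;
            }

            synchronized void release(String connection) {
                idle.addFirst(connection);
            }
        }
    }
}
```

Both pools hand out the same kind of reusable connections; the difference is only how many threads compete for each lock.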

Fred.  


On Wednesday, 28 September 2011 at 17:48, Ken Krugler wrote:

> Hi Frederik,
>  
> I haven't directly run into this issue with Solr, but I have experienced similar issues in a related context.
>  
> In my case, I had a custom webapp that made SolrJ requests and then generated some aggregated/analyzed results.
>  
> During load testing, we ran into a few different issues...
>  
> 1. The load test software itself had an issue with scaling - I'm assuming that's not the case for you, but I've seen it happen more than once.
>  
> E.g. there's a limit to max parallel connections in the client being used to talk to Solr.
>  
> 2. We needed to tune up the SolrJ settings for the HttpConnectionManager
>  
> Under heavy load, this was running out of free connections.
>  
> Given you've got 20 shards, each request is going to spawn 20 HTTP connections.
>  
> I don't know off the top of my head how solr.SearchHandler manages connections (and whether it's possible to tune this), but from the stack trace below it sure looks like you're blocked on getting free HTTP connections.
>  
> 3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc.
>  
> There are lots of knobs to twiddle here, for better or worse.
>  
> -- Ken
>  
>  
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr



Re: strange performance issue with many shards on one server

Posted by Ken Krugler <kk...@transpac.com>.
Hi Frederik,

I haven't directly run into this issue with Solr, but I have experienced similar issues in a related context.

In my case, I had a custom webapp that made SolrJ requests and then generated some aggregated/analyzed results.

During load testing, we ran into a few different issues...

1. The load test software itself had an issue with scaling. I'm assuming that's not the case for you, but I've seen it happen more than once.

E.g., there's a limit on the maximum number of parallel connections in the client being used to talk to Solr.

2. We needed to tune up the SolrJ settings for the HttpConnectionManager.

Under heavy load, this was running out of free connections.

Given you've got 20 shards, each request is going to spawn 20 HTTP connections.
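To put rough numbers on that fan-out, here is a back-of-the-envelope sketch. It assumes one sub-request per shard per top-level query, and a pool that limits connections per host:port rather than per core path, so the 10 shards in one Tomcat would share a single per-host limit. The 50 concurrent users and the per-host limit of 20 are hypothetical values, and real numbers may be higher, since a distributed query can involve more than one round of shard requests.

```java
// Rough fan-out sizing; every input here is a hypothetical value.
class FanOutSizing {
    // Sub-requests in flight when each of `concurrentQueries` queries hits `shards` shards.
    static int inFlight(int concurrentQueries, int shards) {
        return concurrentQueries * shards;
    }

    // Sub-requests that cannot get a connection immediately and must wait.
    static int waiting(int inFlight, int poolLimit) {
        return Math.max(0, inFlight - poolLimit);
    }

    public static void main(String[] args) {
        int users = 50;          // hypothetical load-test concurrency
        int perHostLimit = 20;   // hypothetical per-host connection limit
        int cluster = inFlight(users, 20); // all 20 shards
        int perHost = inFlight(users, 10); // 10 shards sharing one host:port (assumed)
        System.out.println("cluster-wide in flight: " + cluster);                        // 1000
        System.out.println("per host in flight:     " + perHost);                        // 500
        System.out.println("per host waiting:       " + waiting(perHost, perHostLimit)); // 480
    }
}
```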

I don't know off the top of my head how solr.SearchHandler manages connections (and whether it's possible to tune this), but from the stack trace below it sure looks like you're blocked on getting free HTTP connections.
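The effect can be illustrated with a toy scatter-gather model that uses only java.util.concurrent. Each shard sub-request must borrow a slot from a bounded pool (a Semaphore standing in for the HTTP connection manager), and the top-level request waits in ExecutorCompletionService.take() until every shard has answered, much like the 'http-8080-381' thread in the dump. This is a sketch, not Solr's code; the class name, pool sizes and timings are hypothetical. Note that the dump shows threads BLOCKED on the ConnectionPool monitor, which suggests contention on that single lock in addition to possible exhaustion; this model only covers exhaustion.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of one distributed request; not Solr code.
class PooledFanOut {
    // Wall-clock ms for one scatter-gather over `shards` sub-requests, each of which
    // must hold one of `poolSize` slots for `shardMillis` ms.
    static long fanOutMillis(int shards, int poolSize, long shardMillis) {
        Semaphore pool = new Semaphore(poolSize, true);          // toy connection pool
        ExecutorService workers = Executors.newFixedThreadPool(shards);
        CompletionService<Integer> done = new ExecutorCompletionService<>(workers);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < shards; i++) {
                final int shard = i;
                done.submit(() -> {
                    pool.acquire();                // waits while the pool is exhausted
                    try {
                        Thread.sleep(shardMillis); // simulated shard response time
                        return shard;
                    } finally {
                        pool.release();
                    }
                });
            }
            for (int i = 0; i < shards; i++) {
                done.take().get();                 // gather: block until each shard answers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard sub-request failed", e);
        } finally {
            workers.shutdownNow();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("20 shards, 20 slots: " + fanOutMillis(20, 20, 20) + " ms");
        System.out.println("20 shards,  4 slots: " + fanOutMillis(20, 4, 20) + " ms");
    }
}
```

With 20 slots the fan-out takes roughly one shard latency; with 4 slots the 20 sub-requests run in at least five waves, so the request takes at least five times as long while the threads mostly sit idle, which is one way low CPU usage and rising response times can coincide.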

3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc.

There are lots of knobs to twiddle here, for better or worse.

-- Ken

On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
> 
> 
> 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643) 
> at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423) 
> at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) 
> at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422) 
> at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892) 
> at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198) 
> at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158) 
> at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662) 
> 
> 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416) 
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153) 
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662) 
> and 
> 
> 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
> 
> at sun.misc.Unsafe.park(Native Method) 
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) 
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) 
> at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164) 
> at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469) 
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271) 
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) 
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) 
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) 
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) 
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) 
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) 
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) 
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) 
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554) 
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) 
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) 
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) 
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) 
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) 
> at java.lang.Thread.run(Thread.java:662) 

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr




Re: strange performance issue with many shards on one server

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hmm, sorry, I don't know...
My ideas:
- Tomcat could be causing this problem (for example: maxThreads, number of
connections...)
- JVM options, especially GC
- index locks, possibly an open issue in JIRA
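To check the GC idea, here is a minimal sketch using the standard java.lang.management API that measures how much GC time accumulates during a window. It assumes the code runs inside the JVM being measured (for Solr, Tomcat's JVM; from outside, the same MBeans are reachable over JMX if remote JMX is enabled). The allocation loop is a hypothetical stand-in for a load-test window. If GC time jumps in the same windows where the slow queries show up, GC pauses are a likely suspect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

class GcDuringWindow {
    // Cumulative GC time in ms per collector, as reported by this JVM.
    static Map<String, Long> gcMillisByCollector() {
        Map<String, Long> times = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when the collector does not report it.
            times.put(gc.getName(), Math.max(0L, gc.getCollectionTime()));
        }
        return times;
    }

    // GC time (ms) accumulated while `window` runs, summed over all collectors.
    static long gcMillisDuring(Runnable window) {
        Map<String, Long> before = gcMillisByCollector();
        window.run();
        long delta = 0;
        for (Map.Entry<String, Long> e : gcMillisByCollector().entrySet()) {
            delta += e.getValue() - before.getOrDefault(e.getKey(), 0L);
        }
        return delta;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a load-test window: churn short-lived objects.
        long ms = gcMillisDuring(() -> {
            long total = 0;
            for (int i = 0; i < 2_000_000; i++) {
                total += new byte[128].length;
            }
            System.out.println("allocated " + total + " bytes");
        });
        System.out.println("GC time during window: " + ms + " ms");
    }
}
```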

Regards
Vadim




2011/9/28 Frederik Kraus <fr...@gmail.com>

> I just had a look at the thread-dump, pasting 3 examples here:

Re: strange performance issue with many shards on one server

Posted by Frederik Kraus <fr...@gmail.com>.
I just had a look at the thread dump; here are 3 examples:


'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982) 
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643) 
at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) 
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423) 
at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) 
at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422) 
at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892) 
at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198) 
at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158) 
at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181) 
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486) 
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
at java.lang.Thread.run(Thread.java:662) 

'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447) 
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416) 
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153) 
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) 
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) 
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
at java.lang.Thread.run(Thread.java:662) 
and 

'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms

at sun.misc.Unsafe.park(Native Method) 
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) 
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) 
at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164) 
at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469) 
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271) 
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) 
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) 
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) 
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) 
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) 
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) 
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) 
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) 
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554) 
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) 
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) 
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) 
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) 
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) 
at java.lang.Thread.run(Thread.java:662) 
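To quantify a dump like this, e.g. how many threads are BLOCKED on the same MultiThreadedHttpConnectionManager$ConnectionPool monitor, a small sketch with java.lang.management can group BLOCKED threads by lock name. It only sees the JVM it runs in (from outside, the JDK's jstack tool prints the same thread states), and the contention set up in main is a hypothetical demo, not Solr code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

class BlockedLockHistogram {
    // Number of BLOCKED threads per lock name, e.g. "...$ConnectionPool@19dd10d9".
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip locked-monitor/synchronizer details to keep the dump cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                String lock = info.getLockName();
                counts.merge(lock == null ? "<unknown>" : lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object sharedPool = new Object(); // hypothetical stand-in for a pool monitor
        Thread contender = new Thread(() -> { synchronized (sharedPool) { } });
        synchronized (sharedPool) {
            contender.start();
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1); // wait until the contender is stuck on our lock
            }
            blockedByLock().forEach((lock, n) -> System.out.println(n + " BLOCKED on " + lock));
        }
        contender.join();
    }
}
```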


Re: strange performance issue with many shards on one server

Posted by Frederik Kraus <fr...@gmail.com>.

On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:

> Hi Fred,
> 
> OK, that's strange behavior for the same queries.
> Some more questions:
> - Which Solr version?

3.3 (might the NIOFSDirectory from 3.4 help?)

> - Are you indexing during your load test? (because of index rebuilds)
nope

> - Do you replicate your index?

nope
> 
> Regards
> Vadim
> 
> 
> 
> 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
> 
> > Hi Vadim,
> > 
> > the thing is that the exact same queries that take longer during a load
> > test perform just fine when executed at a slower request rate. The slow
> > ones are also random, i.e. there is no pattern in which queries are slow.
> > 
> > My first thought was some kind of contention and/or connection starvation
> > for the internal shard communication?
> > 
> > Fred.
> > 
> > 
> > On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
> > 
> > > Hi Fred,
> > > Analyze the queries that take longer.
> > > We observe our queries and see QTime problems with queries that
> > > are complex: phrase queries, or queries that contain numbers or
> > > special characters.
> > > In case you don't know it yet:
> > > http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
> > > Regards
> > > Vadim
> > > 
> > > 
> > > 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
> > > 
> > > >  Hi,
> > > > 
> > > > 
> > > > I am experiencing a strange issue doing some load tests. Our setup:
> > > > 
> > > > - 2 servers, each with 24 CPU cores and 130GB of RAM
> > > > - 10 shards per server (needed for response times) running in a single
> > > > tomcat instance
> > > > - each query queries all 20 shards (distributed search)
> > > > 
> > > > - each shard holds about 1.5 million documents (small shards are needed due to
> > > > rather complex queries)
> > > > - all caches are warmed / high cache hit rates (99%) etc.
> > > > 
> > > > 
> > > > Now for some reason we cannot seem to fully utilize all CPU power (no
> > > > disk IO), i.e. beyond a certain point, increasing the number of
> > > > concurrent users no longer increases CPU load; instead it decreases
> > > > throughput and increases the response times of the individual queries.
> > > > 
> > > > Also 1-2% of the queries take significantly longer: the average is
> > > > around 100ms, while 1-2% take 1.5s or longer.
> > > > 
> > > > Any ideas are greatly appreciated :)
> > > > 
> > > > Fred.
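For readers unfamiliar with the setup described in the quoted post: in Solr 3.x a distributed search is triggered by sending the query to one core with a `shards` parameter that lists every shard as `host:port/path` (without `http://`). Below is a minimal sketch of building such a request for 2 servers with 10 cores each. The host names `server1`/`server2`, port 8080 and core names `shard0` to `shard9` are hypothetical placeholders, not the poster's actual configuration.

```python
from urllib.parse import urlencode


def build_shards_param(hosts, shards_per_host, port=8080, core_prefix="shard"):
    # Comma-separated host:port/solr/core entries, with no "http://" prefix.
    # Host names, port and core naming scheme are hypothetical placeholders.
    return ",".join(
        f"{host}:{port}/solr/{core_prefix}{i}"
        for host in hosts
        for i in range(shards_per_host)
    )


def build_distributed_query_url(entry_host, query, shards, port=8080, core="shard0"):
    # The request goes to a single core; Solr then fans it out to every listed shard.
    params = urlencode({"q": query, "shards": shards})
    return f"http://{entry_host}:{port}/solr/{core}/select?{params}"


shards = build_shards_param(["server1", "server2"], shards_per_host=10)
url = build_distributed_query_url("server1", "title:solr", shards)
print(len(shards.split(",")))  # 20
print(url)
```

Every one of the 20 entries becomes a separate HTTP sub-request for each incoming query, which is why the shard count multiplies the internal traffic.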


Re: strange performance issue with many shards on one server

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hi Fred,

OK, that's strange behavior for the same queries.
A few more questions:
- Which Solr version?
- Are you indexing during your load test? (because of index rebuilds)
- Do you replicate your index?

Regards
Vadim



2011/9/28 Frederik Kraus <fr...@gmail.com>

> Hi Vadim,
>
> the thing is that the exact same queries that take longer during a load
> test perform just fine when executed at a slower request rate. The slow
> queries are also random, i.e. there is no pattern in the bad/slow queries.
>
> My first thought was some kind of contention and/or connection starvation
> for the internal shard communication?
>
> Fred.
>
>
> On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
>
> > Hi Fred,
> > Analyze the queries that take longer.
> > We monitor our queries and see QTime problems with complex queries:
> > phrase queries, or queries that contain numbers or special characters.
> > If you don't know it already:
> > http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
> > Regards
> > Vadim
> >
> >
> > 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
> >
> > >  Hi,
> > >
> > >
> > > I am experiencing a strange issue doing some load tests. Our setup:
> > >
> > > - 2 servers, each with 24 CPU cores and 130GB of RAM
> > > - 10 shards per server (needed for response times) running in a single
> > > tomcat instance
> > > - each query queries all 20 shards (distributed search)
> > >
> > > - each shard holds about 1.5 million documents (small shards are needed due to
> > > rather complex queries)
> > > - all caches are warmed / high cache hit rates (99%) etc.
> > >
> > >
> > > Now for some reason we cannot seem to fully utilize all CPU power (no
> > > disk IO), i.e. beyond a certain point, increasing the number of
> > > concurrent users no longer increases CPU load; instead it decreases
> > > throughput and increases the response times of the individual queries.
> > >
> > > Also 1-2% of the queries take significantly longer: the average is
> > > around 100ms, while 1-2% take 1.5s or longer.
> > >
> > > Any ideas are greatly appreciated :)
> > >
> > > Fred.
>
>
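The figures in the quoted post (an average around 100ms while 1-2% of queries take 1.5s or longer) show why an average alone hides the problem. A minimal sketch computing nearest-rank percentiles; the latency list is synthetic and only illustrates how a small slow tail barely moves the mean but dominates p99. It is not data from this thread.

```python
import math


def percentile(samples, p):
    # Nearest-rank percentile for p in (0, 100].
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Synthetic, illustrative numbers only: 98 fast queries and 2 slow outliers.
latencies = [80] * 98 + [1500] * 2
print(summarize(latencies))
```

Reporting p99 alongside the mean during load tests makes it easier to see whether a tuning change moves the slow tail or only the average.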

Re: strange performance issue with many shards on one server

Posted by Frederik Kraus <fr...@gmail.com>.
Hi Vadim,

the thing is that the exact same queries that take longer during a load test perform just fine when executed at a slower request rate. The slow queries are also random, i.e. there is no pattern in the bad/slow queries.

My first thought was some kind of contention and/or connection starvation for the internal shard communication?

Fred.


On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:

> Hi Fred,
> Analyze the queries that take longer.
> We monitor our queries and see QTime problems with complex queries:
> phrase queries, or queries that contain numbers or special characters.
> If you don't know it already:
> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
> Regards
> Vadim
> 
> 
> 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
> 
> >  Hi,
> > 
> > 
> > I am experiencing a strange issue doing some load tests. Our setup:
> > 
> > - 2 servers, each with 24 CPU cores and 130GB of RAM
> > - 10 shards per server (needed for response times) running in a single
> > tomcat instance
> > - each query queries all 20 shards (distributed search)
> > 
> > - each shard holds about 1.5 million documents (small shards are needed due to
> > rather complex queries)
> > - all caches are warmed / high cache hit rates (99%) etc.
> > 
> > 
> > Now for some reason we cannot seem to fully utilize all CPU power (no
> > disk IO), i.e. beyond a certain point, increasing the number of
> > concurrent users no longer increases CPU load; instead it decreases
> > throughput and increases the response times of the individual queries.
> > 
> > Also 1-2% of the queries take significantly longer: the average is
> > around 100ms, while 1-2% take 1.5s or longer.
> > 
> > Any ideas are greatly appreciated :)
> > 
> > Fred.
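Fred's suspicion of connection starvation in the internal shard communication can be checked with rough arithmetic. This is a back-of-the-envelope sketch that assumes each top-level query sends one sub-request to every shard per distributed stage and that every user has exactly one query outstanding. Because all 10 shards of a server run in one Tomcat, they share a single host:port, so any per-host connection limit on the shard HTTP client sees all of that server's sub-requests together. `concurrent_users=8` and `max_conns_per_host=20` are hypothetical values, not Solr 3.3 defaults; check the actual client settings of your version.

```python
from dataclasses import dataclass


@dataclass
class FanOutModel:
    """Rough model of shard sub-requests; every input is an assumption."""
    concurrent_users: int   # clients, each with one outstanding top-level query
    servers: int            # Tomcat instances (one host:port each)
    shards_per_server: int  # cores inside each Tomcat

    def subrequests_in_flight(self) -> int:
        # Worst case: every user's query is in the same fan-out stage at once,
        # sending one sub-request to every shard (top-level requests ignored).
        return self.concurrent_users * self.servers * self.shards_per_server

    def per_host_in_flight(self) -> int:
        # All shards of one server share a host:port, so a per-host
        # connection limit is hit by all of that server's sub-requests.
        return self.concurrent_users * self.shards_per_server

    def waits_for_connection(self, max_conns_per_host: int) -> bool:
        # True if sub-requests would have to queue for a pooled connection.
        return self.per_host_in_flight() > max_conns_per_host


model = FanOutModel(concurrent_users=8, servers=2, shards_per_server=10)
print(model.subrequests_in_flight())                      # 160
print(model.per_host_in_flight())                         # 80
print(model.waits_for_connection(max_conns_per_host=20))  # True
```

If the per-host figure is far above the real pool size, sub-requests queue for connections, which would be consistent with throughput dropping while the CPUs are not saturated.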


Re: strange performance issue with many shards on one server

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hi Fred,
Analyze the queries that take longer.
We monitor our queries and see QTime problems with complex queries:
phrase queries, or queries that contain numbers or special characters.
If you don't know it already:
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
Regards
Vadim


2011/9/28 Frederik Kraus <fr...@gmail.com>

>  Hi,
>
>
> I am experiencing a strange issue doing some load tests. Our setup:
>
> - 2 servers, each with 24 CPU cores and 130GB of RAM
> - 10 shards per server (needed for response times) running in a single
> tomcat instance
> - each query queries all 20 shards (distributed search)
>
> - each shard holds about 1.5 million documents (small shards are needed due to
> rather complex queries)
> - all caches are warmed / high cache hit rates (99%) etc.
>
>
> Now for some reason we cannot seem to fully utilize all CPU power (no
> disk IO), i.e. beyond a certain point, increasing the number of
> concurrent users no longer increases CPU load; instead it decreases
> throughput and increases the response times of the individual queries.
>
> Also 1-2% of the queries take significantly longer: the average is
> around 100ms, while 1-2% take 1.5s or longer.
>
> Any ideas are greatly appreciated :)
>
> Fred.
>
>
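One way to follow Vadim's advice and analyze the slow queries is to pull `QTime` out of the Solr request log. A minimal sketch, assuming log lines in the Solr 3.x style `[core] webapp=/solr path=/select params={...} hits=N status=0 QTime=N`, and assuming shard sub-requests carry `isShard=true` in their params; verify both against your own logs. The sample lines below are hypothetical. Separating slow top-level requests from slow shard sub-requests may help tell time spent waiting apart from actual search work.

```python
import re

# Assumed request-log format (Solr 3.x style); adjust the pattern to your logs.
LINE_RE = re.compile(
    r"\[(?P<core>[^\]]+)\].*?params=\{(?P<params>.*)\}.*?QTime=(?P<qtime>\d+)"
)


def slow_requests(lines, threshold_ms):
    # Yield (core, is_shard_request, qtime_ms, params) for requests at or above threshold.
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a request log line
        qtime = int(m.group("qtime"))
        if qtime >= threshold_ms:
            params = m.group("params")
            # Assumption: shard sub-requests are marked with isShard=true.
            is_shard = "isShard=true" in params
            yield m.group("core"), is_shard, qtime, params


# Hypothetical sample lines for illustration only.
sample = [
    "INFO: [shard0] webapp=/solr path=/select params={q=foo&isShard=true} hits=3 status=0 QTime=12",
    "INFO: [shard0] webapp=/solr path=/select params={q=foo&shards=server1:8080/solr/shard0} hits=30 status=0 QTime=1530",
    "Sep 28, 2011 1:00:00 PM org.apache.solr.core.SolrCore execute",
]

for request in slow_requests(sample, threshold_ms=1000):
    print(request)
```

A slow top-level request whose shard sub-requests all report small QTimes would point away from the queries themselves and toward queuing or communication overhead.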