Posted to solr-user@lucene.apache.org by Ken Krugler <kk...@transpac.com> on 2011/12/30 05:16:02 UTC

Re: strange performance issue with many shards on one server

Hi Frederik,

Did you figure out a solution to this problem?

I'm asking because I recently ran into a similar problem, with a similar setup (8 shards on one server).

Occasionally a query takes a very long time, and now and then I see timeout exceptions on the HTTP requests, e.g.:

> 348914 [pool-19-thread-14] INFO org.apache.commons.httpclient.HttpMethodDirector - I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server localhost failed to respond
> 348915 [pool-19-thread-14] INFO org.apache.commons.httpclient.HttpMethodDirector - Retrying request


Restarting Jetty seems to clear up the problem temporarily.

I've been looking at the code in Solr that handles distributed requests - and it's got some interesting smells, so I wouldn't be surprised if there's an issue related to how it's using HttpClient.

Regards,

-- Ken


On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
> 
> 
> 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643) 
> at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423) 
> at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) 
> at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422) 
> at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892) 
> at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198) 
> at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158) 
> at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662) 
> 
> 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.0000ms user time=20.0000ms
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447) 
> at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416) 
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153) 
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) 
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) 
> at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
> at java.lang.Thread.run(Thread.java:662) 
> and 
> 
> 'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.0000ms user time=920.0000ms
> 
> at sun.misc.Unsafe.park(Native Method) 
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) 
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) 
> at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164) 
> at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469) 
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271) 
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) 
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) 
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) 
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) 
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) 
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) 
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) 
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) 
> at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554) 
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) 
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) 
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) 
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) 
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) 
> at java.lang.Thread.run(Thread.java:662) 
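
The third dump shows the request thread parked in ExecutorCompletionService.take(), waiting for shard responses to come back. A minimal sketch of that scatter/gather pattern, using only java.util.concurrent, might look like the code below. It is not Solr's code: the class and method names are made up for illustration, and it deliberately uses poll() with a deadline where the stack trace shows take(), so a shard request that never completes surfaces as a timeout instead of parking the caller indefinitely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the coordinating request thread; not Solr's actual code.
class ScatterGather {

    // Submits one task per shard and collects results in completion order.
    // Throws TimeoutException if not all shards answer within timeoutMillis.
    static List<String> queryAll(ExecutorService pool,
                                 List<Callable<String>> shardCalls,
                                 long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletionService<String> completion = new ExecutorCompletionService<String>(pool);
        List<Future<String>> pending = new ArrayList<Future<String>>();
        for (Callable<String> call : shardCalls) {
            pending.add(completion.submit(call));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<String> results = new ArrayList<String>();
        try {
            for (int i = 0; i < shardCalls.size(); i++) {
                long remaining = Math.max(deadline - System.nanoTime(), 0L);
                // poll() returns null when the deadline passes with nothing completed
                Future<String> done = completion.poll(remaining, TimeUnit.NANOSECONDS);
                if (done == null) {
                    throw new TimeoutException("shard response timed out");
                }
                results.add(done.get()); // rethrows a shard failure as ExecutionException
            }
            return results;
        } finally {
            // Cancel whatever is still running; this is a no-op for finished tasks.
            for (Future<String> f : pending) {
                f.cancel(true);
            }
        }
    }
}
```

Whether a bounded wait is appropriate depends on the application; the point is only that the caller's wait is tied to the slowest shard request.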
> 
> On Wednesday, 28 September 2011 at 13:53, Frederik Kraus wrote:
> 
>> 
>> 
>> On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote:
>> 
>>> Hi Fred,
>>> 
>>> OK, that's strange behavior with the same queries.
>>> Some more questions:
>>> -which Solr version?
>> 
>> 3.3 (might the NIOFSDirectory from 3.4 help?)
>> 
>>> -are you indexing during your load test? (because of index rebuilds)
>> nope
>> 
>>> -do you replicate your index?
>> 
>> nope 
>>> 
>>> Regards
>>> Vadim
>>> 
>>> 
>>> 
>>> 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
>>> 
>>>> Hi Vadim,
>>>> 
>>>> the thing is that those exact same queries that take longer during a load
>>>> test perform just fine when executed at a lower request rate, and the slow
>>>> ones are random, i.e. there is no pattern in the bad/slow queries.
>>>> 
>>>> My first thought was some kind of contention and/or connection starvation
>>>> for the internal shard communication?
>>>> 
>>>> Fred.
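
One way to picture the connection-starvation hypothesis is a toy model in which a Semaphore stands in for a capped connection pool. This is only a sketch under stated assumptions, not HttpClient or Solr code: the class name, the permit count, and the hold time are all invented for illustration. When more shard requests are in flight than there are permits, the extra requests queue up and wall-clock time grows in waves.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a capped connection pool; all names and numbers are hypothetical.
class PoolContention {

    // Starts `requests` simulated shard requests, one thread each. Every request
    // must hold one of `maxConnections` permits for `holdMillis`.
    // Returns the elapsed wall-clock time in milliseconds.
    static long run(int requests, int maxConnections, long holdMillis)
            throws InterruptedException {
        final Semaphore connections = new Semaphore(maxConnections);
        ExecutorService workers = Executors.newFixedThreadPool(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            workers.execute(() -> {
                try {
                    connections.acquire();            // blocks while the pool is exhausted
                    try {
                        Thread.sleep(holdMillis);     // stands in for the shard round trip
                    } finally {
                        connections.release();        // hand the connection back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

For example, PoolContention.run(40, 10, 50) needs about four waves of 50 ms each, whereas run(40, 40, 50) can finish in roughly one. The model covers only pool exhaustion; it says nothing about contention on the pool's own lock.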
>>>> 
>>>> 
>>>> On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:
>>>> 
>>>>> Hi Fred,
>>>>> analyze the queries which take longer.
>>>>> We monitor our queries and see q-time problems with queries that
>>>>> are complex, with phrase queries, or with queries that contain numbers or
>>>>> special characters.
>>>>> In case you don't know it:
>>>>> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
>>>>> Regards
>>>>> Vadim
>>>>> 
>>>>> 
>>>>> 2011/9/28 Frederik Kraus <frederik.kraus@gmail.com>
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> 
>>>>>> I am experiencing a strange issue doing some load tests. Our setup:
>>>>>> 
>>>>>> - 2 servers, each with 24 CPU cores and 130GB of RAM
>>>>>> - 10 shards per server (needed for response times) running in a single
>>>>>> Tomcat instance
>>>>>> - each query queries all 20 shards (distributed search)
>>>>>> 
>>>>>> - each shard holds about 1.5 million documents (small shards are needed
>>>>>> due to rather complex queries)
>>>>>> - all caches are warmed / high cache hit rates (99%) etc.
>>>>>> 
>>>>>> 
>>>>>> Now for some reason we cannot seem to fully utilize all CPU power (no
>>>>>> disk IO), i.e. beyond a certain point, adding concurrent users no longer
>>>>>> increases CPU load; instead it decreases throughput and increases the
>>>>>> response times of the individual queries.
>>>>>> 
>>>>>> Also, 1-2% of the queries take significantly longer: the average is
>>>>>> around 100ms, while 1-2% take 1.5s or longer.
>>>>>> 
>>>>>> Any ideas are greatly appreciated :)
>>>>>> 
>>>>>> Fred.
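
A rough back-of-the-envelope model may help relate the 1-2% of slow queries to the 20-shard fan-out. Assuming (and this is only an assumption) that shard requests are slow independently of each other, a query is slow whenever at least one of its shard requests is slow. The class and method names below are hypothetical.

```java
// Back-of-the-envelope model; assumes shard requests are slow independently.
class FanOutTail {

    // Probability that at least one of `shards` requests is slow,
    // given that each one is slow with probability `perShardSlow`.
    static double slowQueryProbability(double perShardSlow, int shards) {
        return 1.0 - Math.pow(1.0 - perShardSlow, shards);
    }

    // Inverse: the per-shard slow rate that would produce the observed
    // query-level slow rate under the same independence assumption.
    static double impliedPerShardRate(double slowQueryRate, int shards) {
        return 1.0 - Math.pow(1.0 - slowQueryRate, 1.0 / shards);
    }

    public static void main(String[] args) {
        System.out.printf("query-level slow rate: %.4f%n", slowQueryProbability(0.001, 20));
        System.out.printf("implied per-shard rate: %.5f%n", impliedPerShardRate(0.02, 20));
    }
}
```

Under that independence assumption, a per-shard slow rate of only about 0.1% is enough to make roughly 2% of 20-shard queries slow, so a rare stall in shard communication could plausibly account for the observed tail.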
> 

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr