Posted to solr-user@lucene.apache.org by Alessandro Benedetti <ab...@apache.org> on 2016/01/04 14:21:15 UTC

[Manual Sharding] Solr distrib search cause thread exhaustion

Hi guys,
this is the scenario we are studying :

Solr 4.10.2
16 shards, plus a Solr instance that aggregates the results by running a
distributed query with shards=..... (all the shards).

Currently we are not using shards.tolerant=true, so we throw an exception
on error.

We are in a situation where one shard is too slow to respond (empty filter
cache, heavy load).
The shard does not answer within the timeout the shard handler expects,
and for this reason the whole request fails.

So far, everything is clear.
We need to improve the speed of the shards by properly managing
autowarming, load balancing, etc.
We can also play with the tolerant setting and possibly be tolerant of errors.

But what happens is that the Solr aggregator, which runs the queries against
the shards, is exhausting its threads...
Looking into the code, in the case where we are not tolerant we get this:

// Was there an exception?
if (srsp.getException() != null) {
  // If things are not tolerant, abort everything and rethrow
  if(!tolerant) {
    shardHandler1.cancelAll();
    if (srsp.getException() instanceof SolrException) {
      throw (SolrException)srsp.getException();
    } else {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, srsp.getException());
    }


I would assume that shardHandler1.cancelAll() is responsible for cleaning up
the threads.
Any idea why the thread cleanup might not happen properly?
Could it be some Jetty misconfiguration?
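
To make the question concrete, here is a minimal sketch of the
abort-on-error pattern, assuming that cancelAll() amounts to cancelling every
pending shard request through Future.cancel(true). This is plain
java.util.concurrent, not Solr code: ShardGatherSketch, gather, cancelAll and
Result are hypothetical names, and the "shards" are simple Callables.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the aggregator's scatter/gather loop; NOT Solr code.
class ShardGatherSketch {

    /** Outcome of one gather: the responses received and the number of failed shards. */
    static final class Result {
        final List<String> responses = new ArrayList<>();
        int failedShards;
    }

    /** Assumed equivalent of cancelAll(): cancel (and interrupt) every pending request. */
    static void cancelAll(List<Future<String>> pending) {
        for (Future<String> f : pending) {
            f.cancel(true); // no-op for requests that already completed
        }
    }

    static Result gather(ExecutorService pool, List<Callable<String>> shardCalls, boolean tolerant) {
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        List<Future<String>> pending = new ArrayList<>();
        for (Callable<String> call : shardCalls) {
            pending.add(completion.submit(call));
        }
        Result result = new Result();
        try {
            for (int i = 0; i < shardCalls.size(); i++) {
                Future<String> done = completion.take(); // next shard to finish or fail
                try {
                    result.responses.add(done.get());
                } catch (ExecutionException e) {
                    if (!tolerant) {
                        cancelAll(pending); // abort everything and rethrow
                        throw new IllegalStateException("shard request failed", e.getCause());
                    }
                    result.failedShards++; // tolerant: keep the partial result
                }
            }
        } catch (InterruptedException e) {
            cancelAll(pending);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CountDownLatch never = new CountDownLatch(1); // never counted down: simulates a stuck shard
        List<Callable<String>> calls = new ArrayList<>();
        calls.add(() -> "shard1: ok");
        calls.add(() -> { throw new IOException("Connection reset"); });
        calls.add(() -> { never.await(); return "shard3: ok"; });
        try {
            gather(pool, calls, false);
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getCause().getMessage());
        }
        pool.shutdown();
        // true only if the stuck task reacted to the interrupt sent by cancelAll()
        System.out.println("workers released: " + pool.awaitTermination(5, TimeUnit.SECONDS));
    }
}
```

Note that Future.cancel(true) only interrupts the worker thread. In this
sketch the stuck task waits on a latch, which reacts to the interrupt. A
worker blocked in a classic blocking socket read generally does not, and
stays busy until the read returns or a socket timeout fires. That might be
one way for threads to pile up even though cancelAll() is called.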

Cheers
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: [Manual Sharding] Solr distrib search cause thread exhaustion

Posted by Alessandro Benedetti <ab...@apache.org>.
Yes Erick, our Jetty is configured with 10,000 threads.

Actually the puzzle got more complicated when we realised that connTimeout
is set to 0 by default.
But we definitely get an error from one of the shards, and the aggregator
throws the exception because it is not tolerant.
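
For context on why a timeout of 0 matters: in the standard java.net socket
API, a timeout of 0 means "no timeout", so a blocked call can hold its thread
indefinitely. The sketch below illustrates that with plain sockets only. It
is not Solr or HttpClient code, and TimeoutSketch, probe and demo are
hypothetical names. It assumes a local server that listens but never sends a
response, standing in for a shard that is too slow.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Plain java.net illustration of timeout semantics; NOT Solr or HttpClient code.
class TimeoutSketch {

    /**
     * Connects and waits for one byte of response.
     * For both timeouts, 0 means "no timeout": the calling thread may block indefinitely.
     */
    static String probe(InetSocketAddress server, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(server, connectTimeoutMs); // bounds connection establishment
            socket.setSoTimeout(readTimeoutMs);       // bounds each blocking read
            int b = socket.getInputStream().read();
            return b < 0 ? "closed by server" : "got data";
        } catch (SocketTimeoutException e) {
            return "timed out";
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    /** Runs probe() against a local server that listens but never sends a response. */
    static String demo(int readTimeoutMs) {
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            return probe(address, 1000, readTimeoutMs);
        } catch (IOException e) {
            return "setup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // With readTimeoutMs = 0 the read would block for as long as the server stays silent.
        System.out.println(demo(200));
    }
}
```

With a non-zero read timeout the calling thread is released after a bounded
wait. With 0 it would stay blocked for as long as the other side says
nothing.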

The weird thing is that the shard logs an error that is a typical sign
of a client closing the HTTP connection.

Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: 37661057 [qtp111115642-279052] ERROR org.apache.solr.servlet.SolrDispatchFilter  – null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://solr10.bug
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
...
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://solr10.bug
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:157)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: ... 1 more
Jan 03 16:55:55 solr-a00.bug.example.com java[10661]: Caused by: java.net.SocketException: Connection reset
...

*Shard Log*

Jan 03 16:55:10 solr.bug.example.com java[21200]: 1214068 [qtp1018590076-595] ERROR org.apache.solr.servlet.SolrDispatchFilter  – null:org.eclipse.jetty.io.EofException
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:207)
...
Jan 03 16:55:10 solr.bug.example.com java[21200]: 1214073 [qtp1018590076-595] ERROR org.apache.solr.servlet.SolrDispatchFilter  – null:org.eclipse.jetty.io.EofException
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
...
Jan 03 16:55:10 solr.bug.example.com java[21200]: 1214074 [qtp1018590076-595] WARN  org.eclipse.jetty.server.Response  – Committed before 500 {trace=org.eclipse.jetty.io.EofException
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
Jan 03 16:55:10 solr.bug.example.com java[21200]: at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)




On 4 January 2016 at 16:26, Erick Erickson <er...@gmail.com> wrote:

> How many threads are you allocating for the servlet container? 10,000
> is the "usual" number.
>
> Best,
> Erick

Re: [Manual Sharding] Solr distrib search cause thread exhaustion

Posted by Erick Erickson <er...@gmail.com>.
How many threads are you allocating for the servlet container? 10,000
is the "usual" number.

Best,
Erick
