Posted to solr-user@lucene.apache.org by Doss <it...@gmail.com> on 2018/04/04 05:46:22 UTC

SOLR Cloud: 1500+ threads are in TIMED_WAITING status

We run SolrCloud (Solr 7.0.1) on 3 Linux VMs with 4 CPUs and 90 GB RAM, with a
ZooKeeper (3.4.11) ensemble running on the same machines. We have 130 cores
with an overall size of 45 GB. There is no sharding; almost all VMs hold the
same copy of the data. These nodes are behind a load balancer.

Index Config:
=============

<ramBufferSizeMB>300</ramBufferSizeMB>
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
       <int name="maxMergeAtOnce">30</int>
       <int name="maxMergeAtOnceExplicit">100</int>
       <double name="segmentsPerTier">30.0</double>
</mergePolicyFactory>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
       <int name="maxMergeCount">18</int>
       <int name="maxThreadCount">6</int>
</mergeScheduler>

Commit Configs:
===============
<autoCommit>
       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
       <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>


We do 3500 inserts/updates per second spread across all 130 cores. We have yet
to start using selects in earnest.

The problem we are facing is that at times the thread count suddenly
increases sharply, which leaves Solr unresponsive or returning 503 responses
to client (PHP HTTP cURL) requests.

Today (04-04-2018) the thread dump shows that the peak went up to 13000+ threads.

Please help me fix this issue. Thanks!
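
To see which thread pools account for such a peak, the entries of a thread
dump can be tallied by pool and state. The following is a minimal Python
sketch, assuming the dump has already been parsed into a list of dicts with
the "name" and "state" fields that appear in the sample threads in this
message. The function name and the rule for deriving a pool from a thread
name are illustrative assumptions, not part of Solr.

```python
import re
from collections import Counter

def tally_threads(threads):
    """Count threads per (pool, state).

    `threads` is assumed to be a list of dicts with "name" and "state" keys.
    The pool is taken to be the leading run of letters in the thread name
    (a hypothetical grouping rule chosen for this sketch).
    """
    counts = Counter()
    for t in threads:
        # "qtp959447386-21" -> "qtp"; "updateExecutor-2-thread-..." -> "updateExecutor"
        match = re.match(r"[A-Za-z]+", t.get("name", ""))
        pool = match.group(0) if match else "unknown"
        counts[(pool, t.get("state", "UNKNOWN"))] += 1
    return counts

# Made-up sample entries shaped like the ones in the thread dump.
sample = [
    {"name": "qtp959447386-21", "state": "TIMED_WAITING"},
    {"name": "qtp959447386-22", "state": "TIMED_WAITING"},
    {"name": "updateExecutor-2-thread-25746-processing-x", "state": "TIMED_WAITING"},
    {"name": "qtp959447386-23", "state": "RUNNABLE"},
]
for (pool, state), n in tally_threads(sample).most_common():
    print(pool, state, n)
```

Comparing such tallies between a normal period and a spike would show whether
the growth sits in the qtp (Jetty) pool, the updateExecutor pool, or both.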


Sample Threads:
===============

1. updateExecutor-2-thread-25746-processing-http:////172.10.2.19:8983//solr//profileviews x:profileviews r:core_node2 n:172.10.2.18:8983_solr s:shard1 c:profileviews",
        "state":"TIMED_WAITING",
        "lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@297be1d5",
        "cpuTime":"162.4371ms",
        "userTime":"120.0000ms",
        "stackTrace":["sun.misc.Unsafe.park(Native Method)",
        "java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)",
        "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)",

2. ERROR true HttpSolrCall
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Error from server at
172.10.2.18:8983/solr/profileviews: Server Error request:
http://172.10.2.18:8983/solr/profileviews/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F172.10.2.19%3A8983%2Fsolr%2Fprofileviews%2F&wt=javabin&version=2
Remote error message: empty String

3. So many threads like:
"name":"qtp959447386-21",
        "state":"TIMED_WAITING",
        "lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6a1a2bf4",
        "cpuTime":"4522.0837ms",
        "userTime":"3770.0000ms",
        "stackTrace":["sun.misc.Unsafe.park(Native Method)",
          "java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)",
          "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)",
          "org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)",
          "org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)",
          "org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)",
          "org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)",
          "java.lang.Thread.run(Thread.java:748)"

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by raji <ra...@gmail.com>.

Hi,
Has a solution been found for this issue? We are using Solr 7.6 and
sometimes we see a lot of qtp threads with this stack trace:

sun.misc.Unsafe.park(Native method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:653)
org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:717)
java.lang.Thread.run(Thread.java:748)

Thanks,
Raji



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Doss <it...@gmail.com>.

Hi,

Are there any network parameters that we need to fine-tune? Is any
specific tuning needed to deploy Solr on virtual machines? We use VMware.

Thanks.




Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Doss <it...@gmail.com>.

Hi Emir,

I just realised that DBQ means delete-by-query. We are not using that; we
delete documents by document id (unique id).
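
For reference, the two kinds of delete differ only in the body sent to the
update handler. Below is a small Python sketch of both bodies in Solr's JSON
update syntax. The helper names, ids, and query are made up for illustration,
and sending the body over HTTP is left out.

```python
import json

def delete_by_ids(ids):
    """Body that deletes specific documents by their unique key."""
    return json.dumps({"delete": list(ids)})

def delete_by_query(query):
    """Body for a delete-by-query (DBQ)."""
    return json.dumps({"delete": {"query": query}})

print(delete_by_ids(["101", "102"]))
print(delete_by_query("profile_id:101"))
```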

Thanks,
Mohandoss.






Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Emir Arnautović <em...@sematext.com>.

Hi Mohandoss,
I would check whether the thread increase correlates with DBQ, since it does not play well with concurrent indexing: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Apr 2018, at 10:59, Doss <it...@gmail.com> wrote:
> 
> Hi Emir,
> 
> We do fire delete queries, but only very rarely.
> 
> Thanks!
> Mohandoss

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Doss <it...@gmail.com>.

Hi Emir,

We do fire delete queries, but only very rarely.

Thanks!
Mohandoss




Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Emir Arnautović <em...@sematext.com>.

Hi,
I’ve seen a similar jump in thread count when DBQ (delete-by-query) was used. Do you delete documents while indexing?

Emir



> On 5 Apr 2018, at 07:56, Doss <it...@gmail.com> wrote:
> 
> @wunder
>
>> Are you sending updates in batches? Are you doing a commit after every
>> update?
>
> We want the system to be near real time, so we are not sending updates in
> batches, and we are not committing after every update either.
> autoSoftCommit runs once every minute and autoCommit once every 10
> minutes.
>
> The thread increase does not happen all the time. During our peak hours,
> when we get more user interactions, the system works absolutely
> fine; then suddenly this problem creeps up and the system gets into trouble.
>
> The nproc value has been increased to 18000.
>
> We also did the Jetty-related Linux tuning described at the link below:
>
> http://www.eclipse.org/jetty/documentation/current/high-load.html
>
> Thanks.

Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Doss <it...@gmail.com>.

@wunder

> Are you sending updates in batches? Are you doing a commit after every
> update?

We want the system to be near real time, so we are not sending updates in
batches, and we are not committing after every update either.
autoSoftCommit runs once every minute and autoCommit once every 10
minutes.

The thread increase does not happen all the time. During our peak hours,
when we get more user interactions, the system works absolutely
fine; then suddenly this problem creeps up and the system gets into trouble.

The nproc value has been increased to 18000.

We also did the Jetty-related Linux tuning described at the link below:

http://www.eclipse.org/jetty/documentation/current/high-load.html

Thanks.




Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by Walter Underwood <wu...@wunderwood.org>.

Are you sending updates in batches? Are you doing a commit after every update?

You should use batches and you should not commit after every update.
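
One way to follow this on the client side is to group documents and send each
group as a single update request, leaving commits to autoCommit and
autoSoftCommit. Here is a minimal Python sketch, assuming documents are plain
dicts. The function names, the field names, and the batch size of 500 are
hypothetical, and the actual HTTP POST of each body to the collection's
/update handler is left out.

```python
import json
from itertools import islice

def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def update_payloads(docs, size=500):
    """Build one JSON array body per batch; no commit parameter is added."""
    for chunk in batches(docs, size):
        yield json.dumps(chunk)

# Made-up documents standing in for real updates.
docs = [{"id": str(i), "views": i} for i in range(1200)]
payloads = list(update_payloads(docs, size=500))
print(len(payloads))  # 3 request bodies instead of 1200
```

A small batch flushed on a short timer can keep latency close to near real
time while cutting the request count substantially.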

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 4, 2018, at 8:01 PM, 苗海泉 <ms...@gmail.com> wrote:
> 
> When there are a lot of collections, we have also found many threads in
> the TIMED_WAITING state, mainly commit threads and search threads, which
> led to a rapid decline in Solr's speed. The number of these threads
> reached more than 2,000.
> I don't have a solution to this problem, but I found that reloading
> reduces the number of commit threads.
> 
Re: SOLR Cloud: 1500+ threads are in TIMED_WAITING status

Posted by 苗海泉 <ms...@gmail.com>.

When there are a lot of collections, we have also found many threads in
the TIMED_WAITING state, mainly commit threads and search threads, which
led to a rapid decline in Solr's speed. The number of these threads
reached more than 2,000.
I don't have a solution to this problem, but I found that reloading
reduces the number of commit threads.




-- 
==============================
联创科技
Unity of knowledge and action (知行如一)
==============================