Posted to users@solr.apache.org by Rekha Sekhar <re...@gmail.com> on 2021/04/12 15:49:19 UTC

Urgent help needed on Solr cloud(cont)

Hi,

Thank you for the information. We will try out the suggestions below.

I have a few more questions about issues we are facing in the application.

The application runs *2 Solr nodes (v8.4.1)* and *3 ZooKeeper nodes (v3.6.2)*
in SolrCloud mode.
After running for a few days, we see the errors below in the logs:
2021-04-12 11:04:26.850 ERROR (qtp1632497828-1072) [c:datacore s:shard1
r:core_node4 x:datacore_shard1_replica_n2] o.a.s.u.SolrCmdDistributor
java.io.IOException: *Request processing has stalled for 90083ms with 100
remaining elements in the queue*.
and
2021-04-12 09:00:36.350 ERROR (qtp1632497828-786) [c:datacore s:shard1
r:core_node4 x:datacore_shard1_replica_n2] o.a.s.s.HttpSolrCall
null:java.io.IOException: *Task queue processing has stalled for 90175 ms
with 92 remaining elements to process.*
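
To gauge how often and how badly the updates stall, the stall duration and queue size can be pulled out of messages like the two above. This is only a minimal sketch in Python, assuming the messages keep the two wordings shown in these excerpts; the names STALL_PATTERN and parse_stall are hypothetical helpers, not part of Solr.

```python
import re

# Matches both wordings seen in the excerpts above:
#   "Request processing has stalled for 90083ms with 100 remaining elements ..."
#   "Task queue processing has stalled for 90175 ms with 92 remaining elements ..."
STALL_PATTERN = re.compile(
    r"(?P<kind>Request|Task queue) processing has stalled for "
    r"(?P<stalled_ms>\d+)\s*ms with (?P<remaining>\d+) remaining elements"
)


def parse_stall(message):
    """Return (kind, stalled_ms, remaining) for a stall message, or None."""
    # Log messages may be wrapped across lines, so collapse whitespace first.
    flat = " ".join(message.split())
    match = STALL_PATTERN.search(flat)
    if match is None:
        return None
    return match["kind"], int(match["stalled_ms"]), int(match["remaining"])
```

Running parse_stall over every ERROR line of solr.log and counting the results per hour would show whether the stalls line up with the bulk update and delete batches.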


1. What do these error messages mean? How can we resolve this?
2. After getting these messages, the 2 Solr nodes show different document
counts and delete counts. It seems the 2 Solr nodes are not in sync
(screenshot attached for reference).
3. One of the nodes (not the leader) goes into a recovering state and
never leaves it.
4. In the Solr update requests of 1 lakh (100,000) records, there are a few
thousand delete queries as well. Do the delete queries slow down the
syncing of the nodes further?

The above messages appear frequently on both Solr nodes, and eventually
one node goes into a recovering state and never leaves it.

Could you please help by answering the above questions?

Thanks,
Rekha

Re: Urgent help needed on Solr cloud(cont)

Posted by SayantiGmail <sa...@gmail.com>.

Hi

This seems to be a bug in Solr 8.4. Will it be resolved in later versions, or do we need to update the stall time configuration as a workaround?


Re: Urgent help needed on Solr cloud(cont)

Posted by "Carlos .Sponchiado" <cs...@gmail.com>.

1. I found a similar issue with this version of Solr here:
https://github.com/clarin-eric/VLO/issues/291. They suggest using
solr.cloud.client.stallTime
to mitigate it. But I think fixing the commit issue will solve this problem
too.
4. A delete query is only supposed to mark the matching documents inside
each segment as deleted.

Do you know if the document update throughput has increased a lot? In the
Solr Admin stats it is possible to see how frequently commits are happening.
If they are very frequent, the fix of no longer sending explicit commit
commands and relying on configured hard commit and soft commit settings
instead can help you. Let's wait for other suggestions here too.
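
The hard and soft commit settings mentioned above correspond to autoCommit and autoSoftCommit in solrconfig.xml, and they can also be set through Solr's Config API. What follows is only a minimal sketch in Python, assuming a node at http://localhost:8983 and the collection name datacore taken from the logs; the function names and the interval values are hypothetical and are not tuning recommendations.

```python
import json
import urllib.request


def build_commit_config_request(base_url, collection, hard_commit_ms, soft_commit_ms):
    """Build (without sending) a Config API request that sets commit intervals."""
    payload = {
        "set-property": {
            # Hard commit: flush to disk periodically, without opening a searcher.
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoCommit.openSearcher": False,
            # Soft commit: controls how soon new documents become searchable.
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/solr/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return Solr's response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example (assumed host and values):
#   req = build_commit_config_request("http://localhost:8983", "datacore", 60000, 15000)
#   print(send(req))
```

With intervals like these in place, the indexing client would stop sending commit=true with each batch and leave the commit timing to Solr.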



-- 
Regards
Carlos Sponchiado