Posted to issues@solr.apache.org by "Patson Luk (Jira)" <ji...@apache.org> on 2022/10/18 17:55:00 UTC

[jira] [Commented] (SOLR-16454) Fixed race condition that trigger error on SizeLimitedDistributedMap …

    [ https://issues.apache.org/jira/browse/SOLR-16454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619723#comment-17619723 ] 

Patson Luk commented on SOLR-16454:
-----------------------------------

The details can be found in [https://github.com/fullstorydev/lucene-solr/pull/142] :)

 

Quick summary:

Occasionally, {{org.apache.zookeeper.KeeperException$NoNodeException}} is thrown from the {{completedMap}} field (of type {{SizeLimitedDistributedMap}}) in {{OverseerTaskProcessor}} while it performs cleanup [here|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java#L91-L98].

 

The reason: multiple threads can enter the same code block and try to delete the same list of children, so a slower thread can attempt to delete a child node that a faster thread has already removed.
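
To make the interleaving concrete, here is a minimal single-threaded replay of it. This is only a sketch: {{FakeStore}} and the local {{NoNodeException}} are hypothetical in-memory stand-ins for the ZooKeeper child nodes and for {{KeeperException.NoNodeException}}, not the actual Solr or ZooKeeper classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class StaleSnapshotRace {
    /** Hypothetical stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) { super("no node: " + path); }
    }

    /** Hypothetical in-memory stand-in for the children of the map's ZooKeeper node. */
    static class FakeStore {
        private final Set<String> children = ConcurrentHashMap.newKeySet();

        void create(String name) { children.add(name); }

        List<String> listChildren() { return new ArrayList<>(children); }

        // Like a ZooKeeper delete, this fails if the node is already gone.
        void delete(String name) throws NoNodeException {
            if (!children.remove(name)) throw new NoNodeException(name);
        }
    }

    /** Replays the interleaving; returns how many deletes failed for the slower cleaner. */
    static int replay() {
        FakeStore store = new FakeStore();
        store.create("a");
        store.create("b");
        store.create("c");

        // Both cleaners read the child list before either one deletes anything.
        List<String> fastSnapshot = store.listChildren();
        List<String> slowSnapshot = store.listChildren();

        try {
            // The faster cleaner removes every child it saw.
            for (String child : fastSnapshot) store.delete(child);
        } catch (NoNodeException e) {
            throw new IllegalStateException("unexpected: first cleaner hit a missing node", e);
        }

        int failures = 0;
        for (String child : slowSnapshot) {
            try {
                store.delete(child); // the slower cleaner works from a stale list
            } catch (NoNodeException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed deletes for the slower cleaner: " + replay());
    }
}
```

Every delete issued by the slower cleaner fails here, because its list of children was already stale by the time it started deleting.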

 

The proposed solution is to be a bit more forgiving of this exception, with a catch block such as:

{{} catch (KeeperException.NoNodeException e) {}}
{{//this could happen if multiple threads try to clean the same map}}
{{}}}
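
For illustration, the sketch below applies the same kind of catch block to several threads that all clean up from the same stale child list. It does not use the real ZooKeeper client: {{FakeStore}} and the local {{NoNodeException}} are hypothetical stand-ins, and the barrier is only there to force the worst-case interleaving. The actual change is in the pull request linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TolerantCleanup {
    /** Hypothetical stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) { super("no node: " + path); }
    }

    /** Hypothetical in-memory stand-in for the children of the map's ZooKeeper node. */
    static class FakeStore {
        private final Set<String> children = ConcurrentHashMap.newKeySet();

        void create(String name) { children.add(name); }

        List<String> listChildren() { return new ArrayList<>(children); }

        int size() { return children.size(); }

        // Like a ZooKeeper delete, this fails if the node is already gone.
        void delete(String name) throws NoNodeException {
            if (!children.remove(name)) throw new NoNodeException(name);
        }
    }

    /** Deletes every child in the snapshot, tolerating nodes another thread already removed. */
    static void cleanup(FakeStore store, List<String> snapshot) {
        for (String child : snapshot) {
            try {
                store.delete(child);
            } catch (NoNodeException e) {
                // this could happen if multiple threads try to clean the same map
            }
        }
    }

    /** Runs several cleaners against the same children; returns how many nodes remain. */
    static int runConcurrentCleanup(int nodes, int threads) {
        FakeStore store = new FakeStore();
        for (int i = 0; i < nodes; i++) store.create("node-" + i);

        CyclicBarrier allHaveSnapshot = new CyclicBarrier(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    List<String> snapshot = store.listChildren();
                    allHaveSnapshot.await(); // every thread now holds the same full list
                    cleanup(store, snapshot);
                    return null;
                }));
            }
            for (Future<?> f : results) f.get(); // rethrows if any cleaner failed
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a cleaner thread failed", e);
        } finally {
            pool.shutdown();
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("nodes left after cleanup: " + runConcurrentCleanup(50, 4));
    }
}
```

Only the "node already gone" case is swallowed, since for a cleanup that outcome is the desired end state. In this sketch any other failure would still propagate to the caller.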

> Fixed race condition that trigger error on SizeLimitedDistributedMap …
> ----------------------------------------------------------------------
>
>                 Key: SOLR-16454
>                 URL: https://issues.apache.org/jira/browse/SOLR-16454
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 9.1
>            Reporter: Hitesh Khamesra
>            Priority: Major
>
> here [https://github.com/apache/solr/blob/19f109842fb34069346a9efb21cf01b6706830a8/solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java#L94]
>  
> We should catch the zk exception, as it can lead to weird race conditions.
>  
> [~patson] Can you please add the details?


