You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "Patson Luk (Jira)" <ji...@apache.org> on 2022/06/17 01:55:00 UTC

[jira] [Updated] (SOLR-16257) Possible Race condition in ZkStateReader between fields watchedCollectionStates and collectionWatches

     [ https://issues.apache.org/jira/browse/SOLR-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patson Luk updated SOLR-16257:
------------------------------
    Description: 
We have observed certain weird behavior with our Solr that the cluster state for certain collections seem to be "stuck" in certain stale states. While inspecting the current code logic, it's found that certain race condition can arise and lead to inconsistent states of `collectionWatches` and `watchedCollectionStates`

 

By looking at the current design, it appears that those 2 fields:
 * {{collectionWatches -}} for "watching" ZK updates on collections (via notification). Such map has the collection name (ie org) as the key and value is {{CollectionWatch}} which holds another set of {{{}DocCollectionWatcher{}}}s in this case (ie {{{}ConcurrentHashMap<String, CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
 * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, DocCollection>{}}}, which key is collection name and value is the {{DocCollection}} value triggered by previous watch event handled by {{ZkStateReader$StateWatcher}} (either by the fetch on first watcher registration or by notification from ZK on state changes)

On the design level, such 2 fields should be "in-sync" , ie if a collection is no longer tracked in {{collectionWatches}} then it should not have any entry in {{watchedCollectionStates}} neither.

 

However such guarantee is not a strong one as we can see there are various code pieces that try to remove entries from {{watchedCollectionStates}} if somehow the collection is no longer in {{collectionWatches}} for example in [here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137] , in particular this appears to address certain race condition with {{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher also tries to ensure both maps are in sync as in [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]

*The core of the issue appears to be that there's an assumption when the last {{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both the {{watchedCollectionStates}} and {{collectionWatches}} should be purged of the watched collection.* Hence the {{clusterState}} should have the {{LazyCollectionRef}} instead, which {{DocCollection get(boolean allowCached)}} invocation should return the correct {{DocCollection}} state on {{{}allowCached=false{}}}. Unfortunately, there could still be possible race condition as far as there are 2 separate maps. One possible race condition is demonstrated in later section

h3. PR proposal
 An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we would not run into race condition that the collection key appears in one but not the other. Please see the PR here https://github.com/apache/solr/pull/909 and would love to gather some feedbacks and thoughts!


 
h3. Steps to reproduce a race condition
Spin up 2 nodes, only one node should serve the test collection, the other node should be made the overseer/leader of the cluster. Debug on the overseer node

 # From the IDE, ensure all breakpoints suspend "thread" only, not "all" (that's for intelliJ)
 # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965], put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} . This pauses when zk notification comes in and right before it checks whether the collection is already in {{watchedCollectionStates}} in method {{updateWatchedCollection}}
 # Set breakpoint at [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829], which throws {{TimeoutException}} after the latch timeout, for example from call of  (inside {{{}ZkStateWatcher#waitForState{}}})
 # now start debugging the overseer node
 # Issue a split shard request to the overseer. For example http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
 # Eventually a thread will stop the first breakpoint within {{updateWatchedCollection}}, the thread name should be something like {{zkCallback-127-thread-2}}
 # Might have to wait for 320 secs (timeout on {{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 2nd breakpoint throwing {{TimeoutException}}, the thread name should be something similar to {{OverseerThreadFactory...}}
 # Add breakpoint at {{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925], that removes the entry from {{watchedCollectionStates}} but not yet purge the {{collectionWatches}} 
 # Resume this {{OverseerThreadFactory}} thread that's going to throw {{TimeoutException}}, it should stop the new breakpoint in {{removeDocCollectionWatcher}}.
 # Switch back to thread {{zkCallback...}}, resume that thread, it should find that the {{watchedCollectionStates}} is empty, hence trying to insert a {{CollectionRef}} into it
 # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will purge the {{collectionWatches}} but the {{watchedCollectionStates}} will still have one entry in it.
 # If we inspect the {{zkStateReader.clusterState}} at this point, we will notice that the collection will have a non lazy CollectionRef for the test collection, while the {{collectionWatches}} is empty but {{watchedCollectionStates}} would still have the collection state in there

 

  was:
We have observed certain weird behavior with our Solr that the cluster state for certain collections seem to be "stuck" in certain stale states. While inspecting the current code logic, it's found that certain race condition can arise and lead to inconsistent states of `collectionWatches` and `watchedCollectionStates`

 

By looking at the current design, it appears that those 2 fields:
 * {{collectionWatches -}} for "watching" ZK updates on collections (via notification). Such map has the collection name (ie org) as the key and value is {{CollectionWatch}} which holds another set of {{{}DocCollectionWatcher{}}}s in this case (ie {{{}ConcurrentHashMap<String, CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
 * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, DocCollection>{}}}, which key is collection name and value is the {{DocCollection}} value triggered by previous watch event handled by {{ZkStateReader$StateWatcher}} (either by the fetch on first watcher registration or by notification from ZK on state changes)

On the design level, such 2 fields should be "in-sync" , ie if a collection is no longer tracked in {{collectionWatches}} then it should not have any entry in {{watchedCollectionStates}} neither.

 

However such guarantee is not a strong one as we can see there are various code pieces that try to remove entries from {{watchedCollectionStates}} if somehow the collection is no longer in {{collectionWatches}} for example in [here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137] , in particular this appears to address certain race condition with {{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher also tries to ensure both maps are in sync as in [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]

*The core of the issue appears to be that there's an assumption when the last {{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both the {{watchedCollectionStates}} and {{collectionWatches}} should be purged of the watched collection.* Hence the {{clusterState}} should have the {{LazyCollectionRef}} instead, which {{DocCollection get(boolean allowCached)}} invocation should return the correct {{DocCollection}} state on {{{}allowCached=false{}}}. Unfortunately, there could still be possible race condition as far as there are 2 separate maps. One possible race condition is demonstrated in later section

h3. PR proposal
 An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we would not run into race condition that the collection key appears in one but not the other. Please see the PR here (link to be updated) and would love to gather some feedbacks and thoughts!


 
h3. Steps to reproduce a race condition
Spin up 2 nodes, only one node should serve the test collection, the other node should be made the overseer/leader of the cluster. Debug on the overseer node

 # From the IDE, ensure all breakpoints suspend "thread" only, not "all" (that's for intelliJ)
 # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965], put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} . This pauses when zk notification comes in and right before it checks whether the collection is already in {{watchedCollectionStates}} in method {{updateWatchedCollection}}
 # Set breakpoint at [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829], which throws {{TimeoutException}} after the latch timeout, for example from call of  (inside {{{}ZkStateWatcher#waitForState{}}})
 # now start debugging the overseer node
 # Issue a split shard request to the overseer. For example http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
 # Eventually a thread will stop the first breakpoint within {{updateWatchedCollection}}, the thread name should be something like {{zkCallback-127-thread-2}}
 # Might have to wait for 320 secs (timeout on {{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 2nd breakpoint throwing {{TimeoutException}}, the thread name should be something similar to {{OverseerThreadFactory...}}
 # Add breakpoint at {{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925], that removes the entry from {{watchedCollectionStates}} but not yet purge the {{collectionWatches}} 
 # Resume this {{OverseerThreadFactory}} thread that's going to throw {{TimeoutException}}, it should stop the new breakpoint in {{removeDocCollectionWatcher}}.
 # Switch back to thread {{zkCallback...}}, resume that thread, it should find that the {{watchedCollectionStates}} is empty, hence trying to insert a {{CollectionRef}} into it
 # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will purge the {{collectionWatches}} but the {{watchedCollectionStates}} will still have one entry in it.
 # If we inspect the {{zkStateReader.clusterState}} at this point, we will notice that the collection will have a non lazy CollectionRef for the test collection, while the {{collectionWatches}} is empty but {{watchedCollectionStates}} would still have the collection state in there

 


> Possible Race condition in ZkStateReader between fields watchedCollectionStates and collectionWatches
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16257
>                 URL: https://issues.apache.org/jira/browse/SOLR-16257
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.8, main (10.0)
>            Reporter: Patson Luk
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have observed certain weird behavior with our Solr that the cluster state for certain collections seem to be "stuck" in certain stale states. While inspecting the current code logic, it's found that certain race condition can arise and lead to inconsistent states of `collectionWatches` and `watchedCollectionStates`
>  
> By looking at the current design, it appears that those 2 fields:
>  * {{collectionWatches -}} for "watching" ZK updates on collections (via notification). Such map has the collection name (ie org) as the key and value is {{CollectionWatch}} which holds another set of {{{}DocCollectionWatcher{}}}s in this case (ie {{{}ConcurrentHashMap<String, CollectionWatch<DocCollectionWatcher>> collectionWatches{}}})
>  * {{watchedCollectionStates}} - as {{{}ConcurrentHashMap<String, DocCollection>{}}}, which key is collection name and value is the {{DocCollection}} value triggered by previous watch event handled by {{ZkStateReader$StateWatcher}} (either by the fetch on first watcher registration or by notification from ZK on state changes)
> On the design level, such 2 fields should be "in-sync" , ie if a collection is no longer tracked in {{collectionWatches}} then it should not have any entry in {{watchedCollectionStates}} neither.
>  
> However such guarantee is not a strong one as we can see there are various code pieces that try to remove entries from {{watchedCollectionStates}} if somehow the collection is no longer in {{collectionWatches}} for example in [here|https://github.com/fullstorydev/lucene-solr/blob/b8a2e1574d71094fcf81e6d693634b9ced8f4956/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L2137] , in particular this appears to address certain race condition with {{{}unregisterCore{}}}. The code that removes registered DocCollectionWatcher also tries to ensure both maps are in sync as in [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1916]
> *The core of the issue appears to be that there's an assumption when the last {{DocCollectionWatcher}} is removed from the {{{}CollectionWatch{}}}, both the {{watchedCollectionStates}} and {{collectionWatches}} should be purged of the watched collection.* Hence the {{clusterState}} should have the {{LazyCollectionRef}} instead, which {{DocCollection get(boolean allowCached)}} invocation should return the correct {{DocCollection}} state on {{{}allowCached=false{}}}. Unfortunately, there could still be possible race condition as far as there are 2 separate maps. One possible race condition is demonstrated in later section
> h3. PR proposal
>  An idea is to Merge {{DocCollectionWater}} into  {{CollectionWatch}} so we would not run into race condition that the collection key appears in one but not the other. Please see the PR here https://github.com/apache/solr/pull/909 and would love to gather some feedbacks and thoughts!
>  
> h3. Steps to reproduce a race condition
> Spin up 2 nodes, only one node should serve the test collection, the other node should be made the overseer/leader of the cluster. Debug on the overseer node
>  # From the IDE, ensure all breakpoints suspend "thread" only, not "all" (that's for intelliJ)
>  # All breakpoints are in {{ZkStateReader}}. Set breakpoint at line [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1965], put condition {{Thread.currentThread().getName().startsWith("zkCallback")}} . This pauses when zk notification comes in and right before it checks whether the collection is already in {{watchedCollectionStates}} in method {{updateWatchedCollection}}
>  # Set breakpoint at [here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1829], which throws {{TimeoutException}} after the latch timeout, for example from call of  (inside {{{}ZkStateWatcher#waitForState{}}})
>  # now start debugging the overseer node
>  # Issue a split shard request to the overseer. For example http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1
>  # Eventually a thread will stop the first breakpoint within {{updateWatchedCollection}}, the thread name should be something like {{zkCallback-127-thread-2}}
>  # Might have to wait for 320 secs (timeout on {{CollectionHandlingUtils.waitForNewShard}}), another thread should stop at 2nd breakpoint throwing {{TimeoutException}}, the thread name should be something similar to {{OverseerThreadFactory...}}
>  # Add breakpoint at {{removeDocCollectionWatcher}}[here|https://github.com/apache/solr/blob/56842452026fb5f61fbe439a2f86adc08297c69c/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1925], that removes the entry from {{watchedCollectionStates}} but not yet purge the {{collectionWatches}} 
>  # Resume this {{OverseerThreadFactory}} thread that's going to throw {{TimeoutException}}, it should stop the new breakpoint in {{removeDocCollectionWatcher}}.
>  # Switch back to thread {{zkCallback...}}, resume that thread, it should find that the {{watchedCollectionStates}} is empty, hence trying to insert a {{CollectionRef}} into it
>  # Switch back to {{OverseerThreadFactory}} thread, step a few steps, it will purge the {{collectionWatches}} but the {{watchedCollectionStates}} will still have one entry in it.
>  # If we inspect the {{zkStateReader.clusterState}} at this point, we will notice that the collection will have a non lazy CollectionRef for the test collection, while the {{collectionWatches}} is empty but {{watchedCollectionStates}} would still have the collection state in there
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org