You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/01 19:19:42 UTC

[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2285: SOLR-14928: introduce distributed cluster state updates

muse-dev[bot] commented on a change in pull request #2285:
URL: https://github.com/apache/lucene-solr/pull/2285#discussion_r568077584



##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -1058,10 +1056,29 @@ public ZkStateReader getZkStateReader() {
   }
 
   public void offerStateUpdate(byte[] data) throws KeeperException, InterruptedException {
+    // When cluster state change is distributed, the Overseer cluster state update queue should only ever receive only QUIT messages.
+    // These go to sendQuitToOverseer for execution path clarity.
+    if (distributedClusterChangeUpdater.isDistributedStateChange()) {
+      final ZkNodeProps message = ZkNodeProps.load(data);

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Unprotected write. Non-private method `Overseer.offerStateUpdate(...)` indirectly writes to field `noggit.JSONParser.devNull.buf` outside of synchronization.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.

##########
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##########
@@ -463,7 +468,11 @@ public boolean isClosed() {
 
     init();

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Read/Write race. Non-private method `ZkController(...)` indirectly reads with synchronization from `noggit.JSONParser.devNull.buf`. Potentially races with unsynchronized write in method `ZkController.preClose()`.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.

##########
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##########
@@ -2660,17 +2679,26 @@ public boolean checkIfCoreNodeNameAlreadyExists(CoreDescriptor dcore) {
    */
   public void publishNodeAsDown(String nodeName) {
     log.info("Publish node={} as DOWN", nodeName);
-    ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION, OverseerAction.DOWNNODE.toLower(),
-        ZkStateReader.NODE_NAME_PROP, nodeName);
-    try {
-      overseer.getStateUpdateQueue().offer(Utils.toJSON(m));
-    } catch (AlreadyClosedException e) {
-      log.info("Not publishing node as DOWN because a resource required to do so is already closed.");
-    } catch (InterruptedException e) {
-      Thread.currentThread().interrupt();
-      log.debug("Publish node as down was interrupted.");
-    } catch (KeeperException e) {
-      log.warn("Could not publish node as down: ", e);
+    if (distributedClusterChangeUpdater.isDistributedStateChange()) {
+      // Note that with the current implementation, when distributed cluster state updates are enabled, we mark the node
+      // down synchronously from this thread, whereas the Overseer cluster state update frees this thread right away and
+      // the Overseer will async mark the node down but updating all affected collections.
+      // If this is an issue (i.e. takes too long), then the call below should be executed from another thread so that
+      // the calling thread can immediately return.
+      distributedClusterChangeUpdater.executeNodeDownStateChange(nodeName, zkStateReader);

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Unprotected write. Non-private method `ZkController.publishNodeAsDown(...)` indirectly writes to field `noggit.JSONParser.devNull.buf` outside of synchronization.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.

##########
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##########
@@ -1662,22 +1680,20 @@ public void unregister(String coreName, CoreDescriptor cd, boolean removeCoreFro
     }
     CloudDescriptor cloudDescriptor = cd.getCloudDescriptor();
     if (removeCoreFromZk) {
-      ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION,
-          OverseerAction.DELETECORE.toLower(), ZkStateReader.CORE_NAME_PROP, coreName,
+      ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION, OverseerAction.DELETECORE.toLower(),
+          ZkStateReader.CORE_NAME_PROP, coreName,
           ZkStateReader.NODE_NAME_PROP, getNodeName(),
           ZkStateReader.COLLECTION_PROP, cloudDescriptor.getCollectionName(),
           ZkStateReader.CORE_NODE_NAME_PROP, coreNodeName);
-      overseerJobQueue.offer(Utils.toJSON(m));
+      if (distributedClusterChangeUpdater.isDistributedStateChange()) {
+        distributedClusterChangeUpdater.doSingleStateUpdate(DistributedClusterChangeUpdater.MutatingCommand.SliceRemoveReplica, m,
+            getSolrCloudManager(), zkStateReader);

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Read/Write race. Non-private method `ZkController.unregister(...)` indirectly reads without synchronization from `this.cloudManager`. Potentially races with write in method `ZkController.getSolrCloudManager()`.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.

##########
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##########
@@ -1662,22 +1680,20 @@ public void unregister(String coreName, CoreDescriptor cd, boolean removeCoreFro
     }
     CloudDescriptor cloudDescriptor = cd.getCloudDescriptor();
     if (removeCoreFromZk) {
-      ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION,
-          OverseerAction.DELETECORE.toLower(), ZkStateReader.CORE_NAME_PROP, coreName,
+      ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION, OverseerAction.DELETECORE.toLower(),
+          ZkStateReader.CORE_NAME_PROP, coreName,
           ZkStateReader.NODE_NAME_PROP, getNodeName(),
           ZkStateReader.COLLECTION_PROP, cloudDescriptor.getCollectionName(),
           ZkStateReader.CORE_NODE_NAME_PROP, coreNodeName);
-      overseerJobQueue.offer(Utils.toJSON(m));
+      if (distributedClusterChangeUpdater.isDistributedStateChange()) {
+        distributedClusterChangeUpdater.doSingleStateUpdate(DistributedClusterChangeUpdater.MutatingCommand.SliceRemoveReplica, m,

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Unprotected write. Non-private method `ZkController.unregister(...)` indirectly writes to field `noggit.JSONParser.devNull.buf` outside of synchronization.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.

##########
File path: solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java
##########
@@ -1446,9 +1446,9 @@ public void refreshAndWatch() {
     }
   }
 
-  public static DocCollection getCollectionLive(ZkStateReader zkStateReader, String coll) {
+  public DocCollection getCollectionLive(String coll) {
     try {
-      return zkStateReader.fetchCollectionState(coll, null);
+      return fetchCollectionState(coll, null);

Review comment:
       *THREAD_SAFETY_VIOLATION:*  Unprotected write. Non-private method `ZkStateReader.getCollectionLive(...)` indirectly writes to field `noggit.JSONParser.devNull.buf` outside of synchronization.
    Reporting because another access to the same memory occurs on a background thread, although this access may not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org