You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@ignite.apache.org by "vldpyatkov (via GitHub)" <gi...@apache.org> on 2023/06/02 17:48:38 UTC

[GitHub] [ignite-3] vldpyatkov opened a new pull request, #2142: IGNITE-19494 Correctly stop a replica

vldpyatkov opened a new pull request, #2142:
URL: https://github.com/apache/ignite-3/pull/2142

   https://issues.apache.org/jira/browse/IGNITE-19494


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222907486


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/disruptor/StripedDisruptor.java:
##########
@@ -214,7 +214,7 @@ void unsubscribe(NodeId nodeId) {
             assert handler != null : format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event);
 
             //TODO: IGNITE-15568 endOfBatch should be set to true to prevent caching tasks until IGNITE-15568 has fixed.
-            handler.onEvent(event, sequence, true);
+            handler.onEvent(event, sequence, subscribers.size() > 1 ? true : endOfBatch);

Review Comment:
   We don't have a test that will be able to demonstrate that. I assume, that update is not doing worse at least.
   Theoretically, RAFT log may fsync less often in the circumstance when several RAFT commands are executing simultaniously.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222963905


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/disruptor/StripedDisruptor.java:
##########
@@ -214,7 +214,7 @@ void unsubscribe(NodeId nodeId) {
             assert handler != null : format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event);
 
             //TODO: IGNITE-15568 endOfBatch should be set to true to prevent caching tasks until IGNITE-15568 has fixed.
-            handler.onEvent(event, sequence, true);
+            handler.onEvent(event, sequence, subscribers.size() > 1 ? true : endOfBatch);

Review Comment:
   We will work a little better just now. Today, the change worth nothing for us.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] alievmirza commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "alievmirza (via GitHub)" <gi...@apache.org>.
alievmirza commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222947503


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/disruptor/StripedDisruptor.java:
##########
@@ -214,7 +214,7 @@ void unsubscribe(NodeId nodeId) {
             assert handler != null : format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event);
 
             //TODO: IGNITE-15568 endOfBatch should be set to true to prevent caching tasks until IGNITE-15568 has fixed.
-            handler.onEvent(event, sequence, true);
+            handler.onEvent(event, sequence, subscribers.size() > 1 ? true : endOfBatch);

Review Comment:
   I don't think that this change is reasonable, because we have a plan to port disruptor changes in a upcoming sprint, potentially this fix will live a few weeks 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] sanpwc commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "sanpwc (via GitHub)" <gi...@apache.org>.
sanpwc commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1221656137


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/disruptor/StripedDisruptor.java:
##########
@@ -214,7 +214,7 @@ void unsubscribe(NodeId nodeId) {
             assert handler != null : format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event);
 
             //TODO: IGNITE-15568 endOfBatch should be set to true to prevent caching tasks until IGNITE-15568 has fixed.
-            handler.onEvent(event, sequence, true);
+            handler.onEvent(event, sequence, subscribers.size() > 1 ? true : endOfBatch);

Review Comment:
   Is it an intended change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222572038


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaManager.java:
##########
@@ -387,11 +399,18 @@ private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) {
         CompletableFuture<Replica> removed = replicas.remove(replicaGrpId);
 
         if (removed != null) {
-            removed.whenComplete((replica, throwable) -> {
-                if (throwable != null) {
-                    replica.shutdown();
-                }
-            });
+            if (!removed.isDone()) {

Review Comment:
   Removed this note.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] sanpwc commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "sanpwc (via GitHub)" <gi...@apache.org>.
sanpwc commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222619421


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/disruptor/StripedDisruptor.java:
##########
@@ -214,7 +214,7 @@ void unsubscribe(NodeId nodeId) {
             assert handler != null : format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event);
 
             //TODO: IGNITE-15568 endOfBatch should be set to true to prevent caching tasks until IGNITE-15568 has fixed.
-            handler.onEvent(event, sequence, true);
+            handler.onEvent(event, sequence, subscribers.size() > 1 ? true : endOfBatch);

Review Comment:
   Sounds reasonable, however I had to ask. How did you understand that it works better after given change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222561847


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/disruptor/StripedDisruptor.java:
##########
@@ -214,7 +214,7 @@ void unsubscribe(NodeId nodeId) {
             assert handler != null : format("Group of the event is unsupported [nodeId={}, event={}]", event.nodeId(), event);
 
             //TODO: IGNITE-15568 endOfBatch should be set to true to prevent caching tasks until IGNITE-15568 has fixed.
-            handler.onEvent(event, sequence, true);
+            handler.onEvent(event, sequence, subscribers.size() > 1 ? true : endOfBatch);

Review Comment:
   Not intended, but I decided to leave it after commit. Because this allows to do slightly better of disruptor for Meta storage and Cluster management group and I don't want to create dedicate ticket for this small change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] sanpwc commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "sanpwc (via GitHub)" <gi...@apache.org>.
sanpwc commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1221682891


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaManager.java:
##########
@@ -387,11 +399,18 @@ private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) {
         CompletableFuture<Replica> removed = replicas.remove(replicaGrpId);
 
         if (removed != null) {
-            removed.whenComplete((replica, throwable) -> {
-                if (throwable != null) {
-                    replica.shutdown();
-                }
-            });
+            if (!removed.isDone()) {
+                removed.completeExceptionally(new ReplicaStoppingException(

Review Comment:
   Is it guaranteed that `!removed.isDone()` and `removed.completeExceptionally` are atomic? In other words why we don't call replica.shutdown() in case of removed.completeExceptionally() branch?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222568013


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/exception/ReplicaStoppingException.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.replicator.exception;
+
+import static org.apache.ignite.lang.ErrorGroups.Replicator.REPLICA_STOPPING_ERR;
+
+import java.util.UUID;
+import org.apache.ignite.internal.replicator.ReplicationGroupId;
+import org.apache.ignite.network.ClusterNode;
+
+/**
+ * The exception is thrown for unhandled requests that wait for the replica, but it is already stopping.
+ */
+public class ReplicaStoppingException extends ReplicationException {
+    /**
+     * The constructor.
+     *
+     * @param groupId Replication group id.
+     * @param node Node.
+     */
+    public ReplicaStoppingException(ReplicationGroupId groupId, ClusterNode node) {
+        super(REPLICA_STOPPING_ERR, "Replica is stopping [replicationGroupId=" + groupId + ", nodeName=" + node.name() + ']');
+    }
+
+    /**
+     * The constructor is used for creating an exception instance that is thrown from a remote server.
+     *
+     * @param traceId Trace id.
+     * @param code Error code.
+     * @param message Error message.
+     * @param cause Cause exception.
+     */
+    public ReplicaStoppingException(UUID traceId, int code, String message, Throwable cause) {

Review Comment:
   I think, any exception should have constructor like this, even if it is not used in network just now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] sanpwc commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "sanpwc (via GitHub)" <gi...@apache.org>.
sanpwc commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1221661458


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaManager.java:
##########
@@ -387,11 +399,18 @@ private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) {
         CompletableFuture<Replica> removed = replicas.remove(replicaGrpId);
 
         if (removed != null) {
-            removed.whenComplete((replica, throwable) -> {
-                if (throwable != null) {
-                    replica.shutdown();
-                }
-            });
+            if (!removed.isDone()) {

Review Comment:
   ```
       // TODO: IGNITE-19494 We need to correctly stop the replica
       private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) 
   ```
   todo still present



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222566102


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaManager.java:
##########
@@ -387,11 +399,18 @@ private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) {
         CompletableFuture<Replica> removed = replicas.remove(replicaGrpId);
 
         if (removed != null) {
-            removed.whenComplete((replica, throwable) -> {
-                if (throwable != null) {
-                    replica.shutdown();
-                }
-            });
+            if (!removed.isDone()) {
+                removed.completeExceptionally(new ReplicaStoppingException(

Review Comment:
   We have the grantee, because the replica future is already deleted from the replicas, but successful complete possible only in a compute closure for the map.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] sanpwc commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "sanpwc (via GitHub)" <gi...@apache.org>.
sanpwc commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222629038


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaManager.java:
##########
@@ -387,11 +399,18 @@ private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) {
         CompletableFuture<Replica> removed = replicas.remove(replicaGrpId);
 
         if (removed != null) {
-            removed.whenComplete((replica, throwable) -> {
-                if (throwable != null) {
-                    replica.shutdown();
-                }
-            });
+            if (!removed.isDone()) {
+                removed.completeExceptionally(new ReplicaStoppingException(

Review Comment:
   I'd still call shutdown just in case. Seems that it should be safe, what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1222965710


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaManager.java:
##########
@@ -387,11 +399,18 @@ private boolean stopReplicaInternal(ReplicationGroupId replicaGrpId) {
         CompletableFuture<Replica> removed = replicas.remove(replicaGrpId);
 
         if (removed != null) {
-            removed.whenComplete((replica, throwable) -> {
-                if (throwable != null) {
-                    replica.shutdown();
-                }
-            });
+            if (!removed.isDone()) {
+                removed.completeExceptionally(new ReplicaStoppingException(

Review Comment:
   I changed the behavior and add shutdown invocation in any case when the future is not completed with exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] sanpwc commented on a diff in pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "sanpwc (via GitHub)" <gi...@apache.org>.
sanpwc commented on code in PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142#discussion_r1221695867


##########
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/exception/ReplicaStoppingException.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.replicator.exception;
+
+import static org.apache.ignite.lang.ErrorGroups.Replicator.REPLICA_STOPPING_ERR;
+
+import java.util.UUID;
+import org.apache.ignite.internal.replicator.ReplicationGroupId;
+import org.apache.ignite.network.ClusterNode;
+
+/**
+ * The exception is thrown for unhandled requests that wait for the replica, but it is already stopping.
+ */
+public class ReplicaStoppingException extends ReplicationException {
+    /**
+     * The constructor.
+     *
+     * @param groupId Replication group id.
+     * @param node Node.
+     */
+    public ReplicaStoppingException(ReplicationGroupId groupId, ClusterNode node) {
+        super(REPLICA_STOPPING_ERR, "Replica is stopping [replicationGroupId=" + groupId + ", nodeName=" + node.name() + ']');
+    }
+
+    /**
+     * The constructor is used for creating an exception instance that is thrown from a remote server.
+     *
+     * @param traceId Trace id.
+     * @param code Error code.
+     * @param message Error message.
+     * @param cause Cause exception.
+     */
+    public ReplicaStoppingException(UUID traceId, int code, String message, Throwable cause) {

Review Comment:
   Where do we use such constructor? Within deserialization process?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ignite-3] vldpyatkov merged pull request #2142: IGNITE-19494 Correctly stop a replica

Posted by "vldpyatkov (via GitHub)" <gi...@apache.org>.
vldpyatkov merged PR #2142:
URL: https://github.com/apache/ignite-3/pull/2142


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org