You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@ignite.apache.org by GitBox <gi...@apache.org> on 2022/06/23 10:57:05 UTC

[GitHub] [ignite-3] alievmirza commented on a diff in pull request #881: IGNITE-16955 Improve logging for rebalance process

alievmirza commented on code in PR #881:
URL: https://github.com/apache/ignite-3/pull/881#discussion_r904799797


##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/core/NodeImpl.java:
##########
@@ -406,8 +406,10 @@ void onCaughtUp(final long version, final PeerId peer, final boolean success) {
             }
             Requires.requireTrue(this.stage == Stage.STAGE_CATCHING_UP, "Stage is not in STAGE_CATCHING_UP");
             if (success) {
+                LOG.info("Catch up for peer={} was finished", peer);
                 this.addingPeers.remove(peer);
                 if (this.addingPeers.isEmpty()) {
+                    LOG.info("Catch up phase to change peers from={} to={} was successfully finished", oldPeers, newPeers);

Review Comment:
   I would log this inside `org.apache.ignite.raft.jraft.core.NodeImpl.ConfigurationCtx#nextStage`



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/RebalanceRaftGroupEventsListener.java:
##########
@@ -140,6 +140,10 @@ public void onLeaderElected(long term) {
                     if (!pendingEntry.empty()) {
                         List<ClusterNode> pendingNodes = (List<ClusterNode>) ByteUtils.fromBytes(pendingEntry.value());
 
+                        LOG.info("New leader elected for the raft group={} "

Review Comment:
   ```suggestion
                           LOG.info("New leader was elected for the raft group={} "
   ```



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/RebalanceRaftGroupEventsListener.java:
##########
@@ -140,6 +140,10 @@ public void onLeaderElected(long term) {
                     if (!pendingEntry.empty()) {
                         List<ClusterNode> pendingNodes = (List<ClusterNode>) ByteUtils.fromBytes(pendingEntry.value());
 
+                        LOG.info("New leader elected for the raft group={} "
+                                        + "of partition={}, table={} and pending reconfiguration to peers={} discovered",
+                                partId, partNum, tblConfiguration.name().value(), pendingNodes);
+

Review Comment:
   Lets add logging to `org.apache.ignite.internal.table.distributed.TableManager#movePartition` so we could track that we have started a process of  moving a partition.



##########
modules/table/src/main/java/org/apache/ignite/internal/utils/RebalanceUtil.java:
##########
@@ -42,9 +42,33 @@
  */
 public class RebalanceUtil {
 
+    /** Logger. */
+    private static IgniteLogger LOG = IgniteLogger.forClass(RebalanceUtil.class);

Review Comment:
   Maybe final



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/RebalanceRaftGroupEventsListener.java:
##########
@@ -140,6 +140,10 @@ public void onLeaderElected(long term) {
                     if (!pendingEntry.empty()) {
                         List<ClusterNode> pendingNodes = (List<ClusterNode>) ByteUtils.fromBytes(pendingEntry.value());
 
+                        LOG.info("New leader elected for the raft group={} "

Review Comment:
   ```suggestion
                           LOG.info("New leader was elected for the raft group={} "
   ```



##########
modules/table/src/main/java/org/apache/ignite/internal/utils/RebalanceUtil.java:
##########
@@ -42,9 +42,33 @@
  */
 public class RebalanceUtil {
 
+    /** Logger. */
+    private static IgniteLogger LOG = IgniteLogger.forClass(RebalanceUtil.class);
+
+    /** Return code of metastore multi-invoke which identify,

Review Comment:
   `identifies` here and below



##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/rpc/impl/RaftGroupServiceImpl.java:
##########
@@ -567,6 +570,9 @@ private <R> void sendWithRetry(Peer peer, Object req, long stopTime, Completable
                 if (err != null) {
                     if (recoverable(err)) {
                         executor.schedule(() -> {
+                            LOG.warn("Recoverable error during the request type={} occurred: ",

Review Comment:
   Lets add here that we retrying intent on a randomly selected node



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/RebalanceRaftGroupEventsListener.java:
##########
@@ -269,16 +273,30 @@ private void doOnNewPeersConfigurationApplied(List<PeerId> peers) {
                                 remove(plannedPartAssignmentsKey(partId)))
                                 .yield(true),
                         ops().yield(false))).get().getAsBoolean()) {
+                    LOG.info("Finished rebalance of partition={}, table={} to peers={}, "

Review Comment:
   As far as we try again, we cannot say that rebalance was finished so let's remove this ''Finished rebalance of partition={},..." line and add it only when CAS was successful.
   
   Please update the same logs further in the code 



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/RebalanceRaftGroupEventsListener.java:
##########
@@ -140,6 +140,10 @@ public void onLeaderElected(long term) {
                     if (!pendingEntry.empty()) {
                         List<ClusterNode> pendingNodes = (List<ClusterNode>) ByteUtils.fromBytes(pendingEntry.value());
 
+                        LOG.info("New leader elected for the raft group={} "
+                                        + "of partition={}, table={} and pending reconfiguration to peers={} discovered",

Review Comment:
   ```suggestion
                                           + "of partition={}, table={} and pending reconfiguration to peers={} was discovered",
   ```



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/TableManager.java:
##########
@@ -366,6 +366,9 @@ private CompletableFuture<?> onUpdateReplicas(ConfigurationNotificationEvent<Int
             if (replicasCtx.oldValue() != null && replicasCtx.oldValue() > 0) {
                 TableConfiguration tblCfg = replicasCtx.config(TableConfiguration.class);
 
+                LOG.info("Received update for replicas number for table={} from replicas={} to replicas={}",

Review Comment:
   If we run sone rebalance test, we can see in the logs the following:
   
   ```
   2022-06-23 13:18:27:798 +0300 [INFO][Thread-242][TableManager] Received update for replicas number for table=PUBLIC.TBL1 from replicas=1 to replicas=2
   2022-06-23 13:18:27:799 +0300 [INFO][Thread-244][TableManager] Received update for replicas number for table=PUBLIC.TBL1 from replicas=1 to replicas=2
   2022-06-23 13:18:27:802 +0300 [INFO][Thread-241][TableManager] Received update for replicas number for table=PUBLIC.TBL1 from replicas=1 to replicas=2
   ```
   Lines looks the same and could be more informative, if we add info about the node where such update has happened



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/raft/RebalanceRaftGroupEventsListener.java:
##########
@@ -269,16 +273,30 @@ private void doOnNewPeersConfigurationApplied(List<PeerId> peers) {
                                 remove(plannedPartAssignmentsKey(partId)))
                                 .yield(true),
                         ops().yield(false))).get().getAsBoolean()) {
+                    LOG.info("Finished rebalance of partition={}, table={} to peers={}, "
+                                    + "but planned key={} was changed during results CAS write to metastore, will try again",

Review Comment:
   "during results CAS" 
   Could you please rephrase this, seems that idea is not clear 



##########
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/TableManager.java:
##########
@@ -366,6 +366,9 @@ private CompletableFuture<?> onUpdateReplicas(ConfigurationNotificationEvent<Int
             if (replicasCtx.oldValue() != null && replicasCtx.oldValue() > 0) {
                 TableConfiguration tblCfg = replicasCtx.config(TableConfiguration.class);
 
+                LOG.info("Received update for replicas number for table={} from replicas={} to replicas={}",

Review Comment:
   Maybe we should think about a general solution, where all lines of logs will contain info about the node which logged that line 



##########
modules/raft/src/main/java/org/apache/ignite/raft/jraft/rpc/impl/RaftGroupServiceImpl.java:
##########
@@ -372,6 +372,9 @@ public static CompletableFuture<RaftGroupService> start(
 
         CompletableFuture<ChangePeersAsyncResponse> fut = new CompletableFuture<>();
 
+        LOG.info("Sending request to changePeers of group={} to peers={} with leader term={}",

Review Comment:
   changePeersAsync



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@ignite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org