Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2020/08/19 11:47:04 UTC

[GitHub] [incubator-iotdb] neuyilan opened a new pull request #1635: [Distributed] fix async applier bug when doing snapshot

neuyilan opened a new pull request #1635:
URL: https://github.com/apache/incubator-iotdb/pull/1635


   





[GitHub] [incubator-iotdb] jt2594838 commented on a change in pull request #1635: [Distributed] Emend the async applier

Posted by GitBox <gi...@apache.org>.
jt2594838 commented on a change in pull request #1635:
URL: https://github.com/apache/incubator-iotdb/pull/1635#discussion_r478780874



##########
File path: cluster/src/main/java/org/apache/iotdb/cluster/log/manage/FilePartitionedSnapshotLogManager.java
##########
@@ -47,22 +52,47 @@
       .getLogger(FilePartitionedSnapshotLogManager.class);
 
   public FilePartitionedSnapshotLogManager(LogApplier logApplier, PartitionTable partitionTable,
-      Node header, Node thisNode) {
-    super(logApplier, partitionTable, header, thisNode, FileSnapshot::new);
+      Node header, Node thisNode, DataGroupMember dataGroupMember) {
+    super(logApplier, partitionTable, header, thisNode, FileSnapshot::new, dataGroupMember);
+  }
+
+  /**
+   * send FlushPlan to all nodes in one dataGroup
+   */
+  public void syncFlushAllProcessor() {
+    logger.info("{}: Start flush all storage group processor in one data group", getName());
+    ConcurrentHashMap<String, StorageGroupProcessor> processorMap = StorageEngine.getInstance()
+        .getProcessorMap();
+    if (processorMap.size() == 0) {
+      logger.info("{}: no need to flush processor", getName());
+      return;
+    }
+    List<Path> storageGroups = new ArrayList<>();
+    for (Map.Entry<String, StorageGroupProcessor> entry : processorMap.entrySet()) {
+      Path path = new Path(entry.getKey());
+      storageGroups.add(path);
+    }
+    FlushPlan plan = new FlushPlan(null, true, storageGroups);
+    dataGroupMember.flushFileWhenDoSnapshot(plan);

Review comment:
       I am afraid we cannot flush the whole storage group when time partitioning is enabled, because in that case a storage group is managed by several data groups. If you flush one storage group without notifying the other data groups, their file integrity will be broken.
   So you should either flush the other related groups (which are rather hard to find) or only flush the partitions that belong to this data group.
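
   For illustration, a minimal sketch of the suggested partition-scoped flush. The `FlushPlan` overload taking a storage-group-to-partition map and the `getOwnedTimePartitions` helper are hypothetical names used only to make the idea concrete; other types are as imported in the file above, and the actual fix may differ:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: flush just the time partitions this data group owns, so the
// files of other data groups sharing the same storage group stay intact.
public void syncFlushOwnedPartitions() {
  Map<Path, List<Long>> owned = new HashMap<>();
  for (String sg : StorageEngine.getInstance().getProcessorMap().keySet()) {
    // Hypothetical helper: the time-partition ids this data group manages.
    List<Long> partitionIds = dataGroupMember.getOwnedTimePartitions(sg);
    if (!partitionIds.isEmpty()) {
      owned.put(new Path(sg), partitionIds);
    }
  }
  if (owned.isEmpty()) {
    logger.info("{}: no owned partitions to flush", getName());
    return;
  }
  // Hypothetical FlushPlan overload scoped to specific time partitions.
  FlushPlan plan = new FlushPlan(null, true, owned);
  dataGroupMember.flushFileWhenDoSnapshot(plan);
}
```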







[GitHub] [incubator-iotdb] neuyilan commented on pull request #1635: [Distributed] fix async applier bug when doing snapshot

Posted by GitBox <gi...@apache.org>.
neuyilan commented on pull request #1635:
URL: https://github.com/apache/incubator-iotdb/pull/1635#issuecomment-676856963


   > The fix itself has a potential problem: when new logs keep coming in, it is very likely to time out.
   > Now I see you want to make sure that when the snapshot is taken, all previous logs are applied. This side effect was noticed before, but back then the AsyncApplier was merely at an experimental stage. Since you have fixed one of its problems, I would like to give some advice to make the AsyncApplier complete:
   > 
   > 1. When starting to take a snapshot, record the current commitIndex.
   > 2. Block until all logs whose indices <= the recorded commitIndex are applied. (Use RaftLogManager to do so instead of LogApplier.)
   > 3. Prevent the log cleaner thread from cleaning logs that are not yet applied.
   > 4. Change `StorageEngine.getInstance().syncCloseAllProcessor();` in `takeSnapshot()` to send a flush plan within the group. (So the file sequence will not be broken by snapshots.)
   > 5. When committed logs are recovered during start-up, re-apply all of them. (Notice that operation sequences in IoTDB are idempotent.)
   
   Great suggestions. I think items 1-3 are meant to ensure that while a snapshot is being taken, no new logs can be added and the snapshot task does not take too long. The current implementation already blocks incoming logs: taking a snapshot requires holding the _logManager_ lock, and committing a log needs the same lock, so new logs cannot be committed in the meantime. We only need to wait for all previously committed logs to be applied before taking the snapshot; see the sketch below.
   
   I'm going to rethink suggestions 4 and 5.
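
   For reference, a minimal sketch of steps 1-3 (the names `getCommitLogIndex`, `getAppliedIndex`, and `doTakeSnapshot` are illustrative only, not the actual RaftLogManager API):

```java
// Sketch of the snapshot barrier described in steps 1-3; member names are
// hypothetical.
public void takeSnapshotWithBarrier(RaftLogManager logManager) throws InterruptedException {
  synchronized (logManager) {
    // 1. Record the commit index at the moment the snapshot starts. Holding
    //    the logManager lock keeps new commits out, and the log cleaner
    //    (step 3) must take the same lock so unapplied logs are not removed.
    long snapshotIndex = logManager.getCommitLogIndex();
    // 2. Block until every log with index <= snapshotIndex has been applied.
    while (logManager.getAppliedIndex() < snapshotIndex) {
      logManager.wait(100); // re-check periodically; the applier could also notify()
    }
    doTakeSnapshot(snapshotIndex); // hypothetical snapshot routine
  }
}
```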
   
   
   
   
   
    
   





[GitHub] [incubator-iotdb] neuyilan commented on a change in pull request #1635: [Distributed] Emend the async applier

Posted by GitBox <gi...@apache.org>.
neuyilan commented on a change in pull request #1635:
URL: https://github.com/apache/incubator-iotdb/pull/1635#discussion_r478819179



##########
File path: cluster/src/main/java/org/apache/iotdb/cluster/log/manage/FilePartitionedSnapshotLogManager.java
##########
@@ -47,22 +52,47 @@
       .getLogger(FilePartitionedSnapshotLogManager.class);
 
   public FilePartitionedSnapshotLogManager(LogApplier logApplier, PartitionTable partitionTable,
-      Node header, Node thisNode) {
-    super(logApplier, partitionTable, header, thisNode, FileSnapshot::new);
+      Node header, Node thisNode, DataGroupMember dataGroupMember) {
+    super(logApplier, partitionTable, header, thisNode, FileSnapshot::new, dataGroupMember);
+  }
+
+  /**
+   * send FlushPlan to all nodes in one dataGroup
+   */
+  public void syncFlushAllProcessor() {
+    logger.info("{}: Start flush all storage group processor in one data group", getName());
+    ConcurrentHashMap<String, StorageGroupProcessor> processorMap = StorageEngine.getInstance()
+        .getProcessorMap();
+    if (processorMap.size() == 0) {
+      logger.info("{}: no need to flush processor", getName());
+      return;
+    }
+    List<Path> storageGroups = new ArrayList<>();
+    for (Map.Entry<String, StorageGroupProcessor> entry : processorMap.entrySet()) {
+      Path path = new Path(entry.getKey());
+      storageGroups.add(path);
+    }
+    FlushPlan plan = new FlushPlan(null, true, storageGroups);
+    dataGroupMember.flushFileWhenDoSnapshot(plan);

Review comment:
       Sure, thanks for your advice. I'll flush only the partitions that belong to the data group.







[GitHub] [incubator-iotdb] jt2594838 merged pull request #1635: [IOTDB-860] Emend the async log applier

Posted by GitBox <gi...@apache.org>.
jt2594838 merged pull request #1635:
URL: https://github.com/apache/incubator-iotdb/pull/1635


   

