Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/03/31 17:18:30 UTC

[GitHub] [hive] aasha opened a new pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase

aasha opened a new pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase
URL: https://github.com/apache/hive/pull/965
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase

Posted by GitBox <gi...@apache.org>.
anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase
URL: https://github.com/apache/hive/pull/965#discussion_r401496558
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##########
 @@ -296,4 +300,17 @@ public static boolean includeAcidTableInDump(HiveConf conf) {
   public static boolean tableIncludedInReplScope(ReplScope replScope, String tableName) {
     return ((replScope == null) || replScope.tableIncludedInReplScope(tableName));
   }
+
+  public static boolean dataCopyCompleted(Path toPath, HiveConf conf) throws IOException {
+    FileSystem dstFs = null;
+    dstFs = toPath.getFileSystem(conf);
+    if (dstFs.exists(new Path(toPath, ReplUtils.COPY_ACKNOWLEDGEMENT))) {
 
 Review comment:
   Return the condition directly rather than wrapping it in an if clause.
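
A minimal sketch of what this suggestion could look like. To stay runnable without Hadoop on the classpath, it substitutes java.nio.file for Hadoop's FileSystem; the class name CopyAckCheck and the single-argument signature are assumptions made for illustration, not code from the PR. Only the marker name "_finished_copy" comes from the diff.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for ReplUtils; not the class from the PR.
class CopyAckCheck {
  // Marker file name as defined in the diff (ReplUtils.COPY_ACKNOWLEDGEMENT).
  static final String COPY_ACKNOWLEDGEMENT = "_finished_copy";

  // The boolean expression is returned directly, with no if/return true/return false.
  // In the PR itself the equivalent one-liner would presumably be:
  //   return toPath.getFileSystem(conf).exists(new Path(toPath, ReplUtils.COPY_ACKNOWLEDGEMENT));
  static boolean dataCopyCompleted(Path toPath) {
    return Files.exists(toPath.resolve(COPY_ACKNOWLEDGEMENT));
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("repl_copy");
    System.out.println(dataCopyCompleted(dir)); // false: no marker yet
    Files.createFile(dir.resolve(COPY_ACKNOWLEDGEMENT));
    System.out.println(dataCopyCompleted(dir)); // true: marker present
  }
}
```

Returning the expression directly also removes the need for the separate null initialization of the file system variable.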



[GitHub] [hive] anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase

Posted by GitBox <gi...@apache.org>.
anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase
URL: https://github.com/apache/hive/pull/965#discussion_r401486847
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##########
 @@ -906,6 +908,131 @@ public void replicationWithTableNameContainsKeywords() throws Throwable {
             .verifyReplTargetProperty(replicatedDbName);
   }
 
+  @Test
+  public void testCheckPointing() throws Throwable {
+    List<String> withClauseOptions = externalTableBasePathWithClause();
+    WarehouseInstance.Tuple bootstrapDump = primary.run("use " + primaryDbName)
+            .run("CREATE TABLE t1(a string) STORED AS TEXTFILE")
+            .run("CREATE EXTERNAL TABLE t2(a string) STORED AS TEXTFILE")
+            .run("insert into t1 values (1)")
+            .run("insert into t1 values (2)")
+            .run("insert into t2 values (11)")
+            .run("insert into t2 values (21)")
+            .dump(primaryDbName, withClauseOptions);
+
+    // verify that the external table info is written correctly for bootstrap
+    assertExternalFileInfo(Arrays.asList("t2"), bootstrapDump.dumpLocation, primaryDbName);
 
 Review comment:
   It might be better to use a completely different location here. Because we are writing to the same dump location, and the db directory is also there, files can end up in slightly wrong locations and you won't notice until you run an actual production scenario.



[GitHub] [hive] anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase

Posted by GitBox <gi...@apache.org>.
anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase
URL: https://github.com/apache/hive/pull/965#discussion_r401496301
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##########
 @@ -98,6 +100,8 @@
   public static final String DUMP_ACKNOWLEDGEMENT = "_finished_dump";
   //Acknowledgement for repl load complete
   public static final String LOAD_ACKNOWLEDGEMENT = "_finished_load";
+  //Acknowledgement for data copy complete. Used for checkpointing
+  public static final String COPY_ACKNOWLEDGEMENT = "_finished_copy";
 
 Review comment:
   Looks like we need a constants class or an enum; the utils class seems to have a lot of magic values.
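
One possible shape for such an enum, as a sketch only. The type name AckMarker and its toString() convention are hypothetical; the three file names are the ones visible in the diff above.

```java
// Demo entry point; placed first so the file can be launched directly with `java`.
class AckMarkerDemo {
  public static void main(String[] args) {
    // Prints each constant together with the marker file name it stands for.
    for (AckMarker marker : AckMarker.values()) {
      System.out.println(marker.name() + " -> " + marker);
    }
  }
}

// Hypothetical enum that groups the acknowledgement file names in one place
// instead of leaving them as separate String constants in the utils class.
enum AckMarker {
  DUMP_ACKNOWLEDGEMENT("_finished_dump"),
  LOAD_ACKNOWLEDGEMENT("_finished_load"),
  COPY_ACKNOWLEDGEMENT("_finished_copy");

  private final String fileName;

  AckMarker(String fileName) {
    this.fileName = fileName;
  }

  // Returns the on-disk marker name so the enum can be used when building paths.
  @Override
  public String toString() {
    return fileName;
  }
}
```

Callers would then build a marker path from AckMarker.COPY_ACKNOWLEDGEMENT.toString() rather than referring to a bare string constant.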



[GitHub] [hive] anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase

Posted by GitBox <gi...@apache.org>.
anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase
URL: https://github.com/apache/hive/pull/965#discussion_r401491670
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ReplCopyWork.java
 ##########
 @@ -120,4 +122,12 @@ public boolean isNeedCheckDuplicateCopy() {
   public void setCheckDuplicateCopy(boolean flag) {
     checkDuplicateCopy = flag;
   }
+
+  public boolean isCheckpointEnabled() {
+    return checkpointEnabled;
+  }
+
+  public void setCheckpointEnabled(boolean checkpointEnabled) {
 
 Review comment:
   Can this be initialized through the constructor rather than a setter?
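
A sketch of constructor-based initialization, under the assumption that the flag does not need to change after the work object is created. CopyWorkSketch and its String path fields are hypothetical simplifications and do not reflect the real ReplCopyWork constructor signature.

```java
// Hypothetical, simplified stand-in for ReplCopyWork.
class CopyWorkSketch {
  private final String fromPath;
  private final String toPath;
  // Final field: set once in the constructor, so no setter is needed.
  private final boolean checkpointEnabled;

  CopyWorkSketch(String fromPath, String toPath, boolean checkpointEnabled) {
    this.fromPath = fromPath;
    this.toPath = toPath;
    this.checkpointEnabled = checkpointEnabled;
  }

  // Convenience overload for callers that do not use checkpointing.
  CopyWorkSketch(String fromPath, String toPath) {
    this(fromPath, toPath, false);
  }

  boolean isCheckpointEnabled() {
    return checkpointEnabled;
  }

  public static void main(String[] args) {
    CopyWorkSketch work = new CopyWorkSketch("/src", "/dst", true);
    System.out.println(work.isCheckpointEnabled()); // true
  }
}
```

With the field final, the object cannot be observed in a half-configured state between construction and a later setter call.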



[GitHub] [hive] anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase

Posted by GitBox <gi...@apache.org>.
anishek commented on a change in pull request #965: HIVE-23039 Checkpointing for repl dump bootstrap phase
URL: https://github.com/apache/hive/pull/965#discussion_r401496956
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##########
 @@ -296,4 +300,17 @@ public static boolean includeAcidTableInDump(HiveConf conf) {
   public static boolean tableIncludedInReplScope(ReplScope replScope, String tableName) {
     return ((replScope == null) || replScope.tableIncludedInReplScope(tableName));
   }
+
+  public static boolean dataCopyCompleted(Path toPath, HiveConf conf) throws IOException {
+    FileSystem dstFs = null;
+    dstFs = toPath.getFileSystem(conf);
+    if (dstFs.exists(new Path(toPath, ReplUtils.COPY_ACKNOWLEDGEMENT))) {
+      return true;
+    }
+    return false;
+  }
+
+  public static void setDataCopyComplete(Path toPath, HiveConf conf) throws SemanticException {
 
 Review comment:
   Rename this method to ackCopy()?
