You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/03/10 18:28:57 UTC

[GitHub] [hive] aasha opened a new pull request #949: HIVE-22990 Add file based ack for replication

aasha opened a new pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392037676
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -2447,48 +2478,48 @@ public void testIncrementalRepeatEventOnExistingObject() throws IOException, Int
     // INSERT EVENT to unpartitioned table
     run("INSERT INTO TABLE " + dbName + ".unptned values('" + unptn_data[0] + "')", driver);
     Tuple replDump = replDumpDb(dbName);
-    incrementalDumpList.add(replDump);
     Thread.sleep(1000);
     // INSERT EVENT to partitioned table with dynamic ADD_PARTITION
     run("INSERT INTO TABLE " + dbName + ".ptned PARTITION(b=1) values('" + ptn_data_1[0] + "')", driver);
+    //Second dump without load will print a warning
+    run("REPL DUMP " + dbName, driverMirror);
+    //Load the previous dump first
 
 Review comment:
   This changes the old test case logic. The old test case was expecting the events to be generated and then load. Any specific reason for this change ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392772730
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadCompleteAckTask.java
 ##########
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+import java.io.Serializable;
+
+/**
+ * ReplLoadCompleteAckTask.
+ *
+ * Add the load complete acknoledgement.
+ **/
+public class ReplLoadCompleteAckTask extends Task<ReplLoadCompleteAckWork> implements Serializable {
 
 Review comment:
   i don't think a separate task is required for this 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392789614
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -584,6 +578,7 @@ private int executeIncrementalLoad() {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
       }
       this.childTasks = childTasks;
+      createReplLoadCompleteAckTask();
 
 Review comment:
   The check is inside the method

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393437333
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadCompleteAckTask.java
 ##########
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+import java.io.Serializable;
+
+/**
+ * ReplLoadCompleteAckTask.
+ *
+ * Add the load complete acknoledgement.
+ **/
+public class ReplLoadCompleteAckTask extends Task<ReplLoadCompleteAckWork> implements Serializable {
 
 Review comment:
   No ..when repl load / dump task ..returns the status ..it makes sure that ..all child tasks are done ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392552335
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -2447,48 +2478,48 @@ public void testIncrementalRepeatEventOnExistingObject() throws IOException, Int
     // INSERT EVENT to unpartitioned table
     run("INSERT INTO TABLE " + dbName + ".unptned values('" + unptn_data[0] + "')", driver);
     Tuple replDump = replDumpDb(dbName);
-    incrementalDumpList.add(replDump);
     Thread.sleep(1000);
     // INSERT EVENT to partitioned table with dynamic ADD_PARTITION
     run("INSERT INTO TABLE " + dbName + ".ptned PARTITION(b=1) values('" + ptn_data_1[0] + "')", driver);
+    //Second dump without load will print a warning
+    run("REPL DUMP " + dbName, driverMirror);
+    //Load the previous dump first
 
 Review comment:
   This is because only one dump is allowed before load. It has to be alternate. Only after the load acknowledgement is present, the next dump proceeds

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392847438
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -168,6 +185,42 @@ private Long getEventFromPreviousDumpMetadata(Path dumpRoot) throws IOException,
     return 0L;
   }
 
+  private Path getPreviousDumpMetadataPath(Path dumpRoot) throws IOException {
+    FileSystem fs = dumpRoot.getFileSystem(conf);
+    if (fs.exists(dumpRoot)) {
+      FileStatus[] statuses = fs.listStatus(dumpRoot);
+      if (statuses.length > 0)  {
+        FileStatus latestUpdatedStatus = statuses[0];
+        for (FileStatus status : statuses) {
+          if (status.getModificationTime() > latestUpdatedStatus.getModificationTime()) {
+            latestUpdatedStatus = status;
+          }
+        }
+        return latestUpdatedStatus.getPath();
+      }
+    }
+    return null;
+  }
+
+  private boolean shouldDump(Path previousDumpPath) throws IOException {
+    //If no previous dump means bootstrap. So return true as there was no
+    //previous dump to load
+    if (previousDumpPath == null) {
+      return true;
+    } else {
+      FileSystem fs = previousDumpPath.getFileSystem(conf);
+      if (fs.exists(previousDumpPath)) {
 
 Review comment:
   yes will update that

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392846451
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -126,21 +126,30 @@ public int execute() {
       Path dumpRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLDIR),
               Base64.getEncoder().encodeToString(work.dbNameOrPattern.toLowerCase()
                       .getBytes(StandardCharsets.UTF_8.name())));
-      Path currentDumpPath = new Path(dumpRoot, getNextDumpDir());
-      Path hiveDumpRoot = new Path(currentDumpPath, ReplUtils.REPL_HIVE_BASE_DIR);
-      DumpMetaData dmd = new DumpMetaData(hiveDumpRoot, conf);
-      // Initialize ReplChangeManager instance since we will require it to encode file URI.
-      ReplChangeManager.getInstance(conf);
-      Path cmRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLCMDIR));
-      Long lastReplId;
-      if (!dumpRoot.getFileSystem(conf).exists(dumpRoot)
-              || dumpRoot.getFileSystem(conf).listStatus(dumpRoot).length == 0) {
-        lastReplId = bootStrapDump(hiveDumpRoot, dmd, cmRoot, hiveDb);
+      Path previousDumpMetaPath = getPreviousDumpMetadataPath(dumpRoot);
 
 Review comment:
   yes only hive user should have access to this staging directory as we are completely relying on FS based ack.
   But in case of irrecoverable users, users should delete the _load_execption file for dump/load to proceed. So they will need some admin access. But in general they shouldn't modify anything.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r394056775
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -398,6 +319,21 @@ private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveExcep
             dbName);
   }
 
+  private void createReplLoadCompleteAckTask() {
+    if ((work.isIncrementalLoad() && !work.incrementalLoadTasksBuilder().hasMoreWork() && !work.hasBootstrapLoadTasks())
 
 Review comment:
   The external table tasks are already added before this method is called

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r394056775
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -398,6 +319,21 @@ private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveExcep
             dbName);
   }
 
+  private void createReplLoadCompleteAckTask() {
+    if ((work.isIncrementalLoad() && !work.incrementalLoadTasksBuilder().hasMoreWork() && !work.hasBootstrapLoadTasks())
 
 Review comment:
   The check is already present before the method is being called

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393782909
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -312,7 +312,22 @@ public void testBasic() throws IOException {
     verifySetup("SELECT * from " + dbName + ".unptned_empty", empty, driver);
 
     String replicatedDbName = dbName + "_dupe";
-    bootstrapLoadAndVerify(dbName, replicatedDbName);
+    Tuple bootstrapDump = bootstrapLoadAndVerify(dbName, replicatedDbName);
+
+    FileSystem fs = new Path(bootstrapDump.dumpLocation).getFileSystem(hconf);
+    Path dumpPath = new Path(bootstrapDump.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    boolean dumpAckFound = false;
+    boolean loadAckFound = false;
+    for (FileStatus status : fs.listStatus(dumpPath)) {
+      if (status.getPath().getName().equalsIgnoreCase(ReplUtils.DUMP_ACKNOWLEDGEMENT)) {
+        dumpAckFound = true;
+      }
+      if (status.getPath().getName().equalsIgnoreCase(ReplUtils.LOAD_ACKNOWLEDGEMENT)) {
+        loadAckFound = true;
+      }
+    }
+    assertTrue(dumpAckFound);
+    assertTrue(loadAckFound);
 
 Review comment:
   Why not just use:
   assertTrue(fs.exists(new Path(dumpPath, ReplUtils.DUMP_ACKNOWLEDGEMENT)));
   assertTrue(fs.exists(new Path(dumpPath, ReplUtils.LOAD_ACKNOWLEDGEMENT)));

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393848763
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
 ##########
 @@ -429,36 +428,19 @@ private Path getCurrentLoadPath() throws IOException, SemanticException {
     }
     FileStatus[] statuses = loadPathBase.getFileSystem(conf).listStatus(loadPathBase);
     if (statuses.length > 0) {
-      //sort based on last modified. Recent one is at the end
-      Arrays.sort(statuses, new Comparator<FileStatus>() {
-        public int compare(FileStatus f1, FileStatus f2) {
-          return Long.compare(f1.getModificationTime(), f2.getModificationTime());
+      //sort based on last modified. Recent one is at the beginning
+      FileStatus latestUpdatedStatus = statuses[0];
+      for (FileStatus status : statuses) {
+        if (status.getModificationTime() > latestUpdatedStatus.getModificationTime()) {
+          latestUpdatedStatus = status;
         }
-      });
-      if (replScope.getDbName() != null) {
-        String currentReplStatusOfTarget
-                = getReplStatus(replScope.getDbName());
-        if (currentReplStatusOfTarget == null) { //bootstrap
-          return new Path(statuses[0].getPath(), ReplUtils.REPL_HIVE_BASE_DIR);
-        } else {
-          DumpMetaData latestDump = new DumpMetaData(
-                  new Path(statuses[statuses.length - 1].getPath(), ReplUtils.REPL_HIVE_BASE_DIR), conf);
-          if (Long.parseLong(currentReplStatusOfTarget.trim()) >= latestDump.getEventTo()) {
-            isTargetAlreadyLoaded = true;
-          } else {
-            for (FileStatus status : statuses) {
-              Path hiveLoadPath = new Path(status.getPath(), ReplUtils.REPL_HIVE_BASE_DIR);
-              DumpMetaData dmd = new DumpMetaData(hiveLoadPath, conf);
-              if (dmd.isIncrementalDump()
-                      && Long.parseLong(currentReplStatusOfTarget.trim()) < dmd.getEventTo()) {
-                return hiveLoadPath;
-              }
-            }
-          }
+      }
+      Path hiveDumpPath = new Path(latestUpdatedStatus.getPath(), ReplUtils.REPL_HIVE_BASE_DIR);
+      if (loadPathBase.getFileSystem(conf).exists(hiveDumpPath)) {
 
 Review comment:
   Why do we need this check? Aren't other two checks below good enough?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392090821
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
 ##########
 @@ -98,7 +98,10 @@
   // Configuration to enable/disable dumping ACID tables. Used only for testing and shouldn't be
   // seen in production or in case of tests other than the ones where it's required.
   public static final String REPL_DUMP_INCLUDE_ACID_TABLES = "hive.repl.dump.include.acid.tables";
-
+  //Acknoledgement for repl dump complete
 
 Review comment:
   typo "Acknoledgement "

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392552323
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -2387,39 +2427,33 @@ public void testTruncateWithCM() throws IOException {
     run("INSERT INTO TABLE " + dbName + ".unptned values('" + unptn_data[0] + "')", driver);
     Tuple firstInsert = replDumpDb(dbName);
     Integer numOfEventsIns1 = Integer.valueOf(firstInsert.lastReplId) - Integer.valueOf(replDumpId);
+    // load only first insert (1 record)
+    loadAndVerify(replDbName, dbName, firstInsert.lastReplId);
 
 Review comment:
   This is because only one dump is allowed before load. It has to be alternate. Only after the load acknowledgement is present, the next dump proceeds

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392772991
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -146,7 +146,7 @@ public int execute() {
         }
         prepareReturnValues(Arrays.asList(currentDumpPath.toUri().toString(), String.valueOf(lastReplId)));
         writeDumpCompleteAck(hiveDumpRoot);
-        deletePreviousDumpMeta(previousDumpMetaPath);
+        deleteAllPreviousDumpMeta(dumpRoot, currentDumpPath);
 
 Review comment:
   As load and dump are operating on same path ..how will it behave when load is reading these dump directory while dump is reading it ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393437333
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadCompleteAckTask.java
 ##########
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+import java.io.Serializable;
+
+/**
+ * ReplLoadCompleteAckTask.
+ *
+ * Add the load complete acknoledgement.
+ **/
+public class ReplLoadCompleteAckTask extends Task<ReplLoadCompleteAckWork> implements Serializable {
 
 Review comment:
   No ..when repl load / dump task ..returns the status ..it makes sure that ..all child tasks are done ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392028251
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/Utils.java
 ##########
 @@ -82,6 +82,16 @@ public static void writeOutput(List<List<String>> listValues, Path outputFile, H
     }
   }
 
+  public static void write(Path outputFile, HiveConf hiveConf)
+          throws SemanticException {
+    try {
+      FileSystem fs = outputFile.getFileSystem(hiveConf);
+      fs.create(outputFile);
 
 Review comment:
   add retry logic

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392037888
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -3262,14 +3286,11 @@ public void testDumpNonReplDatabase() throws IOException {
     verifyFail("REPL DUMP " + dbName, driver);
     assertTrue(run("REPL DUMP " + dbName + " with ('hive.repl.dump.metadata.only' = 'true')",
             true, driver));
+    //Dump again before load will print a warning
     assertTrue(run("REPL DUMP " + dbName + " with ('hive.repl.dump.metadata.only' = 'true')",
             true, driver));
-    run("alter database " + dbName + " set dbproperties ('repl.source.for' = '1, 2, 3')", driver);
-    assertTrue(run("REPL DUMP " + dbName, true, driver));
 
 Review comment:
   so many changes in existing test cases should be avoided 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392800936
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -126,21 +126,30 @@ public int execute() {
       Path dumpRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLDIR),
               Base64.getEncoder().encodeToString(work.dbNameOrPattern.toLowerCase()
                       .getBytes(StandardCharsets.UTF_8.name())));
-      Path currentDumpPath = new Path(dumpRoot, getNextDumpDir());
-      Path hiveDumpRoot = new Path(currentDumpPath, ReplUtils.REPL_HIVE_BASE_DIR);
-      DumpMetaData dmd = new DumpMetaData(hiveDumpRoot, conf);
-      // Initialize ReplChangeManager instance since we will require it to encode file URI.
-      ReplChangeManager.getInstance(conf);
-      Path cmRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLCMDIR));
-      Long lastReplId;
-      if (!dumpRoot.getFileSystem(conf).exists(dumpRoot)
-              || dumpRoot.getFileSystem(conf).listStatus(dumpRoot).length == 0) {
-        lastReplId = bootStrapDump(hiveDumpRoot, dmd, cmRoot, hiveDb);
+      Path previousDumpMetaPath = getPreviousDumpMetadataPath(dumpRoot);
 
 Review comment:
   Does this mean that. user is not supposed to Alter the dump directory in the middle of replication? Any way to restrict that operation?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393804594
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -286,6 +277,7 @@ a database ( directory )
 
       // Populate the driver context with the scratch dir info from the repl context, so that the temp dirs will be cleaned up later
       context.getFsScratchDirs().putAll(loadContext.pathInfo.getFsScratchDirs());
+      createReplLoadCompleteAckTask();
 
 Review comment:
   From executeBootStrapLoad the ack task will be added. We need this ack to be added at the end before we return a status.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392758479
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -2387,39 +2427,33 @@ public void testTruncateWithCM() throws IOException {
     run("INSERT INTO TABLE " + dbName + ".unptned values('" + unptn_data[0] + "')", driver);
     Tuple firstInsert = replDumpDb(dbName);
     Integer numOfEventsIns1 = Integer.valueOf(firstInsert.lastReplId) - Integer.valueOf(replDumpId);
+    // load only first insert (1 record)
+    loadAndVerify(replDbName, dbName, firstInsert.lastReplId);
 
 Review comment:
   i think ..the old repl dump and load mechanism should work fine.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393945947
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -398,6 +319,21 @@ private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveExcep
             dbName);
   }
 
+  private void createReplLoadCompleteAckTask() {
+    if ((work.isIncrementalLoad() && !work.incrementalLoadTasksBuilder().hasMoreWork() && !work.hasBootstrapLoadTasks())
 
 Review comment:
   During incremental, when we have no events in dump and only external tables are present, will the event folder still be created? If not the check work.incrementalLoadTasksBuilder().hasMoreWork() might not suffice?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392789944
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadCompleteAckTask.java
 ##########
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+import java.io.Serializable;
+
+/**
+ * ReplLoadCompleteAckTask.
+ *
+ * Add the load complete acknoledgement.
+ **/
+public class ReplLoadCompleteAckTask extends Task<ReplLoadCompleteAckWork> implements Serializable {
 
 Review comment:
   If we directly add the ack at the ReplLoadTask complete, its not correct. There can be DDLM Tasks or Copy Tasks which are not yet finished but the Repl Load Task is complete. So we need to add this task at the end of the DAG so that once all other tasks are complete without any error only then it writes the ack

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392789944
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadCompleteAckTask.java
 ##########
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+
+import java.io.Serializable;
+
+/**
+ * ReplLoadCompleteAckTask.
+ *
+ * Add the load complete acknoledgement.
+ **/
+public class ReplLoadCompleteAckTask extends Task<ReplLoadCompleteAckWork> implements Serializable {
 
 Review comment:
   If we directly add the ack at the ReplLoadTask complete, its not correct. There can be DDL Tasks or Copy Tasks which are not yet finished but the Repl Load Task is complete. So we need to add this task at the end of the DAG so that once all other tasks are complete without any error only then it writes the ack

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393803832
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -146,7 +146,7 @@ public int execute() {
         }
         prepareReturnValues(Arrays.asList(currentDumpPath.toUri().toString(), String.valueOf(lastReplId)));
         writeDumpCompleteAck(hiveDumpRoot);
-        deletePreviousDumpMeta(previousDumpMetaPath);
+        deleteAllPreviousDumpMeta(dumpRoot, currentDumpPath);
 
 Review comment:
   status is a cached copy. So there won't be a problem

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393783338
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -838,7 +859,26 @@ public void testIncrementalAdds() throws IOException {
     verifySetup("SELECT a from " + dbName + ".ptned_late WHERE b=2", ptn_data_2, driver);
 
     // Perform REPL-DUMP/LOAD
-    incrementalLoadAndVerify(dbName, replDbName);
+    Tuple incrementalDump = incrementalLoadAndVerify(dbName, replDbName);
+    FileSystem fs = new Path(bootstrapDump.dumpLocation).getFileSystem(hconf);
+    boolean dumpAckFound = false;
+    boolean loadAckFound = false;
+    assertFalse(fs.exists(new Path(bootstrapDump.dumpLocation)));
+    fs = new Path(incrementalDump.dumpLocation).getFileSystem(hconf);
+    Path dumpPath = new Path(incrementalDump.dumpLocation, ReplUtils.REPL_HIVE_BASE_DIR);
+    dumpAckFound = false;
+    loadAckFound = false;
+    for (FileStatus status : fs.listStatus(dumpPath)) {
+      if (status.getPath().getName().equalsIgnoreCase(ReplUtils.DUMP_ACKNOWLEDGEMENT)) {
+        dumpAckFound = true;
+      }
+      if (status.getPath().getName().equalsIgnoreCase(ReplUtils.LOAD_ACKNOWLEDGEMENT)) {
+        loadAckFound = true;
+      }
+    }
+
+    assertTrue(dumpAckFound);
+    assertTrue(loadAckFound);
 
 Review comment:
   Why not just use:
   assertTrue(fs.exists(new Path(dumpPath, ReplUtils.DUMP_ACKNOWLEDGEMENT)));
   assertTrue(fs.exists(new Path(dumpPath, ReplUtils.LOAD_ACKNOWLEDGEMENT)));

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392770512
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/Utils.java
 ##########
 @@ -84,10 +85,17 @@ public static void writeOutput(List<List<String>> listValues, Path outputFile, H
 
   public static void write(Path outputFile, HiveConf hiveConf)
 
 Review comment:
   change the name to create

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393780241
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -286,6 +277,7 @@ a database ( directory )
 
       // Populate the driver context with the scratch dir info from the repl context, so that the temp dirs will be cleaned up later
       context.getFsScratchDirs().putAll(loadContext.pathInfo.getFsScratchDirs());
+      createReplLoadCompleteAckTask();
 
 Review comment:
   1) During incremental when there is only bootstrap task and no  events, this might not reach here, it will go back right from Line 436: return executeBootStrapLoad();
   2) How does it make sure that the ack task is added after the Repl event ID update task?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392089866
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -168,6 +182,36 @@ private Long getEventFromPreviousDumpMetadata(Path dumpRoot) throws IOException,
     return 0L;
   }
 
+  private Path getPreviousDumpMetadataPath(Path dumpRoot) throws IOException {
+    FileSystem fs = dumpRoot.getFileSystem(conf);
+    if (fs.exists(dumpRoot)) {
+      FileStatus[] statuses = fs.listStatus(dumpRoot);
+      if (statuses.length == 1) {
 
 Review comment:
   if previous one is not deleted after dump is done successfully ..then there could be 2 dump directory  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392800355
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -168,6 +185,42 @@ private Long getEventFromPreviousDumpMetadata(Path dumpRoot) throws IOException,
     return 0L;
   }
 
+  private Path getPreviousDumpMetadataPath(Path dumpRoot) throws IOException {
+    FileSystem fs = dumpRoot.getFileSystem(conf);
+    if (fs.exists(dumpRoot)) {
+      FileStatus[] statuses = fs.listStatus(dumpRoot);
+      if (statuses.length > 0)  {
+        FileStatus latestUpdatedStatus = statuses[0];
+        for (FileStatus status : statuses) {
+          if (status.getModificationTime() > latestUpdatedStatus.getModificationTime()) {
+            latestUpdatedStatus = status;
+          }
+        }
+        return latestUpdatedStatus.getPath();
+      }
+    }
+    return null;
+  }
+
+  private boolean shouldDump(Path previousDumpPath) throws IOException {
+    //If no previous dump means bootstrap. So return true as there was no
+    //previous dump to load
+    if (previousDumpPath == null) {
+      return true;
+    } else {
+      FileSystem fs = previousDumpPath.getFileSystem(conf);
+      if (fs.exists(previousDumpPath)) {
 
 Review comment:
   Should we just use return fs.exists(new Path(previousDumpPath, ReplUtils.LOAD_ACKNOWLEDGEMENT));

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392790094
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -146,7 +146,7 @@ public int execute() {
         }
         prepareReturnValues(Arrays.asList(currentDumpPath.toUri().toString(), String.valueOf(lastReplId)));
         writeDumpCompleteAck(hiveDumpRoot);
-        deletePreviousDumpMeta(previousDumpMetaPath);
+        deleteAllPreviousDumpMeta(dumpRoot, currentDumpPath);
 
 Review comment:
   Load should read only the latest one.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392090433
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -97,19 +98,40 @@ public StageType getType() {
 
   @Override
   public int execute() {
-    Task<?> rootTask = work.getRootTask();
-    if (rootTask != null) {
-      rootTask.setChildTasks(null);
-    }
-    work.setRootTask(this);
-    this.parentTasks = null;
-    if (work.isIncrementalLoad()) {
-      return executeIncrementalLoad();
-    } else {
-      return executeBootStrapLoad();
+    try {
+      Task<?> rootTask = work.getRootTask();
+      if (rootTask != null) {
+        rootTask.setChildTasks(null);
+      }
+      work.setRootTask(this);
+      this.parentTasks = null;
+      int status = 0;
+      if (work.isIncrementalLoad()) {
+        status = executeIncrementalLoad();
+        //All repl load tasks are executed and status is 0, write the load ack
+        if (status == 0 && !work.incrementalLoadTasksBuilder().hasMoreWork()) {
+          writeLoadCompleteAck(work.dumpDirectory);
+        }
+      } else {
+        status = executeBootStrapLoad();
+        //All repl load tasks are executed and status is 0, write the load ack
+        if (status == 0 && !work.hasBootstrapLoadTasks()) {
 
 Review comment:
   This can be common code ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393780762
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -398,6 +319,21 @@ private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveExcep
             dbName);
   }
 
+  private void createReplLoadCompleteAckTask() {
+    if ((work.isIncrementalLoad() && !work.incrementalLoadTasksBuilder().hasMoreWork() && !work.hasBootstrapLoadTasks())
+            || (!work.isIncrementalLoad() && !work.hasBootstrapLoadTasks())) {
+      //All repl load tasks are executed and status is 0, create the task to add the acknowledgement
+      ReplLoadCompleteAckWork replLoadCompleteAckWork = new ReplLoadCompleteAckWork(work.dumpDirectory);
+      Task<ReplLoadCompleteAckWork> loadCompleteAckWorkTask = TaskFactory.get(replLoadCompleteAckWork, conf);
+      if (this.childTasks.isEmpty()) {
 
 Review comment:
   Isn't childTask null sometime?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r394056775
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -398,6 +319,21 @@ private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveExcep
             dbName);
   }
 
+  private void createReplLoadCompleteAckTask() {
+    if ((work.isIncrementalLoad() && !work.incrementalLoadTasksBuilder().hasMoreWork() && !work.hasBootstrapLoadTasks())
 
 Review comment:
   The external table tasks are already added before this method is called. But have still added the check.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392089082
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -126,21 +126,30 @@ public int execute() {
       Path dumpRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLDIR),
               Base64.getEncoder().encodeToString(work.dbNameOrPattern.toLowerCase()
                       .getBytes(StandardCharsets.UTF_8.name())));
-      Path currentDumpPath = new Path(dumpRoot, getNextDumpDir());
-      Path hiveDumpRoot = new Path(currentDumpPath, ReplUtils.REPL_HIVE_BASE_DIR);
-      DumpMetaData dmd = new DumpMetaData(hiveDumpRoot, conf);
-      // Initialize ReplChangeManager instance since we will require it to encode file URI.
-      ReplChangeManager.getInstance(conf);
-      Path cmRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLCMDIR));
-      Long lastReplId;
-      if (!dumpRoot.getFileSystem(conf).exists(dumpRoot)
-              || dumpRoot.getFileSystem(conf).listStatus(dumpRoot).length == 0) {
-        lastReplId = bootStrapDump(hiveDumpRoot, dmd, cmRoot, hiveDb);
+      Path previousDumpMetaPath = getPreviousDumpMetadataPath(dumpRoot);
+      Path previousHiveDumpPath =
+              previousDumpMetaPath != null ? new Path(previousDumpMetaPath, ReplUtils.REPL_HIVE_BASE_DIR) : null;
+      //If no previous dump is present or previous dump was loaded, proceed
+      if (shouldDump(previousHiveDumpPath)) {
+        Path currentDumpPath = new Path(dumpRoot, getNextDumpDir());
+        Path hiveDumpRoot = new Path(currentDumpPath, ReplUtils.REPL_HIVE_BASE_DIR);
+        DumpMetaData dmd = new DumpMetaData(hiveDumpRoot, conf);
+        // Initialize ReplChangeManager instance since we will require it to encode file URI.
+        ReplChangeManager.getInstance(conf);
+        Path cmRoot = new Path(conf.getVar(HiveConf.ConfVars.REPLCMDIR));
+        Long lastReplId;
+        if (previousHiveDumpPath == null) {
+          lastReplId = bootStrapDump(hiveDumpRoot, dmd, cmRoot, hiveDb);
+        } else {
+          work.setEventFrom(getEventFromPreviousDumpMetadata(previousHiveDumpPath));
+          lastReplId = incrementalDump(hiveDumpRoot, dmd, cmRoot, hiveDb);
+        }
+        prepareReturnValues(Arrays.asList(currentDumpPath.toUri().toString(), String.valueOf(lastReplId)));
+        writeDumpCompleteAck(hiveDumpRoot);
+        deletePreviousDumpMeta(previousDumpMetaPath);
 
 Review comment:
   what if delete fails /system goes down after the dump ..how to cleanup the previous dump directory ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393806962
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -398,6 +319,21 @@ private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveExcep
             dbName);
   }
 
+  private void createReplLoadCompleteAckTask() {
+    if ((work.isIncrementalLoad() && !work.incrementalLoadTasksBuilder().hasMoreWork() && !work.hasBootstrapLoadTasks())
+            || (!work.isIncrementalLoad() && !work.hasBootstrapLoadTasks())) {
+      //All repl load tasks are executed and status is 0, create the task to add the acknowledgement
+      ReplLoadCompleteAckWork replLoadCompleteAckWork = new ReplLoadCompleteAckWork(work.dumpDirectory);
+      Task<ReplLoadCompleteAckWork> loadCompleteAckWorkTask = TaskFactory.get(replLoadCompleteAckWork, conf);
+      if (this.childTasks.isEmpty()) {
 
 Review comment:
   No. Its initialised at line 440 for incremental and 86 for bootstrap before this method is called

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r393437622
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
 ##########
 @@ -146,7 +146,7 @@ public int execute() {
         }
         prepareReturnValues(Arrays.asList(currentDumpPath.toUri().toString(), String.valueOf(lastReplId)));
         writeDumpCompleteAck(hiveDumpRoot);
-        deletePreviousDumpMeta(previousDumpMetaPath);
+        deleteAllPreviousDumpMeta(dumpRoot, currentDumpPath);
 
 Review comment:
   But load ..reads all the path and loads only the latest one ..so there is a race condition ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392772602
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -584,6 +578,7 @@ private int executeIncrementalLoad() {
         DAGTraversal.traverse(childTasks, new AddDependencyToLeaves(TaskFactory.get(work, conf)));
       }
       this.childTasks = childTasks;
+      createReplLoadCompleteAckTask();
 
 Review comment:
   This should be done only once when no more tasks are there to execute .. "if (builder.hasMoreWork() || work.getPathsToCopyIterator().hasNext() || work.hasBootstrapLoadTasks()) {'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392552354
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -3262,14 +3286,11 @@ public void testDumpNonReplDatabase() throws IOException {
     verifyFail("REPL DUMP " + dbName, driver);
     assertTrue(run("REPL DUMP " + dbName + " with ('hive.repl.dump.metadata.only' = 'true')",
             true, driver));
+    //Dump again before load will print a warning
     assertTrue(run("REPL DUMP " + dbName + " with ('hive.repl.dump.metadata.only' = 'true')",
             true, driver));
-    run("alter database " + dbName + " set dbproperties ('repl.source.for' = '1, 2, 3')", driver);
-    assertTrue(run("REPL DUMP " + dbName, true, driver));
 
 Review comment:
   This is because only one dump is allowed before load. It has to be alternate. Only after the load acknowledgement is present, the next dump proceeds

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #949: HIVE-22990 Add file based ack for replication
URL: https://github.com/apache/hive/pull/949#discussion_r392037544
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
 ##########
 @@ -2387,39 +2427,33 @@ public void testTruncateWithCM() throws IOException {
     run("INSERT INTO TABLE " + dbName + ".unptned values('" + unptn_data[0] + "')", driver);
     Tuple firstInsert = replDumpDb(dbName);
     Integer numOfEventsIns1 = Integer.valueOf(firstInsert.lastReplId) - Integer.valueOf(replDumpId);
+    // load only first insert (1 record)
+    loadAndVerify(replDbName, dbName, firstInsert.lastReplId);
 
 Review comment:
   This changes the old test case logic. The old test case was expecting the events to be generated and then load. Any specific reason for this change ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org