You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/22 13:52:24 UTC

[GitHub] [hive] haymant1998 opened a new pull request #1978: Hive 24783 Store currentNotificationID on target during repl load operation

haymant1998 opened a new pull request #1978:
URL: https://github.com/apache/hive/pull/1978


   
   ### What changes were proposed in this pull request?
   Storing last notificationID of target cluster while executing repl load command.
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added one test => TestReplicationScenariosAcidTables#testAcidNotification 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] haymant1998 commented on a change in pull request #1978: Hive 24783 Store currentNotificationID on target during repl load operation

Posted by GitBox <gi...@apache.org>.
haymant1998 commented on a change in pull request #1978:
URL: https://github.com/apache/hive/pull/1978#discussion_r577417488



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/NotTask.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.Serializable;
+
+public class NotTask extends Task<NotWork> implements Serializable {
+    private static final long serialVersionUID = 1L;
+    private Logger LOG = LoggerFactory.getLogger(AckTask.class);
+
+    @Override
+    public int execute() {
+        try {
+            conf.set(MetastoreConf.ConfVars.TRANSACTIONAL_EVENT_LISTENERS.getHiveName(),
+                    "org.apache.hive.hcatalog.listener.DbNotificationListener"); // turn on db notification listener on metastore
+            HiveMetaStoreClient metaStoreClient = new HiveMetaStoreClient(conf);
+            long currentNotificationID = metaStoreClient.getCurrentNotificationEventId().getEventId();
+            Path notificationPath = work.getNotificationFilePath();
+            Utils.writeOutput(String.valueOf(currentNotificationID), notificationPath, conf);
+            LOG.info("Created Notification file : {} ", notificationPath);
+        } catch (Exception e) {
+            e.printStackTrace();
+        }
+        return 0;
+    }
+
+    @Override
+    public StageType getType() {
+        return StageType.ACK;

Review comment:
       Because _notification_id is created alongside creation of dump ACK file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] haymant1998 closed pull request #1978: Hive 24783 Store currentNotificationID on target during repl load operation

Posted by GitBox <gi...@apache.org>.
haymant1998 closed pull request #1978:
URL: https://github.com/apache/hive/pull/1978


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pkumarsinha commented on a change in pull request #1978: Hive 24783 Store currentNotificationID on target during repl load operation

Posted by GitBox <gi...@apache.org>.
pkumarsinha commented on a change in pull request #1978:
URL: https://github.com/apache/hive/pull/1978#discussion_r577338822



##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/WarehouseInstance.java
##########
@@ -62,8 +62,7 @@
 import org.codehaus.plexus.util.ExceptionUtils;
 import org.slf4j.Logger;
 
-import java.io.Closeable;
-import java.io.IOException;
+import java.io.*;

Review comment:
       Revert this

##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationScenariosAcidTables.java
##########
@@ -224,6 +224,25 @@ void verifyLoadExecution(String replicatedDbName, String lastReplId, boolean inc
     }
   }
 
+  void verifyBootLoadNotification(String replicatedDbName, String lastReplId, String dumpLocation, boolean includeAcid)
+          throws Throwable {
+    List<String> tableNames = new LinkedList<>(nonAcidTableNames);
+    if (includeAcid) {
+      tableNames.addAll(acidTableNames);
+    }
+    replica.run("use " + replicatedDbName)
+            .run("show tables")
+            .verifyResults(tableNames)
+            .run("repl status " + replicatedDbName)
+            .verifyResult(lastReplId)
+            .verifyReplTargetProperty(replicatedDbName)
+            .verifyNotificationID(dumpLocation);
+    verifyNonAcidTableLoad(replicatedDbName);

Review comment:
       You can assume that acid table is  always there.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -978,6 +987,7 @@ private boolean shouldResumePreviousDump(DumpMetaData dumpMetaData) {
 
   private boolean shouldResumePreviousDump(Path lastDumpPath, boolean isBootStrap) throws IOException {
     if (validDump(lastDumpPath)) {
+      //If lastDumpPath is successful dump don't resume

Review comment:
       remove the comment

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -853,6 +860,7 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive hiveDb)
     long waitUntilTime = System.currentTimeMillis() + timeoutInMs;
     String validTxnList = getValidTxnListForReplDump(hiveDb, waitUntilTime);
     Path metadataPath = new Path(dumpRoot, EximUtil.METADATA_PATH_NAME);
+    //metadataPath = TEST_PATH/hrepl/db_name/1/hive/metadata

Review comment:
       remove the comment

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/NotTask.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.Serializable;
+
+public class NotTask extends Task<NotWork> implements Serializable {
+    private static final long serialVersionUID = 1L;
+    private Logger LOG = LoggerFactory.getLogger(AckTask.class);
+
+    @Override
+    public int execute() {
+        try {
+            conf.set(MetastoreConf.ConfVars.TRANSACTIONAL_EVENT_LISTENERS.getHiveName(),
+                    "org.apache.hive.hcatalog.listener.DbNotificationListener"); // turn on db notification listener on metastore
+            HiveMetaStoreClient metaStoreClient = new HiveMetaStoreClient(conf);
+            long currentNotificationID = metaStoreClient.getCurrentNotificationEventId().getEventId();
+            Path notificationPath = work.getNotificationFilePath();
+            Utils.writeOutput(String.valueOf(currentNotificationID), notificationPath, conf);
+            LOG.info("Created Notification file : {} ", notificationPath);
+        } catch (Exception e) {
+            e.printStackTrace();
+        }
+        return 0;
+    }
+
+    @Override
+    public StageType getType() {
+        return StageType.ACK;

Review comment:
       Why  is this of ACK type?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -168,7 +169,10 @@ public int execute() {
         }
         //If no previous dump is present or previous dump is already loaded, proceed with the dump operation.
         if (shouldDump(previousValidHiveDumpPath)) {
+        //Should dump checks for whether pVHDP is null or contains _finished load file

Review comment:
       remove all the debug comments

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -870,6 +878,7 @@ Long bootStrapDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive hiveDb)
         if ((db != null) && (ReplUtils.isFirstIncPending(db.getParameters()))) {
           // For replicated (target) database, until after first successful incremental load, the database will not be
           // in a consistent state. Avoid allowing replicating this database to a new target.
+          //This might be case if target db already contain db_name before load.

Review comment:
       remove the comment
   

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -397,6 +402,8 @@ private boolean shouldDump(Path previousDumpPath) throws IOException {
       return true;
     } else {
       FileSystem fs = previousDumpPath.getFileSystem(conf);
+      //Verifies that _finished_load is present in the staging area. If yes,

Review comment:
       Remove the comment

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/NotTask.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.ErrorMsg;
+import org.apache.hadoop.hive.ql.exec.Task;
+import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.parse.repl.dump.Utils;
+import org.apache.hadoop.hive.ql.plan.api.StageType;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.Serializable;
+
+public class NotTask extends Task<NotWork> implements Serializable {
+    private static final long serialVersionUID = 1L;
+    private Logger LOG = LoggerFactory.getLogger(AckTask.class);
+
+    @Override
+    public int execute() {
+        try {
+            conf.set(MetastoreConf.ConfVars.TRANSACTIONAL_EVENT_LISTENERS.getHiveName(),

Review comment:
       This is not a runtime config. What happens if  it is not enabled but the              metaStoreClient.getCurrentNotificationEventId() call is made?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
##########
@@ -27,24 +27,7 @@
 import org.apache.hadoop.hive.ql.ddl.DDLWork;
 import org.apache.hadoop.hive.ql.exec.mr.MapRedTask;
 import org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask;
-import org.apache.hadoop.hive.ql.exec.repl.AtlasDumpTask;
-import org.apache.hadoop.hive.ql.exec.repl.AtlasDumpWork;
-import org.apache.hadoop.hive.ql.exec.repl.AtlasLoadTask;
-import org.apache.hadoop.hive.ql.exec.repl.AtlasLoadWork;
-import org.apache.hadoop.hive.ql.exec.repl.ReplDumpTask;
-import org.apache.hadoop.hive.ql.exec.repl.ReplDumpWork;
-import org.apache.hadoop.hive.ql.exec.repl.ReplLoadTask;
-import org.apache.hadoop.hive.ql.exec.repl.ReplLoadWork;
-import org.apache.hadoop.hive.ql.exec.repl.AckTask;
-import org.apache.hadoop.hive.ql.exec.repl.AckWork;
-import org.apache.hadoop.hive.ql.exec.repl.ReplStateLogTask;
-import org.apache.hadoop.hive.ql.exec.repl.ReplStateLogWork;
-import org.apache.hadoop.hive.ql.exec.repl.DirCopyTask;
-import org.apache.hadoop.hive.ql.exec.repl.DirCopyWork;
-import org.apache.hadoop.hive.ql.exec.repl.RangerLoadWork;
-import org.apache.hadoop.hive.ql.exec.repl.RangerLoadTask;
-import org.apache.hadoop.hive.ql.exec.repl.RangerDumpWork;
-import org.apache.hadoop.hive.ql.exec.repl.RangerDumpTask;
+import org.apache.hadoop.hive.ql.exec.repl.*;

Review comment:
       Revert this

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -545,7 +552,7 @@ private Long incrementalDump(Path dumpRoot, DumpMetaData dmd, Path cmRoot, Hive
     // factory per event to decode. For now, however, since all messages have the
     // same factory, restricting by message format is effectively a guard against
     // older leftover data that would cause us problems.
-    work.overrideLastEventToDump(hiveDb, bootDumpBeginReplId);
+    work.overrideLastEventToDump(hiveDb, bootDumpBeginReplId);  //used to specify eventTo variable in work

Review comment:
       remove the comment

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##########
@@ -428,8 +428,8 @@ public static boolean failedWithNonRecoverableError(Path dumpRoot, HiveConf conf
 
   public static Path getEncodedDumpRootPath(HiveConf conf, String dbname) throws UnsupportedEncodingException {
     return new Path(conf.getVar(HiveConf.ConfVars.REPLDIR),
-      Base64.getEncoder().encodeToString(dbname
-        .getBytes(StandardCharsets.UTF_8.name())));
+            Base64.getEncoder().encodeToString(dbname

Review comment:
       You can rather move it to single line

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##########
@@ -341,6 +341,7 @@ a database ( directory )
     // Populate the driver context with the scratch dir info from the repl context, so that the
     // temp dirs will be cleaned up later
     context.getFsScratchDirs().putAll(loadContext.pathInfo.getFsScratchDirs());
+

Review comment:
       Revert this

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java
##########
@@ -117,7 +117,7 @@ void setEventFrom(long eventId) {
 
   // Override any user specification that changes the last event to be dumped.
   void overrideLastEventToDump(Hive fromDb, long bootstrapLastId) throws Exception {
-    // If we are bootstrapping ACID tables, we need to dump all the events upto the event id at
+    // If we are bootstrapping ACID tables, we need to dump all the events upto maxEventLimitthe event id at

Review comment:
       typo in the  comment

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java
##########
@@ -216,4 +216,5 @@ public Long getDumpExecutionId() {
   public void setExternalTableDataCopyItr(Iterator<String> externalTableDataCopyItr) {
     this.externalTableDataCopyItr = externalTableDataCopyItr;
   }
+

Review comment:
       Revert this

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##########
@@ -654,4 +665,7 @@ private int executeIncrementalLoad() throws Exception {
     createReplLoadCompleteAckTask();
     return 0;
   }
+
 }
+
+

Review comment:
       Revert this

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -986,9 +996,11 @@ private boolean shouldResumePreviousDump(Path lastDumpPath, boolean isBootStrap)
       return false;
     }
     if (isBootStrap) {
+      //Always return false for this case.

Review comment:
       remove the comment.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
##########
@@ -654,4 +665,7 @@ private int executeIncrementalLoad() throws Exception {
     createReplLoadCompleteAckTask();
     return 0;
   }
+

Review comment:
       Revert this

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##########
@@ -986,9 +996,11 @@ private boolean shouldResumePreviousDump(Path lastDumpPath, boolean isBootStrap)
       return false;
     }
     if (isBootStrap) {
+      //Always return false for this case.
       return shouldResumePreviousDump(dumpMetaData);
     }
     // In case of incremental we should resume if _events_dump file is present and is valid
+    //_events_dump file contains last dump event id

Review comment:
       _events_dump maintains last dump id dumped so far in current execution




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] haymant1998 closed pull request #1978: Hive 24783 Store currentNotificationID on target during repl load operation

Posted by GitBox <gi...@apache.org>.
haymant1998 closed pull request #1978:
URL: https://github.com/apache/hive/pull/1978


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org