You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/06/07 13:04:05 UTC

[GitHub] [hive] pvary opened a new pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

pvary opened a new pull request #2359:
URL: https://github.com/apache/hive/pull/2359


   ### What changes were proposed in this pull request?
   We should use the MoveTask to commit the changes (inserts/insert overwrites)
   
   ### Why are the changes needed?
   MoveTask is used for multiple things, like stat generation. When we removed the Move tasks, we caused several unseen issues. We should reintroduce the MoveTask
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648942046



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -317,6 +320,36 @@ public int execute() {
     }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;
+      Properties commitProperties = null;
+      boolean overwrite = false;
+
+      if (work.getLoadTableWork() != null) {
+        // Get the info from the table data
+        TableDesc tableDesc = work.getLoadTableWork().getTable();
+        storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
       I think it should not be `null`, otherwise we will have a `NullPointerException` in `EXPLAIN ...` commands: https://github.com/apache/hive/blob/a97448f84167e4e8c3615908556fe2e4163a43ca/ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java#L165-L168




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648947866



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -7316,7 +7316,7 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
       destTableId++;
       // Create the work for moving the table
       // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
-      if (!isNonNativeTable) {
+      if (!isNonNativeTable || destinationTable.getStorageHandler().useNativeCommit()) {

Review comment:
       Otherwise we will end up with `PreInsertTableOperation` instead of `MoveTask`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648945402



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -317,6 +320,36 @@ public int execute() {
     }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;

Review comment:
       Would it make sense to move this newly-added code into a method? e.g. `boolean checkAndCommitNatively(work)` or something like that, with some javadoc. What do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648915562



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);

Review comment:
       Maybe it's enough to pass in the `jobContext` to this method?  `getJobContextForCommitOrAbort` should give back the same result on the second invocation too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648955441



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
     return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+    return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
       Sounds good.
   Maybe for the first one, `commitWithMoveTask() default true;`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648943168



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -7316,7 +7316,7 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
       destTableId++;
       // Create the work for moving the table
       // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
-      if (!isNonNativeTable) {
+      if (!isNonNativeTable || destinationTable.getStorageHandler().useNativeCommit()) {

Review comment:
       Can you give some context why this is needed? Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648916479



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);
+        throw new HiveException("Error committing job", e);

Review comment:
       Might be worth including the jobID and tableName into the exception message here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649117229



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+    return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        // Aborting the job if the commit has failed
+        LOG.error("Error while trying to commit job: {}, starting rollback changes for table: {}",
+            jobContext.get().getJobID(), tableName, e);
+        try {
+          committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+        } catch (IOException ioe) {
+          LOG.error("Error while trying to abort failed job. There might be uncleaned data files.", ioe);
+          // no throwing here because the original exception should be propagated
+        }
+        throw new HiveException("Error committing job: " + jobContext.get().getJobID() + " for table: " + tableName);

Review comment:
       ehh... always when I try to do things fast, I make stupid mistakes....
   Thanks for catching

##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+    return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job

Review comment:
       Removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648954028



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -317,6 +320,36 @@ public int execute() {
     }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;
+      Properties commitProperties = null;
+      boolean overwrite = false;
+
+      if (work.getLoadTableWork() != null) {
+        // Get the info from the table data
+        TableDesc tableDesc = work.getLoadTableWork().getTable();
+        storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
       Makes sense




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648953881



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -7316,7 +7316,7 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
       destTableId++;
       // Create the work for moving the table
       // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
-      if (!isNonNativeTable) {
+      if (!isNonNativeTable || destinationTable.getStorageHandler().useNativeCommit()) {

Review comment:
       Oh I see. Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649092943



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+    return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job

Review comment:
       nit: unnecessary comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648937036



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);
+        throw new HiveException("Error committing job", e);

Review comment:
       Makes sense 😄 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649019156



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);
+        throw new HiveException("Error committing job", e);

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649021325



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
     return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+    return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
       settled on `commitInMoveTask` and `storageHandlerCommit`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648945777



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
     return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+    return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
       What do you think about these names?
   ```
     /**
      * Checks if we should keep the {@link org.apache.hadoop.hive.ql.exec.MoveTask} and use the
      * {@link #storageHandlerCommit(Properties, boolean)} method for committing inserts instead of
      * {@link org.apache.hadoop.hive.metastore.DefaultHiveMetaHook#commitInsertTable(Table, boolean)}.
      * @return Returns true if we should use the {@link #storageHandlerCommit(Properties, boolean)} method
      */
     default boolean keepMoveTask() {
       return false;
     }
   
     /**
      * Commits the inserts for the non-native tables. Used in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}.
      * @param commitProperties Commit properties which are needed for the handler based commit
      * @param overwrite If this is an INSERT OVERWRITE then it is true
      * @throws HiveException If there is an error during commit
      */
     default void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
       throw new UnsupportedOperationException();
     }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary merged pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #2359:
URL: https://github.com/apache/hive/pull/2359


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649019475



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -317,6 +320,36 @@ public int execute() {
     }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;
+      Properties commitProperties = null;
+      boolean overwrite = false;
+
+      if (work.getLoadTableWork() != null) {
+        // Get the info from the table data
+        TableDesc tableDesc = work.getLoadTableWork().getTable();
+        storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
       Did not change as discussed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649028108



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -317,6 +320,36 @@ public int execute() {
     }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;

Review comment:
       Refactored out code




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648926231



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -317,6 +320,36 @@ public int execute() {
     }
 
     try (LocalTableLock lock = acquireLockForFileMove(work.getLoadTableWork())) {
+      String storageHandlerClass = null;
+      Properties commitProperties = null;
+      boolean overwrite = false;
+
+      if (work.getLoadTableWork() != null) {
+        // Get the info from the table data
+        TableDesc tableDesc = work.getLoadTableWork().getTable();
+        storageHandlerClass = tableDesc.getProperties().getProperty(

Review comment:
       Can `tableDesc.getProperties()` be null?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r649091082



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,36 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean commitInMoveTask() {
+    return true;
+  }
+
+  @Override
+  public void storageHandlerCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = generateJobContext(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      OutputCommitter committer = new HiveIcebergOutputCommitter();
+      try {
+        // Committing the job
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        // Aborting the job if the commit has failed
+        LOG.error("Error while trying to commit job: {}, starting rollback changes for table: {}",
+            jobContext.get().getJobID(), tableName, e);
+        try {
+          committer.abortJob(jobContext.get(), JobStatus.State.FAILED);
+        } catch (IOException ioe) {
+          LOG.error("Error while trying to abort failed job. There might be uncleaned data files.", ioe);
+          // no throwing here because the original exception should be propagated
+        }
+        throw new HiveException("Error committing job: " + jobContext.get().getJobID() + " for table: " + tableName);

Review comment:
       Can we include `e` in the exception to get the underlying cause?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] marton-bod commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648941044



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java
##########
@@ -263,4 +264,23 @@ default boolean supportsPartitionTransform() {
   default String getFileFormatPropertyKey() {
     return null;
   }
+
+  /**
+   * Check if we should use the {@link #nativeCommit(Properties, boolean)} method for committing inserts instead of
+   * using file copy in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}-s.
+   * @return
+   */
+  default boolean useNativeCommit() {
+    return false;
+  }
+
+  /**
+   * Commits the inserts for the non-native tables. Used in the {@link org.apache.hadoop.hive.ql.exec.MoveTask}.

Review comment:
       It's a bit strange that we use the term native commit but only for non-native tables :) But to be honest, I couldn't find a better name yet either, because `nativeCommit` still kinda makes sense :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvary commented on a change in pull request #2359: HIVE-25208: Refactor Iceberg commit to the MoveTask/MoveWork

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2359:
URL: https://github.com/apache/hive/pull/2359#discussion_r648936862



##########
File path: iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
##########
@@ -265,6 +273,28 @@ public String getFileFormatPropertyKey() {
     return TableProperties.DEFAULT_FILE_FORMAT;
   }
 
+  @Override
+  public boolean useNativeCommit() {
+    return true;
+  }
+
+  @Override
+  public void nativeCommit(Properties commitProperties, boolean overwrite) throws HiveException {
+    String tableName = commitProperties.getProperty(Catalogs.NAME);
+    Configuration configuration = SessionState.getSessionConf();
+    Optional<JobContext> jobContext = getJobContextForCommitOrAbort(configuration, tableName, overwrite);
+    if (jobContext.isPresent()) {
+      try {
+        OutputCommitter committer = new HiveIcebergOutputCommitter();
+        committer.commitJob(jobContext.get());
+      } catch (Throwable e) {
+        LOG.error("Error while trying to commit job, starting rollback", e);
+        rollbackInsertTable(configuration, tableName, overwrite);

Review comment:
       Ok. Did the full refactoring then




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org