You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "steveloughran (via GitHub)" <gi...@apache.org> on 2023/02/13 18:35:22 UTC

[GitHub] [hadoop] steveloughran commented on a diff in pull request #5251: HADOOP-18582. skip unnecessary cleanup logic in distcp

steveloughran commented on code in PR #5251:
URL: https://github.com/apache/hadoop/pull/5251#discussion_r1104876447


##########
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java:
##########
@@ -152,9 +152,18 @@ public void abortJob(JobContext jobContext,
   }
 
   private void cleanupTempFiles(JobContext context) {
-    try {
-      Configuration conf = context.getConfiguration();
+    Configuration conf = context.getConfiguration();
+
+    final boolean directWrite = conf.getBoolean(
+        DistCpOptionSwitch.DIRECT_WRITE.getConfigLabel(), false);
+    final boolean append = conf.getBoolean(
+        DistCpOptionSwitch.APPEND.getConfigLabel(), false);
+    final boolean useTempTarget = !append && !directWrite;
+    if (!useTempTarget) {

Review Comment:
   Reviewing this, I think the cleanup should only be skipped on direct *not* append.
   
   why so, append will choose between overwrite and append depending on the state of the destination file -a decision made case by case. when doing distcp with -append, new files or those whose dest checksum doesn't match that of the source for the same #of bytes will use a temp file unless -direct is set.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org