Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/04/13 12:05:13 UTC

[GitHub] [hive] hmangla98 commented on a diff in pull request #3170: HIVE-25787: Prevent duplicate paths in the fileList while adding an entry to NotifcationLog

hmangla98 commented on code in PR #3170:
URL: https://github.com/apache/hive/pull/3170#discussion_r849404436


##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java:
##########
@@ -5215,7 +5215,7 @@ private static void moveAcidFiles(String deltaFileType, PathFilter pathFilter, F
               bucketDest.toUri().toString());
           try {
             fs.rename(bucketSrc, bucketDest);
-            if (newFiles != null) {
+            if (newFiles != null && !newFiles.contains(bucketDest)) {

Review Comment:
   Task reattempts generate more than one temporary file, and when the data is actually copied from the temporary location to the table location, the same destination path is added to the "newFiles" list multiple times. As a result, the fileList in TXN_WRITE_NOTIFICATION_LOG contained duplicate entries. This is a problem when we run distcp from src to tgt, because it fails with a DuplicateFileException.
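
   To illustrate the guard in the diff above, here is a minimal standalone sketch. It is not the Hive code: the class name, the helper addIfAbsent, and the example paths are hypothetical, plain strings stand in for Hadoop Path objects, and it assumes the guarded branch simply appends the destination to the list.

```java
import java.util.ArrayList;
import java.util.List;

public class NewFilesDedupSketch {

    // Hypothetical helper mirroring the condition in the diff:
    // record a destination only if the list exists and does not hold it yet.
    public static boolean addIfAbsent(List<String> newFiles, String dest) {
        if (newFiles != null && !newFiles.contains(dest)) {
            newFiles.add(dest);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> newFiles = new ArrayList<>();

        // Two task attempts that resolve to the same destination path
        // (paths are made up for illustration).
        addIfAbsent(newFiles, "/warehouse/t/delta_1_1/bucket_00000");
        addIfAbsent(newFiles, "/warehouse/t/delta_1_1/bucket_00000");

        // A different bucket file is still recorded.
        addIfAbsent(newFiles, "/warehouse/t/delta_1_1/bucket_00001");

        System.out.println(newFiles.size()); // prints 2
    }
}
```

   List.contains is a linear scan, so this check costs O(n) per added file. If the list can grow large, keeping a set of already recorded destinations next to it would be one possible alternative, but that is a design suggestion and not something this PR does.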



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

