Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/21 22:01:27 UTC

[GitHub] [hudi] yihua commented on issue #6686: Apache Hudi Consistency issues with glue and marketplace connector

yihua commented on issue #6686:
URL: https://github.com/apache/hudi/issues/6686#issuecomment-1254272868

   @asankadarshana007  The consistency check, when enabled, runs while removing invalid data files: (1) check that all paths to delete exist, (2) delete them, (3) wait for all paths to disappear, to account for eventual consistency.  Note that this logic is not needed on storage with strong consistency.  Because the invalid data files are now determined from the markers, a marker can be created before its data file has started being written; in that case check (1) fails, which is okay.  Given that there is currently no use case for eventual consistency, we don't maintain this logic.
   
   Let me know if turning off `hoodie.consistency.check.enabled` solves your problem.  You can close the ticket if all is good.
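
As a minimal sketch of turning the check off, the snippet below puts the key into a plain `java.util.Properties` object. Only the key `hoodie.consistency.check.enabled` comes from this thread; the class name `DisableConsistencyCheck` and the helper `writerProps` are hypothetical, and how the properties reach the Hudi writer (for example through job or datasource options) depends on your setup and is not shown.

```java
import java.util.Properties;

class DisableConsistencyCheck {
    // Hypothetical helper: builds writer properties with the consistency check off.
    static Properties writerProps() {
        Properties props = new Properties();
        // The only name taken from the thread is this config key.
        props.setProperty("hoodie.consistency.check.enabled", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = writerProps();
        System.out.println("hoodie.consistency.check.enabled="
            + props.getProperty("hoodie.consistency.check.enabled"));
    }
}
```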
   
   For reference, the relevant logic for removing invalid data files:
   ```java
         if (!invalidDataPaths.isEmpty()) {
           LOG.info("Removing duplicate data files created due to task retries before committing. Paths=" + invalidDataPaths);
           Map<String, List<Pair<String, String>>> invalidPathsByPartition = invalidDataPaths.stream()
               .map(dp -> Pair.of(new Path(basePath, dp).getParent().toString(), new Path(basePath, dp).toString()))
               .collect(Collectors.groupingBy(Pair::getKey));
   
           // Ensure all files in the delete list are actually present. This is mandatory for an eventually consistent FS.
           // Otherwise, we may miss deleting such files. If files are not found even after retries, fail the commit.
           if (consistencyCheckEnabled) {
             // This ensures all files to be deleted are present.
             waitForAllFiles(context, invalidPathsByPartition, FileVisibility.APPEAR);
           }
   
           // Now delete partially written files
           context.setJobStatus(this.getClass().getSimpleName(), "Delete all partially written files: " + config.getTableName());
           deleteInvalidFilesByPartitions(context, invalidPathsByPartition);
   
           // Now ensure the deleted files disappear
           if (consistencyCheckEnabled) {
             // This ensures all files to be deleted are absent.
             waitForAllFiles(context, invalidPathsByPartition, FileVisibility.DISAPPEAR);
           }
         }
   ```
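
To make the failure case concrete, here is a small self-contained simulation of the three steps, under stated assumptions. It is not Hudi code: `MarkerRaceSketch`, `InMemoryStore`, and the simplified `waitForAllFiles` and `removeInvalidFiles` are hypothetical stand-ins, the "file system" is just an in-memory set, and a failed check is reported as `false` rather than failing a commit. It shows one path that has a marker but whose data file was never written.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MarkerRaceSketch {
    enum FileVisibility { APPEAR, DISAPPEAR }

    // Hypothetical stand-in for storage: a set of currently visible paths.
    static final class InMemoryStore {
        private final Set<String> visible = new HashSet<>();
        void write(String path) { visible.add(path); }
        void delete(String path) { visible.remove(path); } // no-op if the path is absent
        boolean exists(String path) { return visible.contains(path); }
    }

    // Re-check up to maxChecks times whether every path has the target visibility.
    static boolean waitForAllFiles(InMemoryStore store, List<String> paths,
                                   FileVisibility target, int maxChecks) {
        boolean expected = (target == FileVisibility.APPEAR);
        for (int attempt = 0; attempt < maxChecks; attempt++) {
            boolean allMatch = true;
            for (String p : paths) {
                if (store.exists(p) != expected) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    // Steps (1)-(3) from the comment; returns false if either check gives up.
    static boolean removeInvalidFiles(InMemoryStore store, List<String> invalidPaths,
                                      boolean consistencyCheckEnabled) {
        if (consistencyCheckEnabled
            && !waitForAllFiles(store, invalidPaths, FileVisibility.APPEAR, 3)) {
            return false; // step (1) failed: some path never appeared
        }
        for (String p : invalidPaths) {
            store.delete(p); // step (2)
        }
        // step (3): only checked when the consistency check is enabled
        return !consistencyCheckEnabled
            || waitForAllFiles(store, invalidPaths, FileVisibility.DISAPPEAR, 3);
    }

    public static void main(String[] args) {
        // Both paths have markers, but only the first data file was ever written.
        List<String> invalid = Arrays.asList("p1/a.parquet", "p1/b.parquet");

        InMemoryStore checked = new InMemoryStore();
        checked.write("p1/a.parquet");
        System.out.println("check enabled:  " + removeInvalidFiles(checked, invalid, true));

        InMemoryStore unchecked = new InMemoryStore();
        unchecked.write("p1/a.parquet");
        System.out.println("check disabled: " + removeInvalidFiles(unchecked, invalid, false));
    }
}
```

With the check enabled, step (1) gives up because `p1/b.parquet` never appears, so nothing is deleted; with it disabled, the delete of the missing path is simply a no-op and the cleanup succeeds.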

