You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/24 23:00:07 UTC

[GitHub] [hudi] satishkotha edited a comment on issue #4109: [SUPPORT] SqlQueryEqualityPreCommitValidator errors with java.util.ConcurrentModificationException

satishkotha edited a comment on issue #4109:
URL: https://github.com/apache/hudi/issues/4109#issuecomment-978407363


   CocurrentModificationException seems to be coming from here 
   https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTablePreCommitFileSystemView.java#L83
   
   We need to redo this logic to avoid newFilesWrittenForPartition.remove(...).  
   
   Simple option to try out: 
   Replace line 72 with
   ```
    Map<String, HoodieBaseFile> newFilesWrittenForPartition  = new ConcurrentHashMap(filesWritten.stream()
           .filter(file -> partitionStr.equals(file.getPartitionPath()))
           .collect(Collectors.toMap(HoodieWriteStat::getFileId, writeStat -> 
               new HoodieBaseFile(new Path(tableMetaClient.getBasePath(), writeStat.getPath()).toString()))))
   ```
   
   Probably better option is to group based on fileId i.e., replace line 78 -88 with:
   This needs some more testing. I can send PR next week.
   
   ```
   Map<String, HoodieBaseFile> baseFilesForCommittedFileIds = committedBaseFiles
           // Remove files replaced by current inflight commit
           .filter(baseFile -> !replacedFileIdsForPartition.contains(baseFile.getFileId()))
           collect(Collectors.toMap(HoodieBaseFile::getFileId, baseFile -> baseFile))
   
   baseFilesForCommittedFileIds.putAll(newFilesWrittenForPartition)
   return baseFilesForCommittedFileIds.values().stream();
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org