Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/04 16:54:38 UTC

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5213: [HUDI-3776] Fix BloomIndex incorrectly using ColStats to lookup record location

alexeykudinkin commented on code in PR #5213:
URL: https://github.com/apache/hudi/pull/5213#discussion_r841953807


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##########
@@ -138,6 +134,28 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
         partitionRecordKeyPairs, fileComparisonPairs, partitionToFileInfo, recordsPerPartition);
   }
 
+  private List<Pair<String, BloomIndexFileInfo>> getBloomIndexFileInfoForPartitions(HoodieEngineContext context,
+                                                                                    HoodieTable hoodieTable,
+                                                                                    List<String> affectedPartitionPathList) {
+    List<Pair<String, BloomIndexFileInfo>> fileInfoList = new ArrayList<>();
+
+    if (config.getBloomIndexPruneByRanges()) {
+      // load column ranges from metadata index if column stats index is enabled and column_stats metadata partition is available
+      if (config.isMetadataColumnStatsIndexEnabled()
+          && getCompletedMetadataPartitions(hoodieTable.getMetaClient().getTableConfig()).contains(COLUMN_STATS.getPartitionPath())) {

Review Comment:
   This is still not sufficient: we cannot guarantee that, at any given moment, **all** of the files are indexed in Column Stats. We therefore have to do the same thing we do in Data Skipping:
   
   1. For files that are indexed in Column Stats, fetch the column ranges from there.
   2. For files that are not, read the ranges directly from the file.
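
The two-step fallback above can be sketched roughly as follows. This is only an illustration of the idea, not Hudi's actual API: `KeyRange`, `resolveKeyRanges`, the `columnStatsIndex` map, and the `fileReader` function are all hypothetical stand-ins (for `BloomIndexFileInfo`, the Column Stats lookup, and reading min/max keys from the base file, respectively).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Hudi.
class KeyRangeLookupSketch {

  /** Min/max record key of one file (stand-in for BloomIndexFileInfo). */
  static final class KeyRange {
    final String fileId;
    final String minKey;
    final String maxKey;

    KeyRange(String fileId, String minKey, String maxKey) {
      this.fileId = fileId;
      this.minKey = minKey;
      this.maxKey = maxKey;
    }
  }

  /**
   * Resolves a key range for every file. Files present in the (possibly
   * incomplete) column stats index are served from it; every other file
   * falls back to the supplied reader, which is assumed to read the range
   * directly from the file.
   */
  static List<KeyRange> resolveKeyRanges(List<String> fileIds,
                                         Map<String, KeyRange> columnStatsIndex,
                                         Function<String, KeyRange> fileReader) {
    List<KeyRange> result = new ArrayList<>(fileIds.size());
    for (String fileId : fileIds) {
      KeyRange indexed = columnStatsIndex.get(fileId);
      // Step 1: use the index entry if there is one.
      // Step 2: otherwise read the range from the file itself.
      result.add(indexed != null ? indexed : fileReader.apply(fileId));
    }
    return result;
  }
}
```

The point of the sketch is that the decision is made per file rather than per table: a completed `column_stats` partition does not by itself mean that every file has an entry in it.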


