Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/25 14:55:28 UTC

[GitHub] [hudi] yihua commented on a diff in pull request #5746: [HUDI-4250][HUDI-4202] Optimize performance of Column Stats Index reading in Data Skipping

yihua commented on code in PR #5746:
URL: https://github.com/apache/hudi/pull/5746#discussion_r928972329


##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -166,6 +167,11 @@ public Map<String, List<FileSlice>> listFileSlices() {
         .collect(Collectors.toMap(e -> e.getKey().path, Map.Entry::getValue));
   }
 
+  public int getFileSlicesCount() {
+    return cachedAllInputFileSlices.values().stream()
+        .reduce(0, (count, fileSlices) -> count + fileSlices.size(), Integer::sum);

Review Comment:
   This is not obvious at a glance. How about something like `cachedAllInputFileSlices.values().stream().mapToInt(List::size).sum()` instead?
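A minimal sketch comparing the two forms on a stand-in map, for illustration only. Assumptions: `Map<String, List<String>>` stands in for `cachedAllInputFileSlices` (in the PR the values are `List<FileSlice>`), and the class name, method names, partition keys and slice strings are hypothetical. Both methods should return the same count; `mapToInt(List::size).sum()` states the intent directly, without the identity and combiner arguments of the three-argument `reduce`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileSliceCount {

  // Form from the diff: three-argument reduce with identity 0, an accumulator
  // that adds each list's size, and Integer::sum as the (parallel) combiner.
  static int countWithReduce(Map<String, List<String>> slicesByPartition) {
    return slicesByPartition.values().stream()
        .reduce(0, (count, fileSlices) -> count + fileSlices.size(), Integer::sum);
  }

  // Suggested form: map each list to its size, then sum the resulting IntStream.
  static int countWithMapToInt(Map<String, List<String>> slicesByPartition) {
    return slicesByPartition.values().stream()
        .mapToInt(List::size)
        .sum();
  }

  public static void main(String[] args) {
    // Hypothetical partitions and file slice names.
    Map<String, List<String>> slices = new LinkedHashMap<>();
    slices.put("2022/07/24", Arrays.asList("slice-1", "slice-2"));
    slices.put("2022/07/25", Arrays.asList("slice-3"));
    slices.put("2022/07/26", Collections.emptyList());
    System.out.println(countWithReduce(slices));   // 3
    System.out.println(countWithMapToInt(slices)); // 3
  }
}
```

The three-argument `reduce` is not wrong, but its combiner only comes into play for parallel streams, which makes a plain "sum of sizes" harder to read than the `mapToInt` version.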



##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -187,6 +187,18 @@ public final class HoodieMetadataConfig extends HoodieConfig {
       .sinceVersion("0.11.0")
       .withDocumentation("Comma-separated list of columns for which column stats index will be built. If not set, all columns will be indexed");
 
+  public static final String COLUMN_STATS_INDEX_PROCESSING_MODE_IN_MEMORY = "in-memory";
+  public static final String COLUMN_STATS_INDEX_PROCESSING_MODE_SPARK = "spark";

Review Comment:
   nit: these could be an enum.
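One possible shape for such an enum, as a sketch only: the wrapper class and the enum and method names (`ProcessingModeDemo`, `ColumnStatsIndexProcessingMode`, `getValue`, `fromValue`) are hypothetical; only the string values `in-memory` and `spark` come from the diff. Keeping the config strings inside an enum means an unknown value is rejected in one place instead of being compared as a raw string at each use site.

```java
public class ProcessingModeDemo {

  // Hypothetical enum replacing the two String constants from the diff.
  public enum ColumnStatsIndexProcessingMode {
    IN_MEMORY("in-memory"),
    SPARK("spark");

    // Config string associated with each mode.
    private final String value;

    ColumnStatsIndexProcessingMode(String value) {
      this.value = value;
    }

    public String getValue() {
      return value;
    }

    // Parses a config string such as "in-memory", ignoring case,
    // and rejects anything that is not a known mode.
    public static ColumnStatsIndexProcessingMode fromValue(String value) {
      for (ColumnStatsIndexProcessingMode mode : values()) {
        if (mode.value.equalsIgnoreCase(value)) {
          return mode;
        }
      }
      throw new IllegalArgumentException(
          "Unknown column stats index processing mode: " + value);
    }
  }

  public static void main(String[] args) {
    ColumnStatsIndexProcessingMode mode = ColumnStatsIndexProcessingMode.fromValue("spark");
    System.out.println(mode + " -> " + mode.getValue()); // SPARK -> spark
  }
}
```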



##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -187,6 +187,18 @@ public final class HoodieMetadataConfig extends HoodieConfig {
       .sinceVersion("0.11.0")
       .withDocumentation("Comma-separated list of columns for which column stats index will be built. If not set, all columns will be indexed");
 
+  public static final String COLUMN_STATS_INDEX_PROCESSING_MODE_IN_MEMORY = "in-memory";
+  public static final String COLUMN_STATS_INDEX_PROCESSING_MODE_SPARK = "spark";

Review Comment:
   `COLUMN_STATS_INDEX_PROCESSING_MODE_SPARK`: based on the logic, this mode is really about leveraging the engine context, which could be Spark or Flink (plain Java), right?



-- 
This is an automated message from the Apache Git Service.