Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/18 13:28:12 UTC

[GitHub] [hudi] codope opened a new pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

codope opened a new pull request #4848:
URL: https://github.com/apache/hudi/pull/4848


   ## What is the purpose of the pull request
   
   Rework of #4761 
   This diff introduces the following changes:
   
   - Write stats are converted to metadata index records during the commit. These records now use the HoodieData type so that record generation scales with the workload.
   - Metadata index initialization support for the bloom filter and column stats partitions.
   - When building the BloomFilter from the index records, use the type param stored in the payload instead of a hardcoded type.
   - Delta writes can change column ranges, and the column stats index needs to be updated with the new ranges to stay consistent with the table dataset. This fix adds column stats index update support for delta writes.
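
To illustrate the last point, here is a minimal sketch of how ranges from a delta write might be folded into ranges that are already indexed. It is not the code in this PR: `ColumnRangeMergeSketch`, `Range`, `merge` and `mergeInto` are hypothetical names, and `Range` is a simplified stand-in for `HoodieColumnRangeMetadata` that assumes numeric column values.

```java
import java.util.HashMap;
import java.util.Map;

final class ColumnRangeMergeSketch {

  // Hypothetical stand-in for a per-column range; not Hudi's HoodieColumnRangeMetadata.
  static final class Range {
    final long min;
    final long max;
    final long nullCount;

    Range(long min, long max, long nullCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
    }
  }

  // Widen the range: smallest min, largest max, summed null counts.
  static Range merge(Range existing, Range incoming) {
    return new Range(
        Math.min(existing.min, incoming.min),
        Math.max(existing.max, incoming.max),
        existing.nullCount + incoming.nullCount);
  }

  // Fold the ranges from one delta write into the ranges already indexed.
  // Columns that were not indexed before are simply added.
  static void mergeInto(Map<String, Range> indexed, Map<String, Range> delta) {
    delta.forEach((column, range) -> indexed.merge(column, range, ColumnRangeMergeSketch::merge));
  }

  public static void main(String[] args) {
    Map<String, Range> indexed = new HashMap<>();
    indexed.put("price", new Range(10, 50, 1));

    Map<String, Range> delta = new HashMap<>();
    delta.put("price", new Range(5, 40, 2)); // widens the lower bound
    delta.put("qty", new Range(1, 3, 0));    // column not indexed before

    mergeInto(indexed, delta);
    indexed.forEach((column, r) ->
        System.out.println(column + ": [" + r.min + ", " + r.max + "], nulls=" + r.nullCount));
  }
}
```

Widening only ever grows a range, so under this assumption the merged range stays a safe superset of the actual values even when a delta write overwrites earlier records.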
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060242046


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061373965


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621",
       "triggerID" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6675",
       "triggerID" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   * fa193b7961e309d335cb24f5d35102bfa80111a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6675) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r816955786



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Compute column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void computeRecordsStats(final String filePath, List<IndexedRecord> recordList,
+                                   Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap) {
+    recordList.forEach(record -> accumulateColumnRanges(record, writeSchemaWithMetaFields, filePath, columnRangeMap, config.isConsistentLogicalTimestampEnabled()));
+  }
+
+  /**
+   * Accumulate column range statistics for the requested record.
+   *
+   * @param record   - Record to get the column range statistics for
+   * @param schema   - Schema for the record
+   * @param filePath - File that record belongs to
+   */
+  private static void accumulateColumnRanges(IndexedRecord record, Schema schema, String filePath,
+          Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap, boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+    schema.getFields().forEach(field -> {
+      final String fieldVal = HoodieAvroUtils.getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final HoodieColumnRangeMetadata<Comparable> fieldRange = new HoodieColumnRangeMetadata<>(

Review comment:
       Good point! Have changed the code accordingly.







[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r818226210



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -922,4 +952,39 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Computes column range metadata
+   *
+   * @param recordList                        - list of records from which column range statistics will be computed
+   * @param field                             - column name for which statistics will be computed
+   * @param filePath                          - data file path
+   * @param columnRangeMap                    - old column range statistics, which will be merged in this computation
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void accumulateColumnRanges(List<IndexedRecord> recordList, Schema.Field field, String filePath,

Review comment:
       I see this is getting called from HoodieAppendHandle, and we call it for every field/column.
   
   i.e.
   for every field -> accumulateColumnRanges { iterate through every record and find column stats }
   
   Since this is an Avro/row-based format, why can't we collect stats for all fields/columns at once per record, and keep iterating through the records to eventually get column stats for all fields?
   
   Essentially we are doing a columnar read across records for N columns. I am proposing that we flip that: read the entire record, fetch stats for all columns, and proceed to the next record, so we never need to come back to this record again.
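
The record-at-a-time loop proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: records are modelled as plain `Map<String, String>` instead of Avro `IndexedRecord`, values are compared as strings, and `SinglePassColumnStatsSketch`, `ColumnStats` and `accumulate` are hypothetical names that do not exist in Hudi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SinglePassColumnStatsSketch {

  // Hypothetical mutable accumulator; a stand-in, not Hudi's HoodieColumnRangeMetadata.
  static final class ColumnStats {
    String min;
    String max;
    long nullCount;
    long valueCount;

    void accept(String value) {
      if (value == null) {
        nullCount++;
        return;
      }
      valueCount++;
      if (min == null || value.compareTo(min) < 0) {
        min = value;
      }
      if (max == null || value.compareTo(max) > 0) {
        max = value;
      }
    }
  }

  // Outer loop over records, inner loop over fields: each record is visited once
  // and the stats of every column are updated before moving to the next record.
  static Map<String, ColumnStats> accumulate(List<String> fields, List<Map<String, String>> records) {
    Map<String, ColumnStats> stats = new HashMap<>();
    for (String field : fields) {
      stats.put(field, new ColumnStats());
    }
    for (Map<String, String> record : records) {
      for (String field : fields) {
        stats.get(field).accept(record.get(field));
      }
    }
    return stats;
  }

  public static void main(String[] args) {
    List<Map<String, String>> records = new ArrayList<>();
    Map<String, String> first = new HashMap<>();
    first.put("id", "b2");
    first.put("city", "oslo");
    records.add(first);
    Map<String, String> second = new HashMap<>();
    second.put("id", "a1");
    second.put("city", null); // counted as a null for "city"
    records.add(second);

    Map<String, ColumnStats> stats = accumulate(Arrays.asList("id", "city"), records);
    stats.forEach((field, s) ->
        System.out.println(field + ": min=" + s.min + ", max=" + s.max + ", nulls=" + s.nullCount));
  }
}
```

Both loop orders do the same number of comparisons; the difference is that each record is read once, which tends to matter for row-oriented data where touching a record is the expensive part.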
    







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048606918


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229) 
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r816949687



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java
##########
@@ -53,56 +45,23 @@
 
   private final BloomFilter bloomFilter;
   private final List<String> candidateRecordKeys;
-  private final boolean useMetadataTableIndex;
-  private Option<String> fileName = Option.empty();
   private long totalKeysChecked;
 
   public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable<T, I, K, O> hoodieTable,
                                Pair<String, String> partitionPathFileIDPair) {
-    this(config, hoodieTable, partitionPathFileIDPair, Option.empty(), false);
-  }
-
-  public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable<T, I, K, O> hoodieTable,
-                               Pair<String, String> partitionPathFileIDPair, Option<String> fileName,
-                               boolean useMetadataTableIndex) {
     super(config, hoodieTable, partitionPathFileIDPair);
     this.candidateRecordKeys = new ArrayList<>();
     this.totalKeysChecked = 0;
-    if (fileName.isPresent()) {
-      ValidationUtils.checkArgument(FSUtils.getFileId(fileName.get()).equals(getFileId()),
-          "File name '" + fileName.get() + "' doesn't match this lookup handle fileid '" + getFileId() + "'");
-      this.fileName = fileName;
-    }
-    this.useMetadataTableIndex = useMetadataTableIndex;
     this.bloomFilter = getBloomFilter();
   }
 
   private BloomFilter getBloomFilter() {
-    BloomFilter bloomFilter = null;
-    HoodieTimer timer = new HoodieTimer().startTimer();
-    try {
-      if (this.useMetadataTableIndex) {

Review comment:
       Good catch. The intention was to get rid of the private field and the redundant constructor, and to use the write config instead. I have fixed it.







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1044532819


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r810417741



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +385,13 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {
+      Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap = stat.getRecordsStats().isPresent()
+              ? stat.getRecordsStats().get().getStats() : new HashMap<>();
+      getRecordsStats(stat.getPath(), recordList, columnRangeMap);

Review comment:
       Since this is already happening within a single task, I guess we can't leverage Spark parallelism here.
   







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048543998


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060127358


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060382416


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621",
       "triggerID" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r820839962



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -831,7 +828,7 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
    * @param datasetMetaClient                   - Data table meta client
    * @param isMetaIndexColumnStatsForAllColumns - Is column stats indexing enabled for all columns
    */
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {
+  private static List<String> getColumnsToIndex(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {

Review comment:
       I did not create a separate JIRA. This is already being tracked in https://issues.apache.org/jira/browse/HUDI-3411







[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r815594374



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieDeltaWriteStat.java
##########
@@ -69,4 +73,24 @@ public void addLogFiles(String logFile) {
   public List<String> getLogFiles() {
     return logFiles;
   }
+
+  public void setRecordsStats(RecordsStats<? extends Map> stats) {
+    recordsStats = Option.of(stats);
+  }
+
+  public Option<RecordsStats<? extends Map>> getRecordsStats() {
+    return recordsStats;
+  }
+
+  public static class RecordsStats<T> implements Serializable {

Review comment:
       The wrapper abstracts away the underlying metadata. I think the write stat should know that it stores the record stats, but not necessarily what those stats are composed of. Are you concerned about the serde cost here? It shouldn't add much overhead compared with keeping the stats as a private field.
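
To illustrate the wrapper argument, here is a minimal sketch. `RecordsStatsSketch`, `WriteStatSketch` and `RecordsStatsDemo` are hypothetical stand-ins, not the actual Hudi classes, and `java.util.Optional` is used in place of Hudi's `Option`. The point is only that the write stat holds a wrapper and never needs to know the payload type.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the RecordsStats<T> wrapper from the diff above.
final class RecordsStatsSketch<T> implements Serializable {
  private static final long serialVersionUID = 1L;
  private final T stats;

  RecordsStatsSketch(T stats) {
    this.stats = stats;
  }

  T getStats() {
    return stats;
  }
}

// Hypothetical stand-in for a write stat: it stores the wrapper without
// knowing what the wrapped stats are composed of.
final class WriteStatSketch implements Serializable {
  private static final long serialVersionUID = 1L;
  // Kept as a nullable field because java.util.Optional is not Serializable.
  private RecordsStatsSketch<?> recordsStats;

  void setRecordsStats(RecordsStatsSketch<?> stats) {
    this.recordsStats = stats;
  }

  Optional<RecordsStatsSketch<?>> getRecordsStats() {
    return Optional.ofNullable(recordsStats);
  }
}

final class RecordsStatsDemo {
  public static void main(String[] args) {
    // Example payload (assumed for illustration): a per-column counter map.
    Map<String, Long> nullCounts = new HashMap<>();
    nullCounts.put("price", 3L);

    WriteStatSketch stat = new WriteStatSketch();
    stat.setRecordsStats(new RecordsStatsSketch<>(nullCounts));

    // Only the caller that understands the payload unwraps it.
    System.out.println(stat.getRecordsStats().get().getStats());
  }
}
```

Swapping the payload for a different map type would change only the caller that unwraps it, not the write stat itself.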







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1055615217


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   * 6a772a7709b577db7afddefb86a1ccd62a75269c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r815607975



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +385,13 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {
+      Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap = stat.getRecordsStats().isPresent()

Review comment:
       We actually need `getRecordsStats().get().getStats()`, hence the `isPresent` check. `getRecordsStats().getOrElse(() -> new HashMap())` would return only the `RecordsStats` wrapper, on which we would still have to call `getStats()`.
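
A small sketch of the two-level unwrap described above. It uses `java.util.Optional` as a stand-in for Hudi's `Option`, and `StatsHolder` and `UnwrapSketch` are hypothetical names. Whether Hudi's `Option` offers an equivalent `map` call is an assumption not confirmed in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the RecordsStats wrapper.
final class StatsHolder {
  private final Map<String, Long> stats;

  StatsHolder(Map<String, Long> stats) {
    this.stats = stats;
  }

  Map<String, Long> getStats() {
    return stats;
  }
}

final class UnwrapSketch {
  // Mirrors the isPresent check in the diff: unwrap the holder first,
  // then ask it for the map, falling back to an empty map.
  static Map<String, Long> viaIsPresent(Optional<StatsHolder> holder) {
    return holder.isPresent() ? holder.get().getStats() : new HashMap<>();
  }

  // Same result by mapping to the inner map before applying the default,
  // so the default has the type of the map rather than of the holder.
  static Map<String, Long> viaMap(Optional<StatsHolder> holder) {
    return holder.map(StatsHolder::getStats).orElseGet(HashMap::new);
  }

  public static void main(String[] args) {
    Map<String, Long> ranges = new HashMap<>();
    ranges.put("price", 42L);
    Optional<StatsHolder> present = Optional.of(new StatsHolder(ranges));
    Optional<StatsHolder> absent = Optional.empty();

    System.out.println(viaIsPresent(present).equals(viaMap(present)));
    System.out.println(viaIsPresent(absent).equals(viaMap(absent)));
  }
}
```

A default applied directly on the outer optional would have to be a holder, not a map, which is the mismatch the comment points out.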







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048722584


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060127358


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1059353304


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1044532819


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1047369471


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813091690



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -187,94 +178,90 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
   /**
    * Convert commit action metadata to bloom filter records.
    *
-   * @param commitMetadata - Commit action metadata
-   * @param dataMetaClient - Meta client for the data table
-   * @param instantTime    - Action instant time
-   * @return List of metadata table records
+   * @param context                 - Engine context to use
+   * @param commitMetadata          - Commit action metadata
+   * @param instantTime             - Action instant time
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
+   * @return HoodieData of metadata table records
    */
-  public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCommitMetadata commitMetadata,
-                                                                       HoodieTableMetaClient dataMetaClient,
-                                                                       String instantTime) {
-    List<HoodieRecord> records = new LinkedList<>();
-    commitMetadata.getPartitionToWriteStats().forEach((partitionStatName, writeStats) -> {
-      final String partition = partitionStatName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionStatName;
-      Map<String, Long> newFiles = new HashMap<>(writeStats.size());
-      writeStats.forEach(hoodieWriteStat -> {
-        // No action for delta logs
-        if (hoodieWriteStat instanceof HoodieDeltaWriteStat) {
-          return;
-        }
+  public static HoodieData<HoodieRecord> convertMetadataToBloomFilterRecords(
+      HoodieEngineContext context, HoodieCommitMetadata commitMetadata,
+      String instantTime, MetadataRecordsGenerationParams recordsGenerationParams) {
+    final List<HoodieWriteStat> allWriteStats = commitMetadata.getPartitionToWriteStats().values().stream()
+        .flatMap(entry -> entry.stream()).collect(Collectors.toList());
+    if (allWriteStats.isEmpty()) {
+      return context.emptyHoodieData();
+    }
 
-        String pathWithPartition = hoodieWriteStat.getPath();
-        if (pathWithPartition == null) {
-          // Empty partition
-          LOG.error("Failed to find path in write stat to update metadata table " + hoodieWriteStat);
-          return;
-        }
-        int offset = partition.equals(NON_PARTITIONED_NAME) ? (pathWithPartition.startsWith("/") ? 1 : 0) :
-            partition.length() + 1;
+    HoodieData<HoodieWriteStat> allWriteStatsRDD = context.parallelize(allWriteStats,
+        Math.max(recordsGenerationParams.getBloomIndexParallelism(), allWriteStats.size()));

Review comment:
       The thing is, bloom index parallelism is 0 by default. If we can make it non-zero, then maybe we can do max; otherwise the subsequent step fails.
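
For reference, a minimal sketch of what the `Math.max` expression in the diff above evaluates to when the configured parallelism is 0. `ParallelismSketch` and `effectiveParallelism` are hypothetical names, and the guard for an empty list mirrors the early return in the diff rather than any Hudi API.

```java
final class ParallelismSketch {
  // Returns the parallelism the diff's expression would pass on:
  // the larger of the configured value and the number of write stats.
  static int effectiveParallelism(int configuredParallelism, int numWriteStats) {
    if (numWriteStats <= 0) {
      // The diff returns empty data before reaching this point.
      throw new IllegalArgumentException("no write stats to parallelize");
    }
    return Math.max(configuredParallelism, numWriteStats);
  }

  public static void main(String[] args) {
    // With the default of 0, the write stat count is used.
    System.out.println(effectiveParallelism(0, 5));
    // A larger configured value is used as is.
    System.out.println(effectiveParallelism(10, 5));
  }
}
```

Because the empty case is handled first, the result is at least 1 even when the configured value is 0.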







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048480190


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   * 19ba560542a8769475948561e2b607f85f70b548 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1044536372


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048481656


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc UNKNOWN
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1055690729


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060126050


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   * 21dc93b754e84a414e239a6854fce1195267143f UNKNOWN
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1059423441


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061372853


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621",
       "triggerID" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   * fa193b7961e309d335cb24f5d35102bfa80111a7 UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061372853


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621",
       "triggerID" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   * fa193b7961e309d335cb24f5d35102bfa80111a7 UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061410266


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551",
       "triggerID" : "ffcc639ed1eb64000395e967f4ce57b4ae0c68e2",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21dc93b754e84a414e239a6854fce1195267143f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608",
       "triggerID" : "21dc93b754e84a414e239a6854fce1195267143f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621",
       "triggerID" : "ff1f746fc4826a6432ec2078ae3e6c8536a038f1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6675",
       "triggerID" : "fa193b7961e309d335cb24f5d35102bfa80111a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * fa193b7961e309d335cb24f5d35102bfa80111a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6675) 
   





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048483132


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc UNKNOWN
   





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r815600163



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -347,64 +348,62 @@ public void initTableMetadata() {
   /**
    * Bootstrap the metadata table if needed.
    *
-   * @param engineContext  - Engine context
-   * @param dataMetaClient - Meta client for the data table
-   * @param actionMetadata - Optional action metadata
-   * @param <T>            - Action metadata types extending Avro generated SpecificRecordBase
+   * @param dataMetaClient           - Meta client for the data table

Review comment:
       No. Will fix it.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1053955444


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   * bf80ef66675695d0cbc6eff541226e09567b6e51 UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060126050


   ## CI report:
   
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   * 21dc93b754e84a414e239a6854fce1195267143f UNKNOWN
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1059350931


   ## CI report:
   
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 UNKNOWN
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1059423441


   ## CI report:
   
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   



[GitHub] [hudi] nsivabalan edited a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061329026


   @codope: I am good with the patch. Can you rebase onto the latest master? We can land it once CI is green. Sorry, let's get this landed by tomorrow.





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061373965


   ## CI report:
   
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   * fa193b7961e309d335cb24f5d35102bfa80111a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6675) 
   



[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813474629



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +385,13 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {

Review comment:
       Why is this check so specific to whether all columns are enabled? It's OK if we don't handle the use case of collecting stats for a subset of columns for now (since we don't have a config for it) and leave a TODO here, but I don't think we need to be so specific in this check.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Compute column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void computeRecordsStats(final String filePath, List<IndexedRecord> recordList,
+                                   Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap) {
+    recordList.forEach(record -> accumulateColumnRanges(record, writeSchemaWithMetaFields, filePath, columnRangeMap, config.isConsistentLogicalTimestampEnabled()));
+  }
+
+  /**
+   * Accumulate column range statistics for the requested record.
+   *
+   * @param record   - Record to get the column range statistics for
+   * @param schema   - Schema for the record
+   * @param filePath - File that record belongs to
+   */
+  private static void accumulateColumnRanges(IndexedRecord record, Schema schema, String filePath,
+          Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap, boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+    schema.getFields().forEach(field -> {
+      final String fieldVal = HoodieAvroUtils.getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final HoodieColumnRangeMetadata<Comparable> fieldRange = new HoodieColumnRangeMetadata<>(

Review comment:
       I don't think it's a good idea to create a new `HoodieColumnRangeMetadata` object for every record (it's a pretty large object).
   
   Instead, for every field we can iterate over all records, computing the metrics locally (on the stack, in local vars), and then populate `HoodieColumnRangeMetadata` once.
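
A minimal sketch of the suggestion above, under stated assumptions: `ColumnRange` is a hypothetical, much simplified stand-in for `HoodieColumnRangeMetadata`, and column values are assumed to be already extracted as strings (as the diff does via `getNestedFieldValAsString`). The running min, max, null count and size live in local variables, and the result object is built once per column rather than once per record.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnRangeSketch {

  /** Hypothetical, simplified stand-in for HoodieColumnRangeMetadata. */
  static final class ColumnRange {
    final String columnName;
    final String minValue;
    final String maxValue;
    final long nullCount;
    final long valueCount;
    final long totalSize;

    ColumnRange(String columnName, String minValue, String maxValue,
                long nullCount, long valueCount, long totalSize) {
      this.columnName = columnName;
      this.minValue = minValue;
      this.maxValue = maxValue;
      this.nullCount = nullCount;
      this.valueCount = valueCount;
      this.totalSize = totalSize;
    }
  }

  /** Single pass over one column's values; allocates exactly one result object. */
  static ColumnRange computeRange(String columnName, List<String> values) {
    String min = null;
    String max = null;
    long nullCount = 0;
    long totalSize = 0;
    for (String value : values) {
      if (value == null) {
        nullCount++;
        continue;
      }
      totalSize += value.length();
      if (min == null || value.compareTo(min) < 0) {
        min = value;
      }
      if (max == null || value.compareTo(max) > 0) {
        max = value;
      }
    }
    return new ColumnRange(columnName, min, max, nullCount, values.size(), totalSize);
  }

  public static void main(String[] args) {
    ColumnRange range = computeRange("city", Arrays.asList("paris", null, "berlin", "oslo"));
    System.out.println(range.columnName + ": min=" + range.minValue + ", max=" + range.maxValue
        + ", nulls=" + range.nullCount + ", size=" + range.totalSize);
  }
}
```

Comparing string representations is only for illustration; real column stats would need type-aware comparison.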

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -848,41 +857,40 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
     }
   }
 
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient) {
-    return getLatestColumns(datasetMetaClient, false);
-  }
-
   public static Stream<HoodieRecord> translateWriteStatToColumnStats(HoodieWriteStat writeStat,
                                                                      HoodieTableMetaClient datasetMetaClient,
-                                                                     List<String> latestColumns) {
-    return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, latestColumns, false);
-
+                                                                     List<String> columnsToIndex) {
+    if (writeStat instanceof HoodieDeltaWriteStat && ((HoodieDeltaWriteStat) writeStat).getRecordsStats().isPresent()) {
+      Option<Map<String, HoodieColumnRangeMetadata<Comparable>>> columnRangeMap =

Review comment:
       What's the point of Option here?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieDeltaWriteStat.java
##########
@@ -69,4 +73,24 @@ public void addLogFiles(String logFile) {
   public List<String> getLogFiles() {
     return logFiles;
   }
+
+  public void setRecordsStats(RecordsStats<? extends Map> stats) {
+    recordsStats = Option.of(stats);
+  }
+
+  public Option<RecordsStats<? extends Map>> getRecordsStats() {
+    return recordsStats;
+  }
+
+  public static class RecordsStats<T> implements Serializable {

Review comment:
       What do we need this wrapper for? 
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Compute column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void computeRecordsStats(final String filePath, List<IndexedRecord> recordList,
+                                   Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap) {
+    recordList.forEach(record -> accumulateColumnRanges(record, writeSchemaWithMetaFields, filePath, columnRangeMap, config.isConsistentLogicalTimestampEnabled()));

Review comment:
       I think we can inline this method, given it's a one-liner and is not used anywhere else.

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -330,78 +322,67 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
       });
     });
 
-    return engineContext.map(deleteFileList, deleteFileInfo -> {
-      return HoodieMetadataPayload.createBloomFilterMetadataRecord(
-          deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, ByteBuffer.allocate(0), true);
-    }, 1).stream().collect(Collectors.toList());
+    HoodieData<Pair<String, String>> deleteFileListRDD = engineContext.parallelize(deleteFileList,

Review comment:
       Do we really need to route this action through an RDD? Do we envision that this will scale past the point where we can no longer handle it on the driver?
   
   I'm worried about the serialization cost we incur for every record we handle through the RDD (serializing/deserializing the closure) just to be able to create a single object.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -169,6 +169,12 @@
           + "store the column ranges and will be used for pruning files during the index lookups. "
           + "Only applies if " + ENABLE_METADATA_INDEX_COLUMN_STATS.key() + " is enabled.");
 
+  public static final ConfigProperty<Integer> COLUMN_STATS_INDEX_PARALLELISM = ConfigProperty

Review comment:
       This seems to be too low-level a lever to expose as a config:
   
    - If we want to determine the optimal parallelism, we should use the number of cores as a proxy.
    - If we want this to be a cap on how much parallelism we allow, we should rename it accordingly (and also generalize it to cover all Metadata activities).
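
To make the two options above concrete, here is a rough sketch, not the PR's code: `deriveParallelism` and its `maxParallelism` argument are hypothetical names. It uses the core count as the proxy for useful parallelism, never asks for more tasks than there are items, and treats the configured value purely as an upper bound.

```java
public class ParallelismSketch {

  /**
   * Hypothetical helper: bound parallelism by the number of items and the
   * number of cores, then apply a configured cap. Always returns at least 1.
   */
  static int deriveParallelism(int numItems, int availableCores, int maxParallelism) {
    int boundedByWork = Math.min(Math.max(numItems, 1), Math.max(availableCores, 1));
    return Math.min(boundedByWork, Math.max(maxParallelism, 1));
  }

  public static void main(String[] args) {
    // On a single JVM the core count is available from the runtime; on a cluster
    // the engine's own notion of total cores would be the better proxy.
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println(deriveParallelism(1000, cores, 200));
  }
}
```

Note that this takes the minimum of the item count and the other bounds, so a short list of files never fans out into more tasks than there are elements.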

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -347,64 +348,62 @@ public void initTableMetadata() {
   /**
    * Bootstrap the metadata table if needed.
    *
-   * @param engineContext  - Engine context
-   * @param dataMetaClient - Meta client for the data table
-   * @param actionMetadata - Optional action metadata
-   * @param <T>            - Action metadata types extending Avro generated SpecificRecordBase
+   * @param dataMetaClient           - Meta client for the data table

Review comment:
       This seems off. Was this intentional?
   

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java
##########
@@ -121,10 +123,28 @@
   private HoodieMetadataBloomFilter bloomFilterMetadata = null;
   private HoodieMetadataColumnStats columnStatMetadata = null;
 
+  public static final BiFunction<HoodieMetadataColumnStats, HoodieMetadataColumnStats, HoodieMetadataColumnStats> COLUMN_STATS_MERGE_FUNCTION =

Review comment:
       Let's extract this to `HoodieTableMetadataUtil` as just a normal function. There's not much value in maintaining it as a static constant.
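
As an illustration of that refactoring, here is a sketch in which `ColumnStats` is a hypothetical, simplified stand-in for `HoodieMetadataColumnStats`, and the merge rule (min of mins, max of maxes, counts added) is an assumption about what the constant does rather than the PR's actual logic. A plain static method can still be passed as a method reference wherever a `BiFunction` is expected.

```java
import java.util.function.BiFunction;

public class ColumnStatsMergeSketch {

  /** Hypothetical, simplified stand-in for HoodieMetadataColumnStats. */
  static final class ColumnStats {
    final String minValue;
    final String maxValue;
    final long nullCount;
    final long valueCount;

    ColumnStats(String minValue, String maxValue, long nullCount, long valueCount) {
      this.minValue = minValue;
      this.maxValue = maxValue;
      this.nullCount = nullCount;
      this.valueCount = valueCount;
    }
  }

  /** Plain static method; assumes both inputs carry non-null min and max values. */
  static ColumnStats mergeColumnStats(ColumnStats left, ColumnStats right) {
    String min = left.minValue.compareTo(right.minValue) <= 0 ? left.minValue : right.minValue;
    String max = left.maxValue.compareTo(right.maxValue) >= 0 ? left.maxValue : right.maxValue;
    return new ColumnStats(min, max,
        left.nullCount + right.nullCount,
        left.valueCount + right.valueCount);
  }

  public static void main(String[] args) {
    // Call sites that need a BiFunction can use a method reference instead of a constant.
    BiFunction<ColumnStats, ColumnStats, ColumnStats> merge = ColumnStatsMergeSketch::mergeColumnStats;
    ColumnStats merged = merge.apply(new ColumnStats("b", "m", 1, 10), new ColumnStats("a", "k", 0, 5));
    System.out.println(merged.minValue + ".." + merged.maxValue + " nulls=" + merged.nullCount);
  }
}
```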

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java
##########
@@ -53,56 +45,23 @@
 
   private final BloomFilter bloomFilter;
   private final List<String> candidateRecordKeys;
-  private final boolean useMetadataTableIndex;
-  private Option<String> fileName = Option.empty();
   private long totalKeysChecked;
 
   public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable<T, I, K, O> hoodieTable,
                                Pair<String, String> partitionPathFileIDPair) {
-    this(config, hoodieTable, partitionPathFileIDPair, Option.empty(), false);
-  }
-
-  public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable<T, I, K, O> hoodieTable,
-                               Pair<String, String> partitionPathFileIDPair, Option<String> fileName,
-                               boolean useMetadataTableIndex) {
     super(config, hoodieTable, partitionPathFileIDPair);
     this.candidateRecordKeys = new ArrayList<>();
     this.totalKeysChecked = 0;
-    if (fileName.isPresent()) {
-      ValidationUtils.checkArgument(FSUtils.getFileId(fileName.get()).equals(getFileId()),
-          "File name '" + fileName.get() + "' doesn't match this lookup handle fileid '" + getFileId() + "'");
-      this.fileName = fileName;
-    }
-    this.useMetadataTableIndex = useMetadataTableIndex;
     this.bloomFilter = getBloomFilter();
   }
 
   private BloomFilter getBloomFilter() {
-    BloomFilter bloomFilter = null;
-    HoodieTimer timer = new HoodieTimer().startTimer();
-    try {
-      if (this.useMetadataTableIndex) {

Review comment:
       Can you please help me understand why this is changing?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -203,7 +204,7 @@ private void enablePartitions() {
    * @param metadataConfig       - Table config
    * @param metaClient           - Meta client for the metadata table
    * @param fsView               - Metadata table filesystem view to use
-   * @param isBootstrapCompleted - Is metadata table bootstrap completed
+   * @param isBootstrapCompleted - Is metadata table initialize completed

Review comment:
       nit: correct form would be "initializing"

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +385,13 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {
+      Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap = stat.getRecordsStats().isPresent()

Review comment:
       You can do `getRecordStats().getOrElse(() -> new HashMap())`
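
The same pattern, sketched with `java.util.Optional` as a stand-in for Hudi's `Option` (an assumption made so the example is self-contained; the exact method name on `Option` may differ). The supplier-based default replaces the explicit `isPresent()` ternary and only allocates the map when the value is absent. `rangesOrEmpty` is a hypothetical helper name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionDefaultSketch {

  /** Returns the wrapped map if present, otherwise a fresh empty map. */
  static Map<String, String> rangesOrEmpty(Optional<Map<String, String>> maybeRanges) {
    return maybeRanges.orElseGet(HashMap::new);
  }

  public static void main(String[] args) {
    Map<String, String> existing = new HashMap<>();
    existing.put("city", "berlin..paris");

    System.out.println(rangesOrEmpty(Optional.of(existing)).size());
    System.out.println(rangesOrEmpty(Optional.empty()).size());
  }
}
```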

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -330,78 +322,67 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
       });
     });
 
-    return engineContext.map(deleteFileList, deleteFileInfo -> {
-      return HoodieMetadataPayload.createBloomFilterMetadataRecord(
-          deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, ByteBuffer.allocate(0), true);
-    }, 1).stream().collect(Collectors.toList());
+    HoodieData<Pair<String, String>> deleteFileListRDD = engineContext.parallelize(deleteFileList,
+        Math.max(deleteFileList.size(), recordsGenerationParams.getBloomIndexParallelism()));
+    return deleteFileListRDD.map(deleteFileInfo -> HoodieMetadataPayload.createBloomFilterMetadataRecord(
+        deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, StringUtils.EMPTY_STRING,
+        ByteBuffer.allocate(0), true));
   }
 
   /**
    * Convert clean metadata to column stats index records.
    *
-   * @param cleanMetadata     - Clean action metadata
-   * @param engineContext     - Engine context
-   * @param datasetMetaClient - data table meta client
+   * @param cleanMetadata           - Clean action metadata
+   * @param engineContext           - Engine context
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
    * @return List of column stats index records for the clean metadata
    */
-  public static List<HoodieRecord> convertMetadataToColumnStatsRecords(HoodieCleanMetadata cleanMetadata,
-                                                                       HoodieEngineContext engineContext,
-                                                                       HoodieTableMetaClient datasetMetaClient) {
+  public static HoodieData<HoodieRecord> convertMetadataToColumnStatsRecords(HoodieCleanMetadata cleanMetadata,
+                                                                             HoodieEngineContext engineContext,
+                                                                             MetadataRecordsGenerationParams recordsGenerationParams) {
     List<Pair<String, String>> deleteFileList = new ArrayList<>();
     cleanMetadata.getPartitionMetadata().forEach((partition, partitionMetadata) -> {
       // Files deleted from a partition
       List<String> deletedFiles = partitionMetadata.getDeletePathPatterns();
       deletedFiles.forEach(entry -> deleteFileList.add(Pair.of(partition, entry)));
     });
 
-    List<String> latestColumns = getLatestColumns(datasetMetaClient);
-    return engineContext.flatMap(deleteFileList,
-        deleteFileInfo -> {
-          if (deleteFileInfo.getRight().endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
-            return getColumnStats(deleteFileInfo.getKey(), deleteFileInfo.getValue(), datasetMetaClient,
-                latestColumns, true);
-          }
-          return Stream.empty();
-        }, 1).stream().collect(Collectors.toList());
+    final List<String> columnsToIndex = getColumnsToIndex(recordsGenerationParams.getDataMetaClient(), recordsGenerationParams.isAllColumnStatsIndexEnabled());
+    HoodieData<Pair<String, String>> deleteFileListRDD = engineContext.parallelize(deleteFileList,
+        Math.max(deleteFileList.size(), recordsGenerationParams.getColumnStatsIndexParallelism()));
+    return deleteFileListRDD.flatMap(deleteFileInfo -> {
+      if (deleteFileInfo.getRight().endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {

Review comment:
       Let's use the Pair API consistently (either getKey/getValue or getLeft/getRight); it's quite confusing to see them mixed.
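
A minimal sketch of what consistent accessor usage could look like. `SimplePair` and `parquetPaths` are hypothetical stand-ins written for this illustration (not Hudi classes), and the ".parquet" literal is an assumption standing in for the file-extension lookup in the diff. The point is only that the filter and the mapping both read the pair through getLeft/getRight.

```java
import java.util.ArrayList;
import java.util.List;

public class PairAccessorSketch {
  // Hypothetical stand-in for a pair type that exposes both accessor families.
  static final class SimplePair<L, R> {
    private final L left;
    private final R right;

    SimplePair(L left, R right) {
      this.left = left;
      this.right = right;
    }

    L getLeft() { return left; }
    R getRight() { return right; }
    L getKey() { return left; }     // alias of getLeft
    R getValue() { return right; }  // alias of getRight
  }

  // Keeps only parquet files, reading the pair via getLeft/getRight throughout
  // instead of mixing in getKey/getValue.
  static List<String> parquetPaths(List<SimplePair<String, String>> deleteFileList) {
    List<String> result = new ArrayList<>();
    for (SimplePair<String, String> deleteFileInfo : deleteFileList) {
      if (deleteFileInfo.getRight().endsWith(".parquet")) { // assumed extension
        result.add(deleteFileInfo.getLeft() + "/" + deleteFileInfo.getRight());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<SimplePair<String, String>> files = new ArrayList<>();
    files.add(new SimplePair<>("2022/02/18", "a.parquet"));
    files.add(new SimplePair<>("2022/02/18", "b.log"));
    System.out.println(parquetPaths(files));
  }
}
```

Either accessor family would work equally well; the sketch only argues for picking one per method.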

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -330,78 +322,67 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
       });
     });
 
-    return engineContext.map(deleteFileList, deleteFileInfo -> {
-      return HoodieMetadataPayload.createBloomFilterMetadataRecord(
-          deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, ByteBuffer.allocate(0), true);
-    }, 1).stream().collect(Collectors.toList());
+    HoodieData<Pair<String, String>> deleteFileListRDD = engineContext.parallelize(deleteFileList,
+        Math.max(deleteFileList.size(), recordsGenerationParams.getBloomIndexParallelism()));
+    return deleteFileListRDD.map(deleteFileInfo -> HoodieMetadataPayload.createBloomFilterMetadataRecord(

Review comment:
       Let's create a common override for this method (it seems to be used in at least 3 more places).
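
One possible reading of this suggestion, sketched under assumptions: a shorter overload for the delete case that fills in the empty buffer and the deleted flag, so callers do not repeat them. `BloomRecord` and both `createBloomFilterRecord` methods are hypothetical stand-ins for this illustration; only the argument order (partition, file name, instant time, bloom filter bytes, deleted flag) is taken from the removed lines of the diff above.

```java
import java.nio.ByteBuffer;

public class BloomRecordOverloadSketch {
  // Hypothetical stand-in for the record a real factory method would return.
  static final class BloomRecord {
    final String partition;
    final String fileName;
    final String instantTime;
    final ByteBuffer bloomFilter;
    final boolean isDeleted;

    BloomRecord(String partition, String fileName, String instantTime,
                ByteBuffer bloomFilter, boolean isDeleted) {
      this.partition = partition;
      this.fileName = fileName;
      this.instantTime = instantTime;
      this.bloomFilter = bloomFilter;
      this.isDeleted = isDeleted;
    }
  }

  // Full form: the caller supplies the serialized filter and the deleted flag.
  static BloomRecord createBloomFilterRecord(String partition, String fileName, String instantTime,
                                             ByteBuffer bloomFilter, boolean isDeleted) {
    return new BloomRecord(partition, fileName, instantTime, bloomFilter, isDeleted);
  }

  // Hypothetical convenience overload for deleted files: an empty buffer and
  // isDeleted = true are supplied in one place instead of at every call site.
  static BloomRecord createBloomFilterRecord(String partition, String fileName, String instantTime) {
    return createBloomFilterRecord(partition, fileName, instantTime, ByteBuffer.allocate(0), true);
  }

  public static void main(String[] args) {
    BloomRecord record = createBloomFilterRecord("2022/02/18", "a.parquet", "001");
    System.out.println(record.isDeleted + " " + record.bloomFilter.remaining());
  }
}
```

Whether the reviewer meant exactly this shape is not stated in the thread; the sketch only shows how repeated call-site arguments could be centralized.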




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060159124


   ## CI report:
   
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060243355


   ## CI report:
   
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060159124


   ## CI report:
   
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   



[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r820837156



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -799,30 +824,20 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
   /**
    * Create column stats from write status.
    *
-   * @param engineContext                       - Engine context
-   * @param datasetMetaClient                   - Dataset meta client
-   * @param allWriteStats                       - Write status to convert
-   * @param isMetaIndexColumnStatsForAllColumns - Are all columns enabled for indexing
+   * @param engineContext           - Engine context
+   * @param allWriteStats           - Write status to convert
+   * @param recordsGenerationParams - Parameters for column stats record generation
    */
-  public static List<HoodieRecord> createColumnStatsFromWriteStats(HoodieEngineContext engineContext,
-                                                                   HoodieTableMetaClient datasetMetaClient,
-                                                                   List<HoodieWriteStat> allWriteStats,
-                                                                   boolean isMetaIndexColumnStatsForAllColumns) throws Exception {
+  public static HoodieData<HoodieRecord> createColumnStatsFromWriteStats(HoodieEngineContext engineContext,
+                                                                         List<HoodieWriteStat> allWriteStats,
+                                                                         MetadataRecordsGenerationParams recordsGenerationParams) {
     if (allWriteStats.isEmpty()) {
-      return Collections.emptyList();
-    }
-
-    List<HoodieWriteStat> prunedWriteStats = allWriteStats.stream().filter(writeStat -> {
-      return !(writeStat instanceof HoodieDeltaWriteStat);
-    }).collect(Collectors.toList());
-    if (prunedWriteStats.isEmpty()) {
-      return Collections.emptyList();
+      return engineContext.emptyHoodieData();
     }
-
-    return engineContext.flatMap(prunedWriteStats,
-        writeStat -> translateWriteStatToColumnStats(writeStat, datasetMetaClient,
-            getLatestColumns(datasetMetaClient, isMetaIndexColumnStatsForAllColumns)),
-        prunedWriteStats.size());
+    HoodieData<HoodieWriteStat> allWriteStatsRDD = engineContext.parallelize(
+        allWriteStats, Math.max(allWriteStats.size(), recordsGenerationParams.getColumnStatsIndexParallelism()));
+    return allWriteStatsRDD.flatMap(writeStat -> translateWriteStatToColumnStats(writeStat, recordsGenerationParams.getDataMetaClient(),
+        getColumnsToIndex(recordsGenerationParams.getDataMetaClient(), recordsGenerationParams.isAllColumnStatsIndexEnabled())).iterator());

Review comment:
       Done.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048480190


   ## CI report:
   
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   * 19ba560542a8769475948561e2b607f85f70b548 UNKNOWN
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048604126


   ## CI report:
   
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229) 
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd UNKNOWN
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048722584


   ## CI report:
   
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1055618229


   ## CI report:
   
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1059350931


   ## CI report:
   
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 UNKNOWN
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1053992276


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1053955444


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   * bf80ef66675695d0cbc6eff541226e09567b6e51 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1044536372


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813110302



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -165,6 +165,12 @@
           + "used for pruning files during the index lookups. Only applies if "
           + ENABLE_METADATA_INDEX_COLUMN_STATS.key() + " is enabled.A");
 
+  public static final ConfigProperty<Integer> COLUMN_STATS_INDEX_PARALLELISM = ConfigProperty
+          .key(METADATA_PREFIX + ".index.column.stats.parallelism")
+          .defaultValue(1)

Review comment:
       Done, set to 10 for now. Once we run perf tests we should be able to come up with a better default value.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048483132


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813112496



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Get column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void getRecordsStats(final String filePath, List<IndexedRecord> recordList,
+                               Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap) {
+    recordList.forEach(record -> accumulateColumnRanges(record, writeSchemaWithMetaFields, filePath, columnRangeMap, config.isConsistentLogicalTimestampEnabled()));
+  }
+
+  /**
+   * Accumulate column range statistics for the requested record.
+   *
+   * @param record   - Record to get the column range statistics for
+   * @param schema   - Schema for the record
+   * @param filePath - File that record belongs to
+   */
+  private static void accumulateColumnRanges(IndexedRecord record, Schema schema, String filePath,
+          Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap, boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+    schema.getFields().forEach(field -> {
+      final String fieldVal = HoodieAvroUtils.getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final HoodieColumnRangeMetadata<Comparable> fieldRange = new HoodieColumnRangeMetadata<>(
+              filePath,
+              field.name(),
+              fieldVal,
+              fieldVal,
+              fieldVal == null ? 1 : 0, // null count
+              fieldVal == null ? 0 : 1, // value count
+              fieldSize,
+              fieldSize

Review comment:
       Not necessarily. I know Parquet's ColumnChunkMetaData provides an API to get the uncompressed size. Let me check whether there is something similar for Avro.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1044624883


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r815594460



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -848,41 +857,40 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
     }
   }
 
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient) {
-    return getLatestColumns(datasetMetaClient, false);
-  }
-
   public static Stream<HoodieRecord> translateWriteStatToColumnStats(HoodieWriteStat writeStat,
                                                                      HoodieTableMetaClient datasetMetaClient,
-                                                                     List<String> latestColumns) {
-    return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, latestColumns, false);
-
+                                                                     List<String> columnsToIndex) {
+    if (writeStat instanceof HoodieDeltaWriteStat && ((HoodieDeltaWriteStat) writeStat).getRecordsStats().isPresent()) {
+      Option<Map<String, HoodieColumnRangeMetadata<Comparable>>> columnRangeMap =

Review comment:
       Not needed. Will remove.







[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r816953368



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -169,6 +169,12 @@
           + "store the column ranges and will be used for pruning files during the index lookups. "
           + "Only applies if " + ENABLE_METADATA_INDEX_COLUMN_STATS.key() + " is enabled.");
 
+  public static final ConfigProperty<Integer> COLUMN_STATS_INDEX_PARALLELISM = ConfigProperty

Review comment:
       This is the latter case. I think it's better to have a separate config for each type of partition because the compute characteristics of indexing differ between them.
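
To illustrate the per-partition configuration idea in the comment above, here is a minimal sketch that resolves a separate parallelism setting for each index partition type. It is not Hudi's implementation: the `IndexParallelismSketch` class, the `PartitionType` enum, the bloom filter key, the expansion of `METADATA_PREFIX` to `hoodie.metadata`, and both default values are assumptions made for illustration only.

```java
import java.util.Properties;

final class IndexParallelismSketch {

  // Hypothetical enum: one config key and one default per index partition type.
  // Keys and defaults are illustrative assumptions, not confirmed Hudi values.
  enum PartitionType {
    BLOOM_FILTERS("hoodie.metadata.index.bloom.filter.parallelism", 200),
    COLUMN_STATS("hoodie.metadata.index.column.stats.parallelism", 10);

    final String key;
    final int defaultValue;

    PartitionType(String key, int defaultValue) {
      this.key = key;
      this.defaultValue = defaultValue;
    }
  }

  // Returns the user-supplied value when present, otherwise the per-type default.
  static int parallelismFor(Properties props, PartitionType type) {
    String raw = props.getProperty(type.key);
    return raw == null ? type.defaultValue : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(PartitionType.COLUMN_STATS.key, "32");
    // Column stats is overridden; bloom filters falls back to its own default.
    System.out.println(parallelismFor(props, PartitionType.COLUMN_STATS));
    System.out.println(parallelismFor(props, PartitionType.BLOOM_FILTERS));
  }
}
```

Keeping the keys separate lets each index partition be tuned independently once perf numbers are available, at the cost of one more knob per partition type.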







[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r816955472



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +385,13 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {

Review comment:
       Agreed. I will take up this refactoring while adding support for indexing a subset of columns.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1055618229


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     }, {
       "hash" : "19ba560542a8769475948561e2b607f85f70b548",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222",
       "triggerID" : "19ba560542a8769475948561e2b607f85f70b548",
       "triggerType" : "PUSH"
     }, {
       "hash" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229",
       "triggerID" : "125d2cd385219cb9187e0ce6ac90b00cfea863fc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234",
       "triggerID" : "48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379",
       "triggerID" : "bf80ef66675695d0cbc6eff541226e09567b6e51",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442",
       "triggerID" : "6a772a7709b577db7afddefb86a1ccd62a75269c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r818218764



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -187,94 +178,90 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
   /**
    * Convert commit action metadata to bloom filter records.
    *
-   * @param commitMetadata - Commit action metadata
-   * @param dataMetaClient - Meta client for the data table
-   * @param instantTime    - Action instant time
-   * @return List of metadata table records
+   * @param context                 - Engine context to use
+   * @param commitMetadata          - Commit action metadata
+   * @param instantTime             - Action instant time
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
+   * @return HoodieData of metadata table records
    */
-  public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCommitMetadata commitMetadata,
-                                                                       HoodieTableMetaClient dataMetaClient,
-                                                                       String instantTime) {
-    List<HoodieRecord> records = new LinkedList<>();
-    commitMetadata.getPartitionToWriteStats().forEach((partitionStatName, writeStats) -> {
-      final String partition = partitionStatName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionStatName;
-      Map<String, Long> newFiles = new HashMap<>(writeStats.size());
-      writeStats.forEach(hoodieWriteStat -> {
-        // No action for delta logs
-        if (hoodieWriteStat instanceof HoodieDeltaWriteStat) {
-          return;
-        }
+  public static HoodieData<HoodieRecord> convertMetadataToBloomFilterRecords(
+      HoodieEngineContext context, HoodieCommitMetadata commitMetadata,
+      String instantTime, MetadataRecordsGenerationParams recordsGenerationParams) {
+    final List<HoodieWriteStat> allWriteStats = commitMetadata.getPartitionToWriteStats().values().stream()
+        .flatMap(entry -> entry.stream()).collect(Collectors.toList());
+    if (allWriteStats.isEmpty()) {
+      return context.emptyHoodieData();
+    }
 
-        String pathWithPartition = hoodieWriteStat.getPath();
-        if (pathWithPartition == null) {
-          // Empty partition
-          LOG.error("Failed to find path in write stat to update metadata table " + hoodieWriteStat);
-          return;
-        }
-        int offset = partition.equals(NON_PARTITIONED_NAME) ? (pathWithPartition.startsWith("/") ? 1 : 0) :
-            partition.length() + 1;
+    HoodieData<HoodieWriteStat> allWriteStatsRDD = context.parallelize(allWriteStats,
+        Math.max(recordsGenerationParams.getBloomIndexParallelism(), allWriteStats.size()));

Review comment:
       We could take the min and then make it 1 if it equals 0. What if the user explicitly overrides the value for bloomIndexParallelism?
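
A minimal sketch of the clamping suggested in the comment above, assuming the goal is to never use more tasks than there are write stats and never fewer than one. The `ParallelismSketch` class and `effectiveParallelism` helper are hypothetical names, not part of Hudi.

```java
final class ParallelismSketch {

  // Hypothetical helper: caps the configured parallelism at the number of
  // write stats, then floors the result at 1 so parallelize() never gets 0.
  static int effectiveParallelism(int configuredParallelism, int numWriteStats) {
    return Math.max(Math.min(configuredParallelism, numWriteStats), 1);
  }

  public static void main(String[] args) {
    System.out.println(effectiveParallelism(10, 3));   // fewer stats than configured -> 3
    System.out.println(effectiveParallelism(10, 500)); // configured value is the cap -> 10
    System.out.println(effectiveParallelism(0, 5));    // degenerate config -> 1
  }
}
```

With `min`, a value the user sets explicitly acts as an upper bound, whereas the `Math.max` in the diff would let a large commit exceed it.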







[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r820839507



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -187,94 +178,90 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
   /**
    * Convert commit action metadata to bloom filter records.
    *
-   * @param commitMetadata - Commit action metadata
-   * @param dataMetaClient - Meta client for the data table
-   * @param instantTime    - Action instant time
-   * @return List of metadata table records
+   * @param context                 - Engine context to use
+   * @param commitMetadata          - Commit action metadata
+   * @param instantTime             - Action instant time
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
+   * @return HoodieData of metadata table records
    */
-  public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCommitMetadata commitMetadata,
-                                                                       HoodieTableMetaClient dataMetaClient,
-                                                                       String instantTime) {
-    List<HoodieRecord> records = new LinkedList<>();
-    commitMetadata.getPartitionToWriteStats().forEach((partitionStatName, writeStats) -> {
-      final String partition = partitionStatName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionStatName;
-      Map<String, Long> newFiles = new HashMap<>(writeStats.size());
-      writeStats.forEach(hoodieWriteStat -> {
-        // No action for delta logs
-        if (hoodieWriteStat instanceof HoodieDeltaWriteStat) {
-          return;
-        }
+  public static HoodieData<HoodieRecord> convertMetadataToBloomFilterRecords(
+      HoodieEngineContext context, HoodieCommitMetadata commitMetadata,
+      String instantTime, MetadataRecordsGenerationParams recordsGenerationParams) {
+    final List<HoodieWriteStat> allWriteStats = commitMetadata.getPartitionToWriteStats().values().stream()
+        .flatMap(entry -> entry.stream()).collect(Collectors.toList());
+    if (allWriteStats.isEmpty()) {
+      return context.emptyHoodieData();
+    }
 
-        String pathWithPartition = hoodieWriteStat.getPath();
-        if (pathWithPartition == null) {
-          // Empty partition
-          LOG.error("Failed to find path in write stat to update metadata table " + hoodieWriteStat);
-          return;
-        }
-        int offset = partition.equals(NON_PARTITIONED_NAME) ? (pathWithPartition.startsWith("/") ? 1 : 0) :
-            partition.length() + 1;
+    HoodieData<HoodieWriteStat> allWriteStatsRDD = context.parallelize(allWriteStats,
+        Math.max(recordsGenerationParams.getBloomIndexParallelism(), allWriteStats.size()));

Review comment:
       Done.







[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r820837051



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -922,4 +952,39 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Computes column range metadata
+   *
+   * @param recordList                        - list of records from which column range statistics will be computed
+   * @param field                             - column name for which statistics will be computed
+   * @param filePath                          - data file path
+   * @param columnRangeMap                    - old column range statistics, which will be merged in this computation
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void accumulateColumnRanges(List<IndexedRecord> recordList, Schema.Field field, String filePath,

Review comment:
       Good point. I have changed it accordingly.
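
For readers following the Javadoc in the diff above, here is a rough sketch of how new values could be merged into existing column range statistics. It is only an illustration under stated assumptions: `Range` is a hypothetical stand-in for `HoodieColumnRangeMetadata` that tracks just min, max, null count and value count, and values are compared as strings, so ordering is lexicographic (for example, "10" sorts before "9").

```java
import java.util.HashMap;
import java.util.Map;

final class ColumnRangeSketch {

  // Hypothetical, simplified range holder; not the Hudi class.
  static final class Range {
    final String min;
    final String max;
    final long nullCount;
    final long valueCount;

    Range(String min, String max, long nullCount, long valueCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
      this.valueCount = valueCount;
    }

    // Range describing a single field value; a null value only bumps nullCount.
    static Range of(String value) {
      return value == null ? new Range(null, null, 1, 0) : new Range(value, value, 0, 1);
    }

    // Combines two ranges: widen min/max and add up the counters.
    Range merge(Range other) {
      return new Range(minOf(min, other.min), maxOf(max, other.max),
          nullCount + other.nullCount, valueCount + other.valueCount);
    }
  }

  static String minOf(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }

  static String maxOf(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  // Merges one value into the running statistics kept per column name.
  static void accumulate(Map<String, Range> ranges, String column, String value) {
    ranges.merge(column, Range.of(value), Range::merge);
  }

  public static void main(String[] args) {
    Map<String, Range> ranges = new HashMap<>();
    for (String v : new String[] {"b", "a", null, "d"}) {
      accumulate(ranges, "col", v);
    }
    Range r = ranges.get("col");
    System.out.println(r.min + " " + r.max + " " + r.nullCount + " " + r.valueCount);
  }
}
```

Because `merge` is associative, the same function can fold in previously stored statistics for a file as well as per-record values.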







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1044624883


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r810405559



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Get column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void getRecordsStats(final String filePath, List<IndexedRecord> recordList,
+                               Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap) {
+    recordList.forEach(record -> accumulateColumnRanges(record, writeSchemaWithMetaFields, filePath, columnRangeMap, config.isConsistentLogicalTimestampEnabled()));
+  }
+
+  /**
+   * Accumulate column range statistics for the requested record.
+   *
+   * @param record   - Record to get the column range statistics for
+   * @param schema   - Schema for the record
+   * @param filePath - File that record belongs to
+   */
+  private static void accumulateColumnRanges(IndexedRecord record, Schema schema, String filePath,
+          Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap, boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+    schema.getFields().forEach(field -> {
+      final String fieldVal = HoodieAvroUtils.getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final HoodieColumnRangeMetadata<Comparable> fieldRange = new HoodieColumnRangeMetadata<>(
+              filePath,
+              field.name(),
+              fieldVal,
+              fieldVal,
+              fieldVal == null ? 1 : 0, // null count
+              fieldVal == null ? 0 : 1, // value count
+              fieldSize,
+              fieldSize

Review comment:
       In the case of Avro, are the total size and the total uncompressed size going to be the same?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -330,78 +319,67 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
       });
     });
 
-    return engineContext.map(deleteFileList, deleteFileInfo -> {
-      return HoodieMetadataPayload.createBloomFilterMetadataRecord(
-          deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, ByteBuffer.allocate(0), true);
-    }, 1).stream().collect(Collectors.toList());
+    HoodieData<Pair<String, String>> deleteFileListRDD = engineContext.parallelize(deleteFileList,
+        Math.max(deleteFileList.size(), recordsGenerationParams.getBloomIndexParallelism()));
+    return deleteFileListRDD.map(deleteFileInfo -> HoodieMetadataPayload.createBloomFilterMetadataRecord(
+        deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, StringUtils.EMPTY_STRING,
+        ByteBuffer.allocate(0), true));
   }
 
   /**
    * Convert clean metadata to column stats index records.
    *
-   * @param cleanMetadata     - Clean action metadata
-   * @param engineContext     - Engine context
-   * @param datasetMetaClient - data table meta client
+   * @param cleanMetadata           - Clean action metadata
+   * @param engineContext           - Engine context
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
    * @return List of column stats index records for the clean metadata
    */
-  public static List<HoodieRecord> convertMetadataToColumnStatsRecords(HoodieCleanMetadata cleanMetadata,
-                                                                       HoodieEngineContext engineContext,
-                                                                       HoodieTableMetaClient datasetMetaClient) {
+  public static HoodieData<HoodieRecord> convertMetadataToColumnStatsRecords(HoodieCleanMetadata cleanMetadata,
+                                                                             HoodieEngineContext engineContext,
+                                                                             MetadataRecordsGenerationParams recordsGenerationParams) {
     List<Pair<String, String>> deleteFileList = new ArrayList<>();
     cleanMetadata.getPartitionMetadata().forEach((partition, partitionMetadata) -> {
       // Files deleted from a partition
       List<String> deletedFiles = partitionMetadata.getDeletePathPatterns();
       deletedFiles.forEach(entry -> deleteFileList.add(Pair.of(partition, entry)));
     });
 
-    List<String> latestColumns = getLatestColumns(datasetMetaClient);
-    return engineContext.flatMap(deleteFileList,
-        deleteFileInfo -> {
-          if (deleteFileInfo.getRight().endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
-            return getColumnStats(deleteFileInfo.getKey(), deleteFileInfo.getValue(), datasetMetaClient,
-                latestColumns, true);
-          }
-          return Stream.empty();
-        }, 1).stream().collect(Collectors.toList());
+    final List<String> columnsToIndex = getColumnsToIndex(recordsGenerationParams.getDataMetaClient(), recordsGenerationParams.isAllColumnStatsIndexEnabled());

Review comment:
       Is it possible to move the generation of columnsToIndex to some higher layer and avoid computing it repeatedly?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -330,78 +319,67 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
       });
     });
 
-    return engineContext.map(deleteFileList, deleteFileInfo -> {
-      return HoodieMetadataPayload.createBloomFilterMetadataRecord(
-          deleteFileInfo.getLeft(), deleteFileInfo.getRight(), instantTime, ByteBuffer.allocate(0), true);
-    }, 1).stream().collect(Collectors.toList());
+    HoodieData<Pair<String, String>> deleteFileListRDD = engineContext.parallelize(deleteFileList,
+        Math.max(deleteFileList.size(), recordsGenerationParams.getBloomIndexParallelism()));

Review comment:
       Again, if min makes sense, please fix it in all places.

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -831,7 +828,7 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
    * @param datasetMetaClient                   - Data table meta client
    * @param isMetaIndexColumnStatsForAllColumns - Is column stats indexing enabled for all columns
    */
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {
+  private static List<String> getColumnsToIndex(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {

Review comment:
       A comment about L 834: I feel we can't directly take RecordKeyFieldProp as is; it may not work for all key generators.
   Maybe we have to split it on "," and then set the columns to index.
   Can you check whether there are any other places where we have this dependency, and whether we have done the right thing there?
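
The splitting suggested above could look roughly like the sketch below. This is only an illustration under assumptions: the class and method names are hypothetical and not part of Hudi, and it assumes the record key property is a plain comma-separated list of field names, which may not hold for every key generator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not an existing Hudi class.
final class RecordKeyFieldsSketch {
  private RecordKeyFieldsSketch() {
  }

  /**
   * Splits a comma-separated record key property into individual field names.
   * Surrounding whitespace is trimmed and empty entries are dropped.
   */
  static List<String> splitRecordKeyFields(String recordKeyFieldProp) {
    if (recordKeyFieldProp == null) {
      return Collections.emptyList();
    }
    return Arrays.stream(recordKeyFieldProp.split(","))
        .map(String::trim)
        .filter(field -> !field.isEmpty())
        .collect(Collectors.toList());
  }
}
```

With this, a single key such as "uuid" still yields a one-element list, so callers would not need a separate code path for simple and composite keys.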

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Get column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void getRecordsStats(final String filePath, List<IndexedRecord> recordList,

Review comment:
       Maybe we can name this "setRecordsStats".

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -165,6 +165,12 @@
           + "used for pruning files during the index lookups. Only applies if "
           + ENABLE_METADATA_INDEX_COLUMN_STATS.key() + " is enabled.A");
 
+  public static final ConfigProperty<Integer> COLUMN_STATS_INDEX_PARALLELISM = ConfigProperty
+          .key(METADATA_PREFIX + ".index.column.stats.parallelism")
+          .defaultValue(1)

Review comment:
       Why 1? Can we make this 10, maybe?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -871,27 +879,39 @@ protected void bootstrapCommit(List<DirectoryInfo> partitionInfoList, String cre
         return HoodieMetadataPayload.createPartitionFilesRecord(
             partitionInfo.getRelativePath().isEmpty() ? NON_PARTITIONED_NAME : partitionInfo.getRelativePath(), Option.of(validFileNameToSizeMap), Option.empty());
       });
-      partitionRecords = partitionRecords.union(fileListRecords);
+      filesPartitionRecords = filesPartitionRecords.union(fileListRecords);

Review comment:
       I did leave this comment in one of the previous patches, but can we move the partition path name deduction into a method and reuse it everywhere?
   ```
   partitionInfo.getRelativePath().isEmpty() ? NON_PARTITIONED_NAME : partitionInfo.getRelativePath()
   ```
   We already had some bugs around non-partitioned datasets, so I wanted to keep it in one place.
   
   Also this one:
   ```
   String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
   ```
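
A minimal sketch of what such a shared method might look like follows. The class name and the method name `getPartitionIdentifier` are hypothetical, and the two constant values are placeholders assumed for illustration; the real constants are defined in Hudi and should be used instead.

```java
// Hypothetical helper, not an existing Hudi class.
final class PartitionNameSketch {
  // Placeholder values assumed for this sketch; the real constants live in Hudi.
  static final String EMPTY_PARTITION_NAME = "";
  static final String NON_PARTITIONED_NAME = ".";

  private PartitionNameSketch() {
  }

  /**
   * Maps the empty relative partition path of a non-partitioned table to the
   * identifier used for it in the metadata table. Any other path is returned
   * unchanged (a null input is returned as null).
   */
  static String getPartitionIdentifier(String relativePartitionPath) {
    return EMPTY_PARTITION_NAME.equals(relativePartitionPath)
        ? NON_PARTITIONED_NAME
        : relativePartitionPath;
  }
}
```

Both snippets quoted above would then reduce to a single call, which keeps the non-partitioned special case in one place.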

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -187,94 +178,90 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
   /**
    * Convert commit action metadata to bloom filter records.
    *
-   * @param commitMetadata - Commit action metadata
-   * @param dataMetaClient - Meta client for the data table
-   * @param instantTime    - Action instant time
-   * @return List of metadata table records
+   * @param context                 - Engine context to use
+   * @param commitMetadata          - Commit action metadata
+   * @param instantTime             - Action instant time
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
+   * @return HoodieData of metadata table records
    */
-  public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCommitMetadata commitMetadata,
-                                                                       HoodieTableMetaClient dataMetaClient,
-                                                                       String instantTime) {
-    List<HoodieRecord> records = new LinkedList<>();
-    commitMetadata.getPartitionToWriteStats().forEach((partitionStatName, writeStats) -> {
-      final String partition = partitionStatName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionStatName;
-      Map<String, Long> newFiles = new HashMap<>(writeStats.size());
-      writeStats.forEach(hoodieWriteStat -> {
-        // No action for delta logs
-        if (hoodieWriteStat instanceof HoodieDeltaWriteStat) {
-          return;
-        }
+  public static HoodieData<HoodieRecord> convertMetadataToBloomFilterRecords(
+      HoodieEngineContext context, HoodieCommitMetadata commitMetadata,
+      String instantTime, MetadataRecordsGenerationParams recordsGenerationParams) {
+    final List<HoodieWriteStat> allWriteStats = commitMetadata.getPartitionToWriteStats().values().stream()
+        .flatMap(entry -> entry.stream()).collect(Collectors.toList());
+    if (allWriteStats.isEmpty()) {
+      return context.emptyHoodieData();
+    }
 
-        String pathWithPartition = hoodieWriteStat.getPath();
-        if (pathWithPartition == null) {
-          // Empty partition
-          LOG.error("Failed to find path in write stat to update metadata table " + hoodieWriteStat);
-          return;
-        }
-        int offset = partition.equals(NON_PARTITIONED_NAME) ? (pathWithPartition.startsWith("/") ? 1 : 0) :
-            partition.length() + 1;
+    HoodieData<HoodieWriteStat> allWriteStatsRDD = context.parallelize(allWriteStats,
+        Math.max(recordsGenerationParams.getBloomIndexParallelism(), allWriteStats.size()));

Review comment:
       Shouldn't we be doing min here instead of max? If there are only 10 writeStats, why do we parallelize across 100 (if the config is set to 100)?
   This applies everywhere we do this based on colsStatsParallelism and bloomIndexParallelism.
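
To make the min-versus-max point concrete, here is a small sketch. The helper `boundedParallelism` is hypothetical and not part of Hudi; it assumes the intent is to never use more partitions than there are items, while keeping a floor of 1 so that an empty input does not produce a parallelism of 0.

```java
// Hypothetical helper, not an existing Hudi class.
final class ParallelismSketch {
  private ParallelismSketch() {
  }

  /**
   * Caps the configured parallelism at the number of items to process,
   * with a lower bound of 1.
   */
  static int boundedParallelism(int itemCount, int configuredParallelism) {
    return Math.max(1, Math.min(itemCount, configuredParallelism));
  }
}
```

With 10 write stats and a configured parallelism of 100, this gives 10, whereas `Math.max(100, 10)` gives 100.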

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -609,82 +578,124 @@ private static void processRollbackMetadata(HoodieActiveTimeline metadataTableTi
   }
 
   /**
-   * Convert rollback action metadata to bloom filter index records.
+   * Convert added and deleted files metadata to bloom filter index records.
    */
-  private static List<HoodieRecord> convertFilesToBloomFilterRecords(HoodieEngineContext engineContext,
-                                                                     HoodieTableMetaClient dataMetaClient,
-                                                                     Map<String, List<String>> partitionToDeletedFiles,
-                                                                     Map<String, Map<String, Long>> partitionToAppendedFiles,
-                                                                     String instantTime) {
-    List<HoodieRecord> records = new LinkedList<>();
-    partitionToDeletedFiles.forEach((partitionName, deletedFileList) -> deletedFileList.forEach(deletedFile -> {
-      if (!FSUtils.isBaseFile(new Path(deletedFile))) {
-        return;
-      }
+  public static HoodieData<HoodieRecord> convertFilesToBloomFilterRecords(HoodieEngineContext engineContext,
+                                                                          Map<String, List<String>> partitionToDeletedFiles,
+                                                                          Map<String, Map<String, Long>> partitionToAppendedFiles,
+                                                                          MetadataRecordsGenerationParams recordsGenerationParams,
+                                                                          String instantTime) {
+    HoodieData<HoodieRecord> allRecordsRDD = engineContext.emptyHoodieData();
+
+    List<Pair<String, List<String>>> partitionToDeletedFilesList = partitionToDeletedFiles.entrySet()
+        .stream().map(e -> Pair.of(e.getKey(), e.getValue())).collect(Collectors.toList());
+    HoodieData<Pair<String, List<String>>> partitionToDeletedFilesRDD = engineContext.parallelize(partitionToDeletedFilesList,
+        Math.max(partitionToDeletedFilesList.size(), recordsGenerationParams.getBloomIndexParallelism()));
+
+    HoodieData<HoodieRecord> deletedFilesRecordsRDD = partitionToDeletedFilesRDD.flatMap(partitionToDeletedFilesEntry -> {
+      final String partitionName = partitionToDeletedFilesEntry.getLeft();
+      final List<String> deletedFileList = partitionToDeletedFilesEntry.getRight();
+      return deletedFileList.stream().flatMap(deletedFile -> {
+        if (!FSUtils.isBaseFile(new Path(deletedFile))) {
+          return Stream.empty();
+        }
 
-      final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
-      records.add(HoodieMetadataPayload.createBloomFilterMetadataRecord(
-          partition, deletedFile, instantTime, ByteBuffer.allocate(0), true));
-    }));
+        final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
+        return Stream.<HoodieRecord>of(HoodieMetadataPayload.createBloomFilterMetadataRecord(
+            partition, deletedFile, instantTime, StringUtils.EMPTY_STRING, ByteBuffer.allocate(0), true));
+      }).iterator();
+    });
+    allRecordsRDD = allRecordsRDD.union(deletedFilesRecordsRDD);
 
-    partitionToAppendedFiles.forEach((partitionName, appendedFileMap) -> {
+    List<Pair<String, Map<String, Long>>> partitionToAppendedFilesList = partitionToAppendedFiles.entrySet()
+        .stream().map(entry -> Pair.of(entry.getKey(), entry.getValue())).collect(Collectors.toList());
+    HoodieData<Pair<String, Map<String, Long>>> partitionToAppendedFilesRDD = engineContext.parallelize(partitionToAppendedFilesList,
+        Math.max(partitionToAppendedFiles.size(), recordsGenerationParams.getBloomIndexParallelism()));
+
+    HoodieData<HoodieRecord> appendedFilesRecordsRDD = partitionToAppendedFilesRDD.flatMap(partitionToAppendedFilesEntry -> {
+      final String partitionName = partitionToAppendedFilesEntry.getKey();
+      final Map<String, Long> appendedFileMap = partitionToAppendedFilesEntry.getValue();
       final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
-      appendedFileMap.forEach((appendedFile, length) -> {
+      return appendedFileMap.entrySet().stream().flatMap(appendedFileLengthPairEntry -> {
+        final String appendedFile = appendedFileLengthPairEntry.getKey();
         if (!FSUtils.isBaseFile(new Path(appendedFile))) {
-          return;
+          return Stream.empty();
         }
         final String pathWithPartition = partitionName + "/" + appendedFile;
-        final Path appendedFilePath = new Path(dataMetaClient.getBasePath(), pathWithPartition);
-        try {
-          HoodieFileReader<IndexedRecord> fileReader =
-              HoodieFileReaderFactory.getFileReader(dataMetaClient.getHadoopConf(), appendedFilePath);
+        final Path appendedFilePath = new Path(recordsGenerationParams.getDataMetaClient().getBasePath(), pathWithPartition);
+        try (HoodieFileReader<IndexedRecord> fileReader =
+                 HoodieFileReaderFactory.getFileReader(recordsGenerationParams.getDataMetaClient().getHadoopConf(), appendedFilePath)) {
           final BloomFilter fileBloomFilter = fileReader.readBloomFilter();
           if (fileBloomFilter == null) {
             LOG.error("Failed to read bloom filter for " + appendedFilePath);
-            return;
+            return Stream.empty();
           }
           ByteBuffer bloomByteBuffer = ByteBuffer.wrap(fileBloomFilter.serializeToString().getBytes());
           HoodieRecord record = HoodieMetadataPayload.createBloomFilterMetadataRecord(
-              partition, appendedFile, instantTime, bloomByteBuffer, false);
-          records.add(record);
-          fileReader.close();
+              partition, appendedFile, instantTime, recordsGenerationParams.getBloomFilterType(), bloomByteBuffer, false);
+          return Stream.of(record);
         } catch (IOException e) {
           LOG.error("Failed to get bloom filter for file: " + appendedFilePath);
         }
-      });
+        return Stream.empty();
+      }).iterator();
     });
-    return records;
+    allRecordsRDD = allRecordsRDD.union(appendedFilesRecordsRDD);
+
+    return allRecordsRDD;
   }
 
   /**
-   * Convert rollback action metadata to column stats index records.
+   * Convert added and deleted action metadata to column stats index records.
    */
-  private static List<HoodieRecord> convertFilesToColumnStatsRecords(HoodieEngineContext engineContext,
-                                                                     HoodieTableMetaClient datasetMetaClient,
-                                                                     Map<String, List<String>> partitionToDeletedFiles,
-                                                                     Map<String, Map<String, Long>> partitionToAppendedFiles,
-                                                                     String instantTime) {
-    List<HoodieRecord> records = new LinkedList<>();
-    List<String> latestColumns = getLatestColumns(datasetMetaClient);
-    partitionToDeletedFiles.forEach((partitionName, deletedFileList) -> deletedFileList.forEach(deletedFile -> {
+  public static HoodieData<HoodieRecord> convertFilesToColumnStatsRecords(HoodieEngineContext engineContext,
+                                                                          Map<String, List<String>> partitionToDeletedFiles,
+                                                                          Map<String, Map<String, Long>> partitionToAppendedFiles,
+                                                                          MetadataRecordsGenerationParams recordsGenerationParams) {
+    HoodieData<HoodieRecord> allRecordsRDD = engineContext.emptyHoodieData();
+    final List<String> columnsToIndex = getColumnsToIndex(recordsGenerationParams.getDataMetaClient(), recordsGenerationParams.isAllColumnStatsIndexEnabled());
+
+    final List<Pair<String, List<String>>> partitionToDeletedFilesList = partitionToDeletedFiles.entrySet()
+        .stream().map(e -> Pair.of(e.getKey(), e.getValue())).collect(Collectors.toList());
+    final HoodieData<Pair<String, List<String>>> partitionToDeletedFilesRDD = engineContext.parallelize(partitionToDeletedFilesList,
+        Math.max(partitionToDeletedFilesList.size(), recordsGenerationParams.getColumnStatsIndexParallelism()));
+
+    HoodieData<HoodieRecord> deletedFilesRecordsRDD = partitionToDeletedFilesRDD.flatMap(partitionToDeletedFilesEntry -> {
+      final String partitionName = partitionToDeletedFilesEntry.getLeft();
       final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
-      if (deletedFile.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
+      final List<String> deletedFileList = partitionToDeletedFilesEntry.getRight();
+
+      return deletedFileList.stream().flatMap(deletedFile -> {
         final String filePathWithPartition = partitionName + "/" + deletedFile;
-        records.addAll(getColumnStats(partition, filePathWithPartition, datasetMetaClient,
-            latestColumns, true).collect(Collectors.toList()));
-      }
-    }));
-
-    partitionToAppendedFiles.forEach((partitionName, appendedFileMap) -> appendedFileMap.forEach(
-        (appendedFile, size) -> {
-          final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
-          if (appendedFile.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
-            final String filePathWithPartition = partitionName + "/" + appendedFile;
-            records.addAll(getColumnStats(partition, filePathWithPartition, datasetMetaClient,
-                latestColumns, false).collect(Collectors.toList()));
-          }
-        }));
-    return records;
+        return getColumnStats(partition, filePathWithPartition, recordsGenerationParams.getDataMetaClient(), columnsToIndex, true);
+      }).iterator();
+    });
+    allRecordsRDD = allRecordsRDD.union(deletedFilesRecordsRDD);
+
+    final List<Pair<String, Map<String, Long>>> partitionToAppendedFilesList = partitionToAppendedFiles.entrySet()
+        .stream().map(entry -> Pair.of(entry.getKey(), entry.getValue())).collect(Collectors.toList());
+    final HoodieData<Pair<String, Map<String, Long>>> partitionToAppendedFilesRDD = engineContext.parallelize(partitionToAppendedFilesList,
+        Math.max(partitionToAppendedFiles.size(), recordsGenerationParams.getColumnStatsIndexParallelism()));
+
+    HoodieData<HoodieRecord> appendedFilesRecordsRDD = partitionToAppendedFilesRDD.flatMap(partitionToAppendedFilesEntry -> {
+      final String partitionName = partitionToAppendedFilesEntry.getLeft();
+      final String partition = partitionName.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionName;
+      final Map<String, Long> appendedFileMap = partitionToAppendedFilesEntry.getRight();
+
+      return appendedFileMap.entrySet().stream().flatMap(appendedFileNameLengthPair -> {
+        // TODO: HUDI-3374 Handle log files without delta write stat to get records column stats

Review comment:
       Can we remove this comment?







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1047369471


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1047368221


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124",
       "triggerID" : "63aac434acbbbbd15223dc186635f963e97367e9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "97f253e3d9ef2c8caf05810d42e5f54e7598d4de",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 63aac434acbbbbd15223dc186635f963e97367e9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6124) 
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813113300



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -831,7 +828,7 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
    * @param datasetMetaClient                   - Data table meta client
    * @param isMetaIndexColumnStatsForAllColumns - Is column stats indexing enabled for all columns
    */
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {
+  private static List<String> getColumnsToIndex(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {

Review comment:
       Fixed here. But there are a couple of other places; I'll create a separate patch.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1047401380


   ## CI report:
   
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048543998


   ## CI report:
   
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229) 
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048606918


   ## CI report:
   
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229) 
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1055690729


   ## CI report:
   
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1053957438


   ## CI report:
   
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1053957438


   ## CI report:
   
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6234) 
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1055615217


   ## CI report:
   
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   * 6a772a7709b577db7afddefb86a1ccd62a75269c UNKNOWN
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1053992276


   ## CI report:
   
   * bf80ef66675695d0cbc6eff541226e09567b6e51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6379) 
   



[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r818219576



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -831,7 +828,7 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
    * @param datasetMetaClient                   - Data table meta client
    * @param isMetaIndexColumnStatsForAllColumns - Is column stats indexing enabled for all columns
    */
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {
+  private static List<String> getColumnsToIndex(HoodieTableMetaClient datasetMetaClient, boolean isMetaIndexColumnStatsForAllColumns) {

Review comment:
       Sure. Do we have a tracking JIRA?







[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r818218157



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
##########
@@ -165,6 +165,12 @@
           + "used for pruning files during the index lookups. Only applies if "
           + ENABLE_METADATA_INDEX_COLUMN_STATS.key() + " is enabled.A");
 
+  public static final ConfigProperty<Integer> COLUMN_STATS_INDEX_PARALLELISM = ConfigProperty
+          .key(METADATA_PREFIX + ".index.column.stats.parallelism")
+          .defaultValue(1)

Review comment:
       sounds good!







[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1059353304


   ## CI report:
   
   * 6a772a7709b577db7afddefb86a1ccd62a75269c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6442) 
   * ffcc639ed1eb64000395e967f4ce57b4ae0c68e2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6551) 
   



[GitHub] [hudi] nsivabalan commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r818222679



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -799,30 +824,20 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
   /**
    * Create column stats from write status.
    *
-   * @param engineContext                       - Engine context
-   * @param datasetMetaClient                   - Dataset meta client
-   * @param allWriteStats                       - Write status to convert
-   * @param isMetaIndexColumnStatsForAllColumns - Are all columns enabled for indexing
+   * @param engineContext           - Engine context
+   * @param allWriteStats           - Write status to convert
+   * @param recordsGenerationParams - Parameters for columns stats record generation
    */
-  public static List<HoodieRecord> createColumnStatsFromWriteStats(HoodieEngineContext engineContext,
-                                                                   HoodieTableMetaClient datasetMetaClient,
-                                                                   List<HoodieWriteStat> allWriteStats,
-                                                                   boolean isMetaIndexColumnStatsForAllColumns) throws Exception {
+  public static HoodieData<HoodieRecord> createColumnStatsFromWriteStats(HoodieEngineContext engineContext,
+                                                                         List<HoodieWriteStat> allWriteStats,
+                                                                         MetadataRecordsGenerationParams recordsGenerationParams) {
     if (allWriteStats.isEmpty()) {
-      return Collections.emptyList();
-    }
-
-    List<HoodieWriteStat> prunedWriteStats = allWriteStats.stream().filter(writeStat -> {
-      return !(writeStat instanceof HoodieDeltaWriteStat);
-    }).collect(Collectors.toList());
-    if (prunedWriteStats.isEmpty()) {
-      return Collections.emptyList();
+      return engineContext.emptyHoodieData();
     }
-
-    return engineContext.flatMap(prunedWriteStats,
-        writeStat -> translateWriteStatToColumnStats(writeStat, datasetMetaClient,
-            getLatestColumns(datasetMetaClient, isMetaIndexColumnStatsForAllColumns)),
-        prunedWriteStats.size());
+    HoodieData<HoodieWriteStat> allWriteStatsRDD = engineContext.parallelize(
+        allWriteStats, Math.max(allWriteStats.size(), recordsGenerationParams.getColumnStatsIndexParallelism()));
+    return allWriteStatsRDD.flatMap(writeStat -> translateWriteStatToColumnStats(writeStat, recordsGenerationParams.getDataMetaClient(),
+        getColumnsToIndex(recordsGenerationParams.getDataMetaClient(), recordsGenerationParams.isAllColumnStatsIndexEnabled())).iterator());

Review comment:
       Should we compute `getColumnsToIndex` once on the driver (maybe around line 837 above) and avoid recomputing it on every executor?
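
The suggestion above can be illustrated with a minimal sketch that uses plain Java streams in place of `HoodieData`. All names here (`resolveColumnsToIndex`, `translate`, `LOOKUPS`) are hypothetical stand-ins, not Hudi APIs; the point is only that a value computed inside the lambda is recomputed per element, while a value computed before the pipeline is captured once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class HoistColumnsSketch {
  // Counts how often the (hypothetical) expensive lookup runs.
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  // Stand-in for getColumnsToIndex: pretend this reads the table schema.
  static List<String> resolveColumnsToIndex() {
    LOOKUPS.incrementAndGet();
    return Arrays.asList("id", "ts");
  }

  // Stand-in for translateWriteStatToColumnStats: one output per (file, column).
  static List<String> translate(String file, List<String> columns) {
    return columns.stream().map(c -> file + ":" + c).collect(Collectors.toList());
  }

  // Resolves the columns inside the lambda, i.e. once per element.
  static List<String> perElement(List<String> files) {
    return files.stream()
        .flatMap(f -> translate(f, resolveColumnsToIndex()).stream())
        .collect(Collectors.toList());
  }

  // Resolves the columns once up front; the lambda only captures the result.
  static List<String> hoisted(List<String> files) {
    final List<String> columns = resolveColumnsToIndex();
    return files.stream()
        .flatMap(f -> translate(f, columns).stream())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("f1.parquet", "f2.parquet", "f3.parquet");
    LOOKUPS.set(0);
    perElement(files);
    System.out.println("per-element lookups: " + LOOKUPS.get());
    LOOKUPS.set(0);
    hoisted(files);
    System.out.println("hoisted lookups: " + LOOKUPS.get());
  }
}
```

In a distributed engine the captured value also has to be serializable, since it is shipped with the closure; a list of column names satisfies that.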

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -922,4 +952,39 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Computes column range metadata
+   *
+   * @param recordList                        - list of records from which column range statistics will be computed
+   * @param field                             - column name for which statistics will be computed
+   * @param filePath                          - data file path
+   * @param columnRangeMap                    - old column range statistics, which will be merged in this computation
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void accumulateColumnRanges(List<IndexedRecord> recordList, Schema.Field field, String filePath,

Review comment:
       I see this is called from HoodieAppendHandle, once for every field/column.
   
   i.e.
   for every field -> accumulateColumnRanges { iterate through every record and compute column stats }
   
   Since Avro is a row-based format, why can't we collect stats for all fields/columns in one go per record, and keep iterating through the records until we have column stats for all fields?
   
   Essentially, we are doing a columnar read across the records for N columns. I am proposing that we flip this: read the entire record, update the stats for all of its fields, and move on to the next record.
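
A minimal sketch of the record-major loop proposed above, assuming records are modelled as `Map<String, Object>` rather than Avro `IndexedRecord`. The `Stats` accumulator, the `collect` and `row` helpers, and the choice to count only non-null values in `valueCount` are assumptions made for illustration, not part of the PR.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

@SuppressWarnings({"rawtypes", "unchecked"})
public class SinglePassStatsSketch {
  // Hypothetical per-column accumulator; not a Hudi class.
  static final class Stats {
    long valueCount;
    long nullCount;
    Comparable min;
    Comparable max;

    // Folds one value into the running stats, keeping its native type.
    void update(Object value) {
      if (value == null) {
        nullCount++;
        return;
      }
      valueCount++;
      Comparable v = (Comparable) value;
      if (min == null || v.compareTo(min) < 0) {
        min = v;
      }
      if (max == null || v.compareTo(max) > 0) {
        max = v;
      }
    }
  }

  // Visits every record exactly once and updates the stats of all fields for that record.
  static Map<String, Stats> collect(List<String> fields, List<Map<String, Object>> records) {
    Map<String, Stats> statsByField = new LinkedHashMap<>();
    for (String field : fields) {
      statsByField.put(field, new Stats());
    }
    for (Map<String, Object> record : records) {
      for (String field : fields) {
        statsByField.get(field).update(record.get(field));
      }
    }
    return statsByField;
  }

  // Builds a two-column example row; HashMap is used because it permits null values.
  static Map<String, Object> row(Object id, Object name) {
    Map<String, Object> r = new HashMap<>();
    r.put("id", id);
    r.put("name", name);
    return r;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> records = Arrays.asList(row(12, "b"), row(2, null), row(7, "a"));
    Map<String, Stats> stats = collect(Arrays.asList("id", "name"), records);
    stats.forEach((field, s) -> System.out.println(
        field + " min=" + s.min + " max=" + s.max + " values=" + s.valueCount + " nulls=" + s.nullCount));
  }
}
```

The outer loop runs over records and the inner loop over fields, so each record is touched once regardless of how many columns are indexed.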
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060382416


   ## CI report:
   
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060243355


   ## CI report:
   
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6621) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1060242046


   ## CI report:
   
   * 21dc93b754e84a414e239a6854fce1195267143f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6608) 
   * ff1f746fc4826a6432ec2078ae3e6c8536a038f1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1061329026


   @codope : I am good with the patch. Can you rebase onto the latest master? We can land it once CI is green.





[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r822126414



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +343,19 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {
+      Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap = stat.getRecordsStats().isPresent()
+          ? stat.getRecordsStats().get().getStats() : new HashMap<>();

Review comment:
       @codope that's what I was referring to in my comments about the increased complexity around `RecordStats`. Why not just have `stat.getRecordsStats().get()` instead?
   
   Now a reader of this code has to understand what this additional `getStats()` call is about and why it is needed, while without it the call site is crystal clear and doesn't require scanning through `getRecordsStats` to understand what's going on.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -339,6 +343,19 @@ private void processAppendResult(AppendResult result) {
       updateWriteStatus(stat, result);
     }
 
+    if (config.isMetadataIndexColumnStatsForAllColumnsEnabled()) {
+      Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap = stat.getRecordsStats().isPresent()
+          ? stat.getRecordsStats().get().getStats() : new HashMap<>();
+      final String filePath = stat.getPath();
+      // initialize map of column name to map of stats name to stats value
+      Map<String, Map<String, Object>> columnToStats = new HashMap<>();
+      writeSchemaWithMetaFields.getFields().forEach(field -> columnToStats.putIfAbsent(field.name(), new HashMap<>()));
+      // collect stats for columns at once per record and keep iterating through every record to eventually find col stats for all fields.
+      recordList.forEach(record -> aggregateColumnStats(record, writeSchemaWithMetaFields, columnToStats, config.isConsistentLogicalTimestampEnabled()));

Review comment:
       Can we, instead of placing iteration and aggregation into separate methods, consolidate them in `aggregateColumnStats` so that its signature actually is:
   
   ```
   Map<String, Map<...>> aggregateColumnStats(records, writeSchema, ...)
   ```

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/MetadataRecordsGenerationParams.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Encapsulates all parameters required to generate metadata index for enabled index types.
+ */
+public class MetadataRecordsGenerationParams implements Serializable {
+
+  private final HoodieTableMetaClient dataMetaClient;

Review comment:
       Let's limit the scope of this component to just _parameters_ for Index Generation. Otherwise it has the potential to become a dependency magnet, where random dependencies get added here to avoid threading them through.

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -941,4 +978,72 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Accumulates column range metadata for the given field and updates the column range map.
+   *
+   * @param field          - column for which statistics will be computed
+   * @param filePath       - data file path
+   * @param columnRangeMap - old column range statistics, which will be merged in this computation
+   * @param columnToStats  - map of column to map of each stat and its value
+   */
+  public static void accumulateColumnRanges(Schema.Field field, String filePath,
+                                            Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap,
+                                            Map<String, Map<String, Object>> columnToStats) {
+    Map<String, Object> columnStats = columnToStats.get(field.name());
+    HoodieColumnRangeMetadata<Comparable> columnRangeMetadata = new HoodieColumnRangeMetadata<>(
+        filePath,
+        field.name(),
+        String.valueOf(columnStats.get(MIN)),
+        String.valueOf(columnStats.get(MAX)),
+        Long.parseLong(columnStats.getOrDefault(NULL_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(VALUE_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString())
+    );
+    columnRangeMap.merge(field.name(), columnRangeMetadata, COLUMN_RANGE_MERGE_FUNCTION);
+  }
+
+  /**
+   * Aggregates column stats for each field.
+   *
+   * @param record                            - current record
+   * @param schema                            - write schema
+   * @param columnToStats                     - map of column to map of each stat and its value which gets updates in this method
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void aggregateColumnStats(IndexedRecord record, Schema schema,
+                                          Map<String, Map<String, Object>> columnToStats,
+                                          boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+
+    schema.getFields().forEach(field -> {
+      Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());
+      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      // update stats
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      columnStats.put(TOTAL_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()) + fieldSize);
+      columnStats.put(TOTAL_UNCOMPRESSED_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString()) + fieldSize);
+
+      if (!StringUtils.isNullOrEmpty(fieldVal)) {
+        // set the min value of the field
+        if (!columnStats.containsKey(MIN)) {
+          columnStats.put(MIN, fieldVal);
+        }
+        if (fieldVal.compareTo(String.valueOf(columnStats.get(MIN))) < 0) {

Review comment:
       We can't compare values as strings; this is incorrect (as strings, "12" < "2").
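
The point above can be checked with a small standalone sketch. The helper names `asStrings` and `asTyped` are hypothetical; the sketch only contrasts lexicographic ordering with the natural ordering of the original type.

```java
public class StringCompareSketch {
  // Compares the string renderings, character by character.
  static int asStrings(Object a, Object b) {
    return String.valueOf(a).compareTo(String.valueOf(b));
  }

  // Compares using the natural ordering of the values' own type.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static int asTyped(Comparable a, Comparable b) {
    return a.compareTo(b);
  }

  public static void main(String[] args) {
    // Lexicographically '1' sorts before '2', so "12" is considered smaller than "2".
    System.out.println("as strings, 12 < 2: " + (asStrings(12, 2) < 0));
    // Numerically 12 is greater than 2.
    System.out.println("as integers, 12 < 2: " + (asTyped(12, 2) < 0));
  }
}
```

A min/max computed over string renderings would therefore report a wrong range for numeric columns, which is why the stats need to be tracked as typed `Comparable` values.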

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -329,14 +332,16 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
   /**
    * Convert clean metadata to bloom filter index records.
    *
-   * @param cleanMetadata - Clean action metadata
-   * @param engineContext - Engine context
-   * @param instantTime   - Clean action instant time
+   * @param cleanMetadata           - Clean action metadata
+   * @param engineContext           - Engine context
+   * @param instantTime             - Clean action instant time
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
    * @return List of bloom filter index records for the clean metadata
    */
-  public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCleanMetadata cleanMetadata,
-                                                                       HoodieEngineContext engineContext,
-                                                                       String instantTime) {
+  public static HoodieData<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCleanMetadata cleanMetadata,

Review comment:
       nit: There's a general convention that "context" objects are usually passed as the first arg

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/MetadataRecordsGenerationParams.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Encapsulates all parameters required to generate metadata index for enabled index types.
+ */
+public class MetadataRecordsGenerationParams implements Serializable {
+
+  private final HoodieTableMetaClient dataMetaClient;

Review comment:
       BTW, I see it is `Serializable`; how are we serializing the `metaClient`?
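
To make the concern concrete, here is a hedged sketch of what happens when a `Serializable` holder keeps a reference to a type that is not itself serializable. `FakeMetaClient` is a hypothetical stand-in; this thread does not say whether the real `HoodieTableMetaClient` is serializable, so nothing here describes its actual behavior. The second holder carries only plain values (a base path and a parallelism setting, both illustrative).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializableParamsSketch {
  // Hypothetical heavyweight client that does NOT implement Serializable.
  static final class FakeMetaClient {
    final String basePath;

    FakeMetaClient(String basePath) {
      this.basePath = basePath;
    }
  }

  // Declares Serializable but holds a non-serializable field.
  static final class ParamsHoldingClient implements Serializable {
    private static final long serialVersionUID = 1L;
    final FakeMetaClient client;

    ParamsHoldingClient(FakeMetaClient client) {
      this.client = client;
    }
  }

  // Holds only plain parameter values, so default Java serialization works.
  static final class ParamsHoldingValues implements Serializable {
    private static final long serialVersionUID = 1L;
    final String basePath;
    final int parallelism;

    ParamsHoldingValues(String basePath, int parallelism) {
      this.basePath = basePath;
      this.parallelism = parallelism;
    }
  }

  // Returns true if default Java serialization accepts the object graph.
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("holding client: " + canSerialize(new ParamsHoldingClient(new FakeMetaClient("/tmp/table"))));
    System.out.println("holding values: " + canSerialize(new ParamsHoldingValues("/tmp/table", 10)));
  }
}
```

Implementing `Serializable` on the holder is not enough by itself: every non-transient field in the object graph has to be serializable too.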

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -329,14 +332,16 @@ public static void deleteMetadataTable(String basePath, HoodieEngineContext cont
   /**
    * Convert clean metadata to bloom filter index records.
    *
-   * @param cleanMetadata - Clean action metadata
-   * @param engineContext - Engine context
-   * @param instantTime   - Clean action instant time
+   * @param cleanMetadata           - Clean action metadata
+   * @param engineContext           - Engine context
+   * @param instantTime             - Clean action instant time
+   * @param recordsGenerationParams - Parameters for bloom filter record generation
    * @return List of bloom filter index records for the clean metadata
    */
-  public static List<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCleanMetadata cleanMetadata,
-                                                                       HoodieEngineContext engineContext,
-                                                                       String instantTime) {
+  public static HoodieData<HoodieRecord> convertMetadataToBloomFilterRecords(HoodieCleanMetadata cleanMetadata,

Review comment:
       Just FYI, no need to fix this 

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -941,4 +978,72 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Accumulates column range metadata for the given field and updates the column range map.
+   *
+   * @param field          - column for which statistics will be computed
+   * @param filePath       - data file path
+   * @param columnRangeMap - old column range statistics, which will be merged in this computation
+   * @param columnToStats  - map of column to map of each stat and its value
+   */
+  public static void accumulateColumnRanges(Schema.Field field, String filePath,

Review comment:
       Can we unify both of these methods into one?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -867,41 +889,56 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
     }
   }
 
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient) {
-    return getLatestColumns(datasetMetaClient, false);
+  public static HoodieMetadataColumnStats mergeColumnStats(HoodieMetadataColumnStats oldColumnStats, HoodieMetadataColumnStats newColumnStats) {
+    ValidationUtils.checkArgument(oldColumnStats.getFileName().equals(newColumnStats.getFileName()));
+    if (newColumnStats.getIsDeleted()) {
+      return newColumnStats;
+    }
+    return HoodieMetadataColumnStats.newBuilder()
+        .setFileName(newColumnStats.getFileName())
+        .setMinValue(Stream.of(oldColumnStats.getMinValue(), newColumnStats.getMinValue()).filter(Objects::nonNull).min(Comparator.naturalOrder()).orElse(null))
+        .setMaxValue(Stream.of(oldColumnStats.getMaxValue(), newColumnStats.getMaxValue()).filter(Objects::nonNull).max(Comparator.naturalOrder()).orElse(null))
+        .setValueCount(oldColumnStats.getValueCount() + newColumnStats.getValueCount())
+        .setNullCount(oldColumnStats.getNullCount() + newColumnStats.getNullCount())
+        .setTotalSize(oldColumnStats.getTotalSize() + newColumnStats.getTotalSize())
+        .setTotalUncompressedSize(oldColumnStats.getTotalUncompressedSize() + newColumnStats.getTotalUncompressedSize())
+        .setIsDeleted(newColumnStats.getIsDeleted())
+        .build();
   }
 
   public static Stream<HoodieRecord> translateWriteStatToColumnStats(HoodieWriteStat writeStat,
                                                                      HoodieTableMetaClient datasetMetaClient,
-                                                                     List<String> latestColumns) {
-    return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, latestColumns, false);
-
+                                                                     List<String> columnsToIndex) {
+    if (writeStat instanceof HoodieDeltaWriteStat && ((HoodieDeltaWriteStat) writeStat).getRecordsStats().isPresent()) {
+      Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap = ((HoodieDeltaWriteStat) writeStat).getRecordsStats().get().getStats();
+      List<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList = new ArrayList<>(columnRangeMap.values());
+      return HoodieMetadataPayload.createColumnStatsRecords(writeStat.getPartitionPath(), columnRangeMetadataList, false);
+    }
+    return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, columnsToIndex, false);
   }
 
   private static Stream<HoodieRecord> getColumnStats(final String partitionPath, final String filePathWithPartition,
                                                      HoodieTableMetaClient datasetMetaClient,
-                                                     List<String> columns, boolean isDeleted) {
-    final String partition = partitionPath.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionPath;
+                                                     List<String> columnsToIndex,
+                                                     boolean isDeleted) {
+    final String partition = getPartition(partitionPath);
     final int offset = partition.equals(NON_PARTITIONED_NAME) ? (filePathWithPartition.startsWith("/") ? 1 : 0)
         : partition.length() + 1;
     final String fileName = filePathWithPartition.substring(offset);
-    if (!FSUtils.isBaseFile(new Path(fileName))) {
-      return Stream.empty();
-    }
 
     if (filePathWithPartition.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
       List<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList = new ArrayList<>();
       final Path fullFilePath = new Path(datasetMetaClient.getBasePath(), filePathWithPartition);
       if (!isDeleted) {

Review comment:
       Handling of deleted files is independent of the file format, right?
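
A minimal sketch of the restructuring this question points at, assuming deleted files only need a per-column tombstone and never require opening the file. `ColumnStatsEntry` and `ColumnStatsSketch` are hypothetical stand-ins, not Hudi classes, and the Parquet branch is a placeholder rather than a real footer read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column stats index record; not a Hudi class.
final class ColumnStatsEntry {
  final String fileName;
  final String column;
  final boolean deleted;

  ColumnStatsEntry(String fileName, String column, boolean deleted) {
    this.fileName = fileName;
    this.column = column;
    this.deleted = deleted;
  }
}

final class ColumnStatsSketch {
  static List<ColumnStatsEntry> columnStatsFor(String fileName, List<String> columnsToIndex, boolean isDeleted) {
    List<ColumnStatsEntry> out = new ArrayList<>();
    // Deleted files are handled first, before any format-specific branch,
    // so every file format gets the same tombstone treatment.
    if (isDeleted) {
      for (String column : columnsToIndex) {
        out.add(new ColumnStatsEntry(fileName, column, true));
      }
      return out;
    }
    if (fileName.endsWith(".parquet")) {
      // Placeholder: a real implementation would read column ranges from the file here.
      for (String column : columnsToIndex) {
        out.add(new ColumnStatsEntry(fileName, column, false));
      }
      return out;
    }
    // Assumption: unsupported formats are rejected rather than silently skipped.
    throw new UnsupportedOperationException("Column stats not supported for: " + fileName);
  }
}
```

With this ordering, a deleted file in a format that cannot be read for statistics still yields tombstones instead of hitting the unsupported-format branch.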

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -941,4 +978,72 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Accumulates column range metadata for the given field and updates the column range map.
+   *
+   * @param field          - column for which statistics will be computed
+   * @param filePath       - data file path
+   * @param columnRangeMap - old column range statistics, which will be merged in this computation
+   * @param columnToStats  - map of column to map of each stat and its value
+   */
+  public static void accumulateColumnRanges(Schema.Field field, String filePath,
+                                            Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap,
+                                            Map<String, Map<String, Object>> columnToStats) {
+    Map<String, Object> columnStats = columnToStats.get(field.name());
+    HoodieColumnRangeMetadata<Comparable> columnRangeMetadata = new HoodieColumnRangeMetadata<>(
+        filePath,
+        field.name(),
+        String.valueOf(columnStats.get(MIN)),
+        String.valueOf(columnStats.get(MAX)),
+        Long.parseLong(columnStats.getOrDefault(NULL_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(VALUE_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString())
+    );
+    columnRangeMap.merge(field.name(), columnRangeMetadata, COLUMN_RANGE_MERGE_FUNCTION);
+  }
+
+  /**
+   * Aggregates column stats for each field.
+   *
+   * @param record                            - current record
+   * @param schema                            - write schema
+   * @param columnToStats                     - map of column to map of each stat and its value which gets updates in this method
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void aggregateColumnStats(IndexedRecord record, Schema schema,
+                                          Map<String, Map<String, Object>> columnToStats,
+                                          boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+
+    schema.getFields().forEach(field -> {
+      Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());
+      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      // update stats
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      columnStats.put(TOTAL_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()) + fieldSize);
+      columnStats.put(TOTAL_UNCOMPRESSED_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString()) + fieldSize);
+
+      if (!StringUtils.isNullOrEmpty(fieldVal)) {
+        // set the min value of the field
+        if (!columnStats.containsKey(MIN)) {
+          columnStats.put(MIN, fieldVal);
+        }
+        if (fieldVal.compareTo(String.valueOf(columnStats.get(MIN))) < 0) {
+          columnStats.put(MIN, fieldVal);
+        }
+        // set the max value of the field
+        if (fieldVal.compareTo(String.valueOf(columnStats.getOrDefault(MAX, ""))) > 0) {
+          columnStats.put(MAX, fieldVal);

Review comment:
       We don't need a Map for that, right? Let's instead create a mutable object with all the statistics that we're collecting:
   
   ```
   class FileColumnStats {
     Object min, max;
     long count, totalSize;
     // ...
   }
   ```
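
One way the outline above could be filled in, as a sketch only: the class name and the `min`, `max`, `count` and `totalSize` fields follow the reviewer's outline, while the generic type parameter, the `nullCount` field and the `accept` method are assumptions added for illustration.

```java
// Illustrative accumulator; the update logic is an assumption, not the PR's code.
final class FileColumnStats<T extends Comparable<T>> {
  T min;
  T max;
  long count;
  long nullCount;   // assumption: not part of the reviewer's outline
  long totalSize;

  // Folds one field value into the running statistics.
  void accept(T value, long sizeInBytes) {
    count++;
    totalSize += sizeInBytes;
    if (value == null) {
      nullCount++;
      return;
    }
    if (min == null || value.compareTo(min) < 0) {
      min = value;
    }
    if (max == null || value.compareTo(max) > 0) {
      max = value;
    }
  }
}
```

Typed fields remove the string keys, the boxing into `Object` and the `toString`/`parseLong` round trips that the `Map<String, Object>` version needs on every record.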

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -941,4 +978,72 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Accumulates column range metadata for the given field and updates the column range map.
+   *
+   * @param field          - column for which statistics will be computed
+   * @param filePath       - data file path
+   * @param columnRangeMap - old column range statistics, which will be merged in this computation
+   * @param columnToStats  - map of column to map of each stat and its value
+   */
+  public static void accumulateColumnRanges(Schema.Field field, String filePath,
+                                            Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap,
+                                            Map<String, Map<String, Object>> columnToStats) {
+    Map<String, Object> columnStats = columnToStats.get(field.name());
+    HoodieColumnRangeMetadata<Comparable> columnRangeMetadata = new HoodieColumnRangeMetadata<>(
+        filePath,
+        field.name(),
+        String.valueOf(columnStats.get(MIN)),
+        String.valueOf(columnStats.get(MAX)),
+        Long.parseLong(columnStats.getOrDefault(NULL_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(VALUE_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString())
+    );
+    columnRangeMap.merge(field.name(), columnRangeMetadata, COLUMN_RANGE_MERGE_FUNCTION);
+  }
+
+  /**
+   * Aggregates column stats for each field.
+   *
+   * @param record                            - current record
+   * @param schema                            - write schema
+   * @param columnToStats                     - map of column to map of each stat and its value which gets updates in this method
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void aggregateColumnStats(IndexedRecord record, Schema schema,
+                                          Map<String, Map<String, Object>> columnToStats,
+                                          boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+
+    schema.getFields().forEach(field -> {
+      Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());

Review comment:
       Please avoid such `HashMap` allocations, since `getOrDefault` builds a new map on every call and this just churns objects.
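
A small sketch of the difference, using only `java.util.Map`. The class and method names are hypothetical. The default argument to `getOrDefault` is evaluated on every call, whereas `computeIfAbsent` runs its factory only when the key is missing and stores the result.

```java
import java.util.HashMap;
import java.util.Map;

final class PerColumnMapLookup {
  // Allocates a fresh HashMap on every call, even when the key is present.
  // The fallback map is returned but never stored in the outer map.
  static Map<String, Object> lookupWithGetOrDefault(Map<String, Map<String, Object>> columnToStats, String column) {
    return columnToStats.getOrDefault(column, new HashMap<>());
  }

  // Allocates only when the key is missing and stores the new map,
  // so later calls return the same instance.
  static Map<String, Object> lookupWithComputeIfAbsent(Map<String, Map<String, Object>> columnToStats, String column) {
    return columnToStats.computeIfAbsent(column, k -> new HashMap<>());
  }
}
```

The second form also removes the need for a separate `put` back into the outer map after the per-column stats are updated.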

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieDeltaWriteStat.java
##########
@@ -69,4 +73,24 @@ public void addLogFiles(String logFile) {
   public List<String> getLogFiles() {
     return logFiles;
   }
+
+  public void setRecordsStats(RecordsStats<? extends Map> stats) {
+    recordsStats = Option.of(stats);
+  }
+
+  public Option<RecordsStats<? extends Map>> getRecordsStats() {
+    return recordsStats;
+  }
+
+  public static class RecordsStats<T> implements Serializable {

Review comment:
       @codope I'm concerned that this abstraction doesn't bring much value while increasing complexity: it adds cognitive load for anybody interacting with it, who first has to understand what it does.
   
   In general, I'd suggest following the principle of _keeping things as simple as possible, but no simpler than needed to solve the problem_. It helps on many fronts:
   
   1. Makes the code easier to comprehend
   2. Makes component evolution easier (the simpler things are, the easier it is to evolve them)
   3. Makes the component age better: if things change and we need to refactor it -- the simpler the system is, the easier the refactoring will be
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -651,6 +641,14 @@ private void initializeFileGroups(HoodieTableMetaClient dataMetaClient, Metadata
     }
   }
 
+  private MetadataRecordsGenerationParams getRecordsGenerationParams() {
+    return new MetadataRecordsGenerationParams(

Review comment:
       BTW, why do we even need this component if we can just get all of this from the Writer Config?

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -941,4 +978,72 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Accumulates column range metadata for the given field and updates the column range map.
+   *
+   * @param field          - column for which statistics will be computed
+   * @param filePath       - data file path
+   * @param columnRangeMap - old column range statistics, which will be merged in this computation
+   * @param columnToStats  - map of column to map of each stat and its value
+   */
+  public static void accumulateColumnRanges(Schema.Field field, String filePath,
+                                            Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap,
+                                            Map<String, Map<String, Object>> columnToStats) {
+    Map<String, Object> columnStats = columnToStats.get(field.name());
+    HoodieColumnRangeMetadata<Comparable> columnRangeMetadata = new HoodieColumnRangeMetadata<>(
+        filePath,
+        field.name(),
+        String.valueOf(columnStats.get(MIN)),
+        String.valueOf(columnStats.get(MAX)),
+        Long.parseLong(columnStats.getOrDefault(NULL_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(VALUE_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString())
+    );
+    columnRangeMap.merge(field.name(), columnRangeMetadata, COLUMN_RANGE_MERGE_FUNCTION);
+  }
+
+  /**
+   * Aggregates column stats for each field.
+   *
+   * @param record                            - current record
+   * @param schema                            - write schema
+   * @param columnToStats                     - map of column to map of each stat and its value which gets updates in this method
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void aggregateColumnStats(IndexedRecord record, Schema schema,
+                                          Map<String, Map<String, Object>> columnToStats,
+                                          boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+
+    schema.getFields().forEach(field -> {
+      Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());
+      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      // update stats
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      columnStats.put(TOTAL_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()) + fieldSize);

Review comment:
       Why do we need to `parseLong` every time?
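
If a map is kept at all, storing the counters as `Long` values avoids the `toString`/`parseLong` round trip. A minimal sketch, assuming the size counters can live in their own `Map<String, Long>`; the class and method names are hypothetical.

```java
import java.util.Map;

final class TypedCounters {
  // Adds delta to the counter under key, starting from delta when the key is absent.
  static void addSize(Map<String, Long> counters, String key, long delta) {
    counters.merge(key, delta, Long::sum);
  }
}
```

`Map.merge` inserts the value when the key is missing and otherwise combines it with the existing one, so no default value or parsing is needed.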

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -867,41 +889,56 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
     }
   }
 
-  private static List<String> getLatestColumns(HoodieTableMetaClient datasetMetaClient) {
-    return getLatestColumns(datasetMetaClient, false);
+  public static HoodieMetadataColumnStats mergeColumnStats(HoodieMetadataColumnStats oldColumnStats, HoodieMetadataColumnStats newColumnStats) {
+    ValidationUtils.checkArgument(oldColumnStats.getFileName().equals(newColumnStats.getFileName()));
+    if (newColumnStats.getIsDeleted()) {

Review comment:
       We need to handle the inverse case as well, when the existing record is a deleted one; otherwise we will merge incorrectly.
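
A sketch of a merge that covers both directions. `RangeStats` is a hypothetical, simplified stand-in for `HoodieMetadataColumnStats` with `long` ranges, and the choice to return the new record unchanged when the existing one is a tombstone is an assumption about the intended semantics.

```java
// Hypothetical, simplified stand-in for HoodieMetadataColumnStats.
final class RangeStats {
  final long min;
  final long max;
  final long nullCount;
  final boolean deleted;

  RangeStats(long min, long max, long nullCount, boolean deleted) {
    this.min = min;
    this.max = max;
    this.nullCount = nullCount;
    this.deleted = deleted;
  }
}

final class RangeStatsMerger {
  static RangeStats merge(RangeStats oldStats, RangeStats newStats) {
    // The newer record is a tombstone: the deletion wins.
    if (newStats.deleted) {
      return newStats;
    }
    // Inverse case: the existing record is a tombstone, so its ranges
    // must not leak into the result.
    if (oldStats.deleted) {
      return newStats;
    }
    // Both records are live: widen the range and add up the counts.
    return new RangeStats(
        Math.min(oldStats.min, newStats.min),
        Math.max(oldStats.max, newStats.max),
        oldStats.nullCount + newStats.nullCount,
        false);
  }
}
```

Without the second check, a live record merged onto a tombstone would inherit the tombstone's stale min, max and counts.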

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##########
@@ -941,4 +978,72 @@ public static int getPartitionFileGroupCount(final MetadataPartitionType partiti
     }
   }
 
+  /**
+   * Accumulates column range metadata for the given field and updates the column range map.
+   *
+   * @param field          - column for which statistics will be computed
+   * @param filePath       - data file path
+   * @param columnRangeMap - old column range statistics, which will be merged in this computation
+   * @param columnToStats  - map of column to map of each stat and its value
+   */
+  public static void accumulateColumnRanges(Schema.Field field, String filePath,
+                                            Map<String, HoodieColumnRangeMetadata<Comparable>> columnRangeMap,
+                                            Map<String, Map<String, Object>> columnToStats) {
+    Map<String, Object> columnStats = columnToStats.get(field.name());
+    HoodieColumnRangeMetadata<Comparable> columnRangeMetadata = new HoodieColumnRangeMetadata<>(
+        filePath,
+        field.name(),
+        String.valueOf(columnStats.get(MIN)),
+        String.valueOf(columnStats.get(MAX)),
+        Long.parseLong(columnStats.getOrDefault(NULL_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(VALUE_COUNT, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()),
+        Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString())
+    );
+    columnRangeMap.merge(field.name(), columnRangeMetadata, COLUMN_RANGE_MERGE_FUNCTION);
+  }
+
+  /**
+   * Aggregates column stats for each field.
+   *
+   * @param record                            - current record
+   * @param schema                            - write schema
+   * @param columnToStats                     - map of column to map of each stat and its value which gets updates in this method
+   * @param consistentLogicalTimestampEnabled - flag to deal with logical timestamp type when getting column value
+   */
+  public static void aggregateColumnStats(IndexedRecord record, Schema schema,
+                                          Map<String, Map<String, Object>> columnToStats,
+                                          boolean consistentLogicalTimestampEnabled) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+
+    schema.getFields().forEach(field -> {
+      Map<String, Object> columnStats = columnToStats.getOrDefault(field.name(), new HashMap<>());
+      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, consistentLogicalTimestampEnabled);
+      // update stats
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      columnStats.put(TOTAL_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_SIZE, 0).toString()) + fieldSize);
+      columnStats.put(TOTAL_UNCOMPRESSED_SIZE, Long.parseLong(columnStats.getOrDefault(TOTAL_UNCOMPRESSED_SIZE, 0).toString()) + fieldSize);
+
+      if (!StringUtils.isNullOrEmpty(fieldVal)) {
+        // set the min value of the field
+        if (!columnStats.containsKey(MIN)) {
+          columnStats.put(MIN, fieldVal);
+        }
+        if (fieldVal.compareTo(String.valueOf(columnStats.get(MIN))) < 0) {

Review comment:
       We can leverage Parquet's comparators for that
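
The reason a type-aware comparator matters can be shown without Parquet itself: comparing values as strings orders numbers lexicographically. The sketch below uses plain `java.util.Comparator` as a stand-in for whatever comparator the column type provides; the class and method names are hypothetical.

```java
import java.util.Comparator;

final class MinMaxComparison {
  // Returns the smaller of two values under the supplied comparator.
  static <T> T min(T a, T b, Comparator<? super T> cmp) {
    return cmp.compare(a, b) <= 0 ? a : b;
  }
}
```

Called with the strings `"9"` and `"10"` and natural string ordering, `min` returns `"10"`, because `'1'` sorts before `'9'`. Called with the longs `9L` and `10L` it returns `9L`. String-based min and max for numeric columns would therefore produce ranges that do not match the data.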




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan merged pull request #4848: [HUDI-3258] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
nsivabalan merged pull request #4848:
URL: https://github.com/apache/hudi/pull/4848


   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048481656


   ## CI report:
   
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813086510



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
##########
@@ -320,7 +325,48 @@ private void updateWriteStatus(HoodieDeltaWriteStat stat, AppendResult result) {
     statuses.add(this.writeStatus);
   }
 
-  private void processAppendResult(AppendResult result) {
+  /**
+   * Get column statistics for the records part of this append handle.
+   *
+   * @param filePath       - Log file that records are part of
+   * @param recordList     - List of records appended to the log for which column statistics is needed for
+   * @param columnRangeMap - Output map to accumulate the column statistics for the records
+   */
+  private void getRecordsStats(final String filePath, List<IndexedRecord> recordList,

Review comment:
       Renamed to `computeRecordStats`.







[GitHub] [hudi] codope commented on a change in pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
codope commented on a change in pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#discussion_r813086769



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -871,27 +879,39 @@ protected void bootstrapCommit(List<DirectoryInfo> partitionInfoList, String cre
         return HoodieMetadataPayload.createPartitionFilesRecord(
             partitionInfo.getRelativePath().isEmpty() ? NON_PARTITIONED_NAME : partitionInfo.getRelativePath(), Option.of(validFileNameToSizeMap), Option.empty());
       });
-      partitionRecords = partitionRecords.union(fileListRecords);
+      filesPartitionRecords = filesPartitionRecords.union(fileListRecords);

Review comment:
       Done.







[GitHub] [hudi] hudi-bot removed a comment on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1048604126


   ## CI report:
   
   * 19ba560542a8769475948561e2b607f85f70b548 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6222) 
   * 125d2cd385219cb9187e0ce6ac90b00cfea863fc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6229) 
   * 48399d1f4e5fc3acf04ded4e9ed6e1fbfb34aebd UNKNOWN
   





[GitHub] [hudi] hudi-bot commented on pull request #4848: [HUDI-3356][HUDI-3203] HoodieData for metadata index records, bloom and colstats init

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4848:
URL: https://github.com/apache/hudi/pull/4848#issuecomment-1047401380


   ## CI report:
   
   * 97f253e3d9ef2c8caf05810d42e5f54e7598d4de Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6187) 
   

