Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/24 12:28:20 UTC

[GitHub] [hudi] wzx140 opened a new pull request, #7050: [Minor] update rfc46 doc

wzx140 opened a new pull request, #7050:
URL: https://github.com/apache/hudi/pull/7050

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low, medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1290964423

   ## CI report:
   
   * be5e0e00a0c270f8bad12b51a6df412aedf45290 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528) 
   * e17d096bc34d48040aff8e2d5c6c372e84ea44b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571) 
   * 4dfd13bb18547c8f05621db58758845b0aaa9d0e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1290777334

   ## CI report:
   
   * be5e0e00a0c270f8bad12b51a6df412aedf45290 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528) 
   * e17d096bc34d48040aff8e2d5c6c372e84ea44b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1289070726

   ## CI report:
   
   * 788df1c2f85af37930fb568cdce494debaf85ab9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527) 
   * be5e0e00a0c270f8bad12b51a6df412aedf45290 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] alexeykudinkin merged pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
alexeykudinkin merged PR #7050:
URL: https://github.com/apache/hudi/pull/7050




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1310487823

   ## CI report:
   
   * 4dfd13bb18547c8f05621db58758845b0aaa9d0e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574) 
   * 66c71c9f7832fd6554e2cad9dc087ab64b187e3c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #7050:
URL: https://github.com/apache/hudi/pull/7050#discussion_r1003628096


##########
rfc/rfc-46/rfc-46.md:
##########
@@ -74,49 +74,115 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload) 
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction
 
 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).
 
-#### Combine API Engine
+#### Record Merge API
 
-Stateless component interface providing for API Combining Records will look like following:
+CombineAndGetUpdateValue and Precombine will converge into one API. The stateless component interface providing the Record Merge API will look like the following:
 
 ```java
-interface HoodieRecordCombiningEngine {
-  
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieRecordMerger {
 
    /**
-    * Spark-specific implementation 
+    * The kind of merging strategy this recordMerger belongs to. A UUID identifies the merging strategy.
     */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
+   String getMergingStrategy();
   
-  // ...
+   // This method converges combineAndGetUpdateValue and precombine from HoodiePayload. 
+   // It'd be an associative operation: f(a, f(b, c)) = f(f(a, b), c) (i.e. given 3 versions A, B, C of a single record, both orders of application have to yield the same result)
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+   
+   // The record type handled by the current merger
+   // SPARK, AVRO, FLINK
+   HoodieRecordType getRecordType();
+}
+
+/**
+ * Spark-specific implementation 
+ */
+class HoodieSparkRecordMerger implements HoodieRecordMerger {
+
+  @Override
+  public String getMergingStrategy() {
+    return UUID_MERGER_STRATEGY;
+  }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+     // HoodieSparkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.
+   }
+
+   @Override
+   HoodieRecordType getRecordType() {
+     return HoodieRecordType.SPARK;
+   }
+}
+   
+/**
+ * Flink-specific implementation 
+ */
+class HoodieFlinkRecordMerger implements HoodieRecordMerger {
+
+   @Override
+   public String getMergingStrategy() {
+      return UUID_MERGER_STRATEGY;

Review Comment:
   nit: Let's name this `LATEST_RECORD_MERGING_STRATEGY` to hint at what merging strategy really is
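The associativity requirement quoted in the hunk above (f(a, f(b, c)) = f(f(a, b), c)) can be illustrated with a standalone sketch; the `Rec` type and latest-wins semantics below are toy stand-ins for illustration, not Hudi's actual classes.

```java
public class LatestWinsMergeDemo {

    // Minimal stand-in for one version of a record: an ordering value plus a payload.
    static final class Rec {
        final long orderingValue;
        final String payload;

        Rec(long orderingValue, String payload) {
            this.orderingValue = orderingValue;
            this.payload = payload;
        }
    }

    // Latest-wins merge: the record with the greater ordering value survives.
    static Rec merge(Rec older, Rec newer) {
        return newer.orderingValue >= older.orderingValue ? newer : older;
    }

    public static void main(String[] args) {
        Rec a = new Rec(1, "v1");
        Rec b = new Rec(2, "v2");
        Rec c = new Rec(3, "v3");

        // Both orders of application must keep the same surviving version.
        Rec leftFold = merge(merge(a, b), c);   // f(f(a, b), c)
        Rec rightFold = merge(a, merge(b, c));  // f(a, f(b, c))

        System.out.println(leftFold.payload);                           // v3
        System.out.println(leftFold.payload.equals(rightFold.payload)); // true
    }
}
```

A merger whose semantics are not associative (e.g. one that depends on which file a record came from) would break this contract, which is why the API collapses precombine and combineAndGetUpdateValue into a single operation.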



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -74,49 +74,115 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload) 
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction
 
 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).
 
-#### Combine API Engine
+#### Record Merge API
 
-Stateless component interface providing for API Combining Records will look like following:
+CombineAndGetUpdateValue and Precombine will converge into one API. The stateless component interface providing the Record Merge API will look like the following:
 
 ```java
-interface HoodieRecordCombiningEngine {
-  
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieRecordMerger {
 
    /**
-    * Spark-specific implementation 
+    * The kind of merging strategy this recordMerger belongs to. A UUID identifies the merging strategy.
     */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
+   String getMergingStrategy();

Review Comment:
   nit: let's name this `getMergingStrategyId`
   
   Let's also update the docs elaborating what this id is: 
 - Every RecordMerger implementation is engine-specific (referred to as the "implementation") and implements a particular merging semantic (referred to as the "merging strategy")
 - Such tiering allows us to be flexible in terms of providing implementations of a merging strategy only for the engines you might be interested in
 - The merging strategy is a table property that is set once during init
 - Merging implementations can be configured for each write individually
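The tiering described in this comment (one strategy id per table, per-engine implementations, runtime selection by strategy and engine) can be sketched as below; the `Merger` type, `resolve` method, and strategy string are hypothetical illustrations, not actual Hudi config keys or classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MergerResolutionSketch {

    enum RecordType { AVRO, SPARK, FLINK }

    // Toy stand-in for a RecordMerger implementation: a strategy id plus the
    // engine-native record type it handles.
    static final class Merger {
        final String strategyId;
        final RecordType recordType;

        Merger(String strategyId, RecordType recordType) {
            this.strategyId = strategyId;
            this.recordType = recordType;
        }
    }

    // Pick, from the configured implementation list, the first merger that
    // implements the table's merging strategy for the engine currently in use.
    static Optional<Merger> resolve(List<Merger> configured, String tableStrategyId, RecordType engine) {
        return configured.stream()
                .filter(m -> m.strategyId.equals(tableStrategyId) && m.recordType == engine)
                .findFirst();
    }

    public static void main(String[] args) {
        String strategy = "latest-record-merging-strategy"; // table property, set once at init
        List<Merger> impls = Arrays.asList(
                new Merger(strategy, RecordType.SPARK),
                new Merger(strategy, RecordType.AVRO));

        // Running on Spark resolves the Spark-native implementation...
        System.out.println(resolve(impls, strategy, RecordType.SPARK).isPresent());  // true
        // ...while an engine with no matching implementation resolves nothing.
        System.out.println(resolve(impls, strategy, RecordType.FLINK).isPresent());  // false
    }
}
```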



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -128,21 +194,73 @@ Following major components will be refactored:
 
 1. `HoodieWriteHandle`s will be  
    1. Accepting `HoodieRecord` instead of raw Avro payload (avoiding Avro conversion)
-   2. Using Combining API engine to merge records (when necessary) 
+   2. Using Record Merge API to merge records (when necessary) 
    3. Passes `HoodieRecord` as is to `FileWriter`
 2. `HoodieFileWriter`s will be 
    1. Accepting `HoodieRecord`
    2. Will be engine-specific (so that they're able to handle internal record representation)
 3. `HoodieRealtimeRecordReader`s 
    1. API will be returning opaque `HoodieRecord` instead of raw Avro payload
 
+### Config for RecordMerger
+The RecordMerger is engine-aware. We provide a config called HoodieWriteConfig.MERGER_IMPLS, which accepts a list of RecordMerger class names, and HoodieWriteConfig.MERGER_STRATEGY, which is the UUID of the merging strategy. At runtime, Hudi picks the RecordMerger in MERGER_IMPLS whose strategy matches MERGER_STRATEGY and whose record type matches the engine.
+
+### Public API in HoodieRecord
+Because we implement different types of records, we need to implement functionality similar to AvroUtils in HoodieRecord for different data representations (Avro, InternalRow, RowData).
+Its public API will look like following:
+
+```java
+import java.util.Properties;
+
+class HoodieRecord {
+
+   /**
+    * Get column in record to support RDDCustomColumnsSortPartitioner
+    */
+   ComparableList getComparableColumnValues(Schema recordSchema, String[] columns,

Review Comment:
   We made this `Object[] getColumnValues`, didn't we?
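The `Object[] getColumnValues` shape the reviewer refers to can be sketched with a toy record, together with the kind of custom-column sort (as in RDDCustomColumnsSortPartitioner) that consumes it; the map-backed `ToyRecord` below is an illustrative stand-in, not HoodieRecord.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnValuesSortSketch {

    static final class ToyRecord {
        private final Map<String, Object> fields;

        ToyRecord(Map<String, Object> fields) {
            this.fields = fields;
        }

        // Extract the requested columns, in order, for sort-key construction.
        Object[] getColumnValues(String[] columns) {
            Object[] values = new Object[columns.length];
            for (int i = 0; i < columns.length; i++) {
                values[i] = fields.get(columns[i]);
            }
            return values;
        }
    }

    public static void main(String[] args) {
        List<ToyRecord> records = new ArrayList<>();
        records.add(new ToyRecord(mapOf("city", "b", "ts", 2L)));
        records.add(new ToyRecord(mapOf("city", "a", "ts", 9L)));
        records.add(new ToyRecord(mapOf("city", "a", "ts", 1L)));

        String[] sortColumns = {"city", "ts"};
        // Sort lexicographically on the rendered column values.
        records.sort(Comparator.comparing(
                (ToyRecord r) -> Arrays.deepToString(r.getColumnValues(sortColumns))));

        for (ToyRecord r : records) {
            System.out.println(Arrays.toString(r.getColumnValues(sortColumns)));
        }
    }

    private static Map<String, Object> mapOf(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> m = new HashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }
}
```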



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -128,21 +194,73 @@ Following major components will be refactored:
 
 1. `HoodieWriteHandle`s will be  
    1. Accepting `HoodieRecord` instead of raw Avro payload (avoiding Avro conversion)
-   2. Using Combining API engine to merge records (when necessary) 
+   2. Using Record Merge API to merge records (when necessary) 
    3. Passes `HoodieRecord` as is to `FileWriter`
 2. `HoodieFileWriter`s will be 
    1. Accepting `HoodieRecord`
    2. Will be engine-specific (so that they're able to handle internal record representation)
 3. `HoodieRealtimeRecordReader`s 
    1. API will be returning opaque `HoodieRecord` instead of raw Avro payload
 
+### Config for RecordMerger
+The RecordMerger is engine-aware. We provide a config called HoodieWriteConfig.MERGER_IMPLS, which accepts a list of RecordMerger class names, and HoodieWriteConfig.MERGER_STRATEGY, which is the UUID of the merging strategy. At runtime, Hudi picks the RecordMerger in MERGER_IMPLS whose strategy matches MERGER_STRATEGY and whose record type matches the engine.

Review Comment:
   Let's elaborate on what is relation b/w RM impls and Merging Strategy. Hinted above at the points we should call out.



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -74,49 +74,115 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload) 
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction
 
 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).
 
-#### Combine API Engine
+#### Record Merge API
 
-Stateless component interface providing for API Combining Records will look like following:
+CombineAndGetUpdateValue and Precombine will converge into one API. The stateless component interface providing the Record Merge API will look like the following:
 
 ```java
-interface HoodieRecordCombiningEngine {
-  
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieRecordMerger {
 
    /**
-    * Spark-specific implementation 
+    * The kind of merging strategy this recordMerger belongs to. A UUID identifies the merging strategy.
     */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
+   String getMergingStrategy();
   
-  // ...
+   // This method converges combineAndGetUpdateValue and precombine from HoodiePayload. 
+   // It'd be an associative operation: f(a, f(b, c)) = f(f(a, b), c) (i.e. given 3 versions A, B, C of a single record, both orders of application have to yield the same result)
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+   
+   // The record type handled by the current merger
+   // SPARK, AVRO, FLINK
+   HoodieRecordType getRecordType();
+}
+
+/**
+ * Spark-specific implementation 
+ */
+class HoodieSparkRecordMerger implements HoodieRecordMerger {
+
+  @Override
+  public String getMergingStrategy() {
+    return UUID_MERGER_STRATEGY;
+  }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+     // HoodieSparkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.
+   }
+
+   @Override
+   HoodieRecordType getRecordType() {
+     return HoodieRecordType.SPARK;
+   }
+}
+   
+/**
+ * Flink-specific implementation 
+ */
+class HoodieFlinkRecordMerger implements HoodieRecordMerger {
+
+   @Override
+   public String getMergingStrategy() {
+      return UUID_MERGER_STRATEGY;
+   }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+      // HoodieFlinkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.
+   }
+
+   @Override
+   HoodieRecordType getRecordType() {
+      return HoodieRecordType.FLINK;
+   }
 }
 ```
 Where user can provide their own subclass implementing such interface for the engines of interest.
 
-#### Migration from `HoodieRecordPayload` to `HoodieRecordCombiningEngine`
+#### Migration from `HoodieRecordPayload` to `HoodieRecordMerger`
 
 To warrant backward-compatibility (BWC) on the code-level with already created subclasses of `HoodieRecordPayload` currently
-already used in production by Hudi users, we will provide a BWC-bridge in the form of instance of `HoodieRecordCombiningEngine`, that will 
+already used in production by Hudi users, we will provide a BWC-bridge in the form of instance of `HoodieRecordMerger` called `HoodieAvroRecordMerger`, that will 
 be using user-defined subclass of `HoodieRecordPayload` to combine the records.
 
-Leveraging such bridge will make provide for seamless BWC migration to the 0.11 release, however will be removing the performance 
+Leveraging such a bridge will provide for seamless BWC migration to the 0.11 release; however, it will remove the performance 
 benefit of this refactoring, since it would unavoidably have to perform conversion to intermediate representation (Avro). To realize
 full-suite of benefits of this refactoring, users will have to migrate their merging logic out of `HoodieRecordPayload` subclass and into
-new `HoodieRecordCombiningEngine` implementation.
+new `HoodieRecordMerger` implementation.
+
+Precombine is used to merge log records or incoming records; CombineAndGetUpdateValue is used to merge a record from a log file with a record from the base file.
+These two merge paths are unified in HoodieAvroRecordMerger as the merge function. `HoodieAvroRecordMerger`'s API will look like the following:

Review Comment:
   "unified in HoodieRecordMerger implementation"
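The unification this comment refers to (precombine for log/incoming records, combineAndGetUpdateValue for log-vs-base-file records, both behind one merge entry point) can be sketched with toy types; `LegacyPayload` below is a hypothetical stand-in for a user's HoodieRecordPayload subclass, not Hudi's actual interface.

```java
public class MergerBridgeSketch {

    // Stand-in for a user-defined payload exposing the two legacy merge paths.
    static final class LegacyPayload {
        final long orderingValue;
        final String value;

        LegacyPayload(long orderingValue, String value) {
            this.orderingValue = orderingValue;
            this.value = value;
        }

        // Legacy path 1: combining log-file / incoming records.
        LegacyPayload preCombine(LegacyPayload other) {
            return orderingValue >= other.orderingValue ? this : other;
        }

        // Legacy path 2: combining a log record with a base-file record.
        LegacyPayload combineAndGetUpdateValue(LegacyPayload base) {
            return orderingValue >= base.orderingValue ? this : base;
        }
    }

    // Bridge-style unified merge(): one entry point covering both legacy paths.
    static LegacyPayload merge(LegacyPayload older, LegacyPayload newer, boolean olderIsBaseFile) {
        return olderIsBaseFile
                ? newer.combineAndGetUpdateValue(older)
                : newer.preCombine(older);
    }

    public static void main(String[] args) {
        LegacyPayload base = new LegacyPayload(1, "base");
        LegacyPayload log = new LegacyPayload(2, "log");

        System.out.println(merge(base, log, true).value);  // log (update wins over base)
        System.out.println(merge(base, log, false).value); // log (later ordering wins)
    }
}
```

Because both legacy paths delegate to the same payload semantics, callers no longer need to know which file a record came from; that is the behavior the bridge preserves while the single merge() API replaces the two methods.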



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -128,21 +194,73 @@ Following major components will be refactored:
 
 1. `HoodieWriteHandle`s will be  
    1. Accepting `HoodieRecord` instead of raw Avro payload (avoiding Avro conversion)
-   2. Using Combining API engine to merge records (when necessary) 
+   2. Using Record Merge API to merge records (when necessary) 
    3. Passes `HoodieRecord` as is to `FileWriter`
 2. `HoodieFileWriter`s will be 
    1. Accepting `HoodieRecord`
    2. Will be engine-specific (so that they're able to handle internal record representation)
 3. `HoodieRealtimeRecordReader`s 
    1. API will be returning opaque `HoodieRecord` instead of raw Avro payload
 
+### Config for RecordMerger
+The RecordMerger is engine-aware. We provide a config called HoodieWriteConfig.MERGER_IMPLS, which accepts a list of RecordMerger class names, and HoodieWriteConfig.MERGER_STRATEGY, which is the UUID of the merging strategy. At runtime, Hudi picks the RecordMerger in MERGER_IMPLS whose strategy matches MERGER_STRATEGY and whose record type matches the engine.
+
+### Public API in HoodieRecord
+Because we implement different types of records, we need to implement functionality similar to AvroUtils in HoodieRecord for different data representations (Avro, InternalRow, RowData).

Review Comment:
   "for different engine-specific payload representations (GenericRecord, InternalRow, RowData)"





[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #7050:
URL: https://github.com/apache/hudi/pull/7050#discussion_r1016074908


##########
rfc/rfc-46/rfc-46.md:
##########
@@ -74,49 +74,122 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload) 
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction
 
 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).
 
-#### Combine API Engine
+#### Record Merge API
 
-Stateless component interface providing for API Combining Records will look like following:
+CombineAndGetUpdateValue and Precombine will converge into one API. The stateless component interface providing the Record Merge API will look like the following:
 
 ```java
-interface HoodieRecordCombiningEngine {
-  
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieRecordMerger {
 
    /**
-    * Spark-specific implementation 
+    * The kind of merging strategy this recordMerger belongs to. A UUID identifies the merging strategy.
     */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
+   String getMergingStrategyId();
   
-  // ...
+   // This method converges combineAndGetUpdateValue and precombine from HoodiePayload. 
+   // It'd be an associative operation: f(a, f(b, c)) = f(f(a, b), c) (i.e. given 3 versions A, B, C of a single record, both orders of application have to yield the same result)
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+   
+   // The record type handled by the current merger
+   // SPARK, AVRO, FLINK
+   HoodieRecordType getRecordType();
+}
+
+/**
+ * Spark-specific implementation 
+ */
+class HoodieSparkRecordMerger implements HoodieRecordMerger {
+
+  @Override
+  public String getMergingStrategyId() {
+    return LATEST_RECORD_MERGING_STRATEGY;
+  }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+     // HoodieSparkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.

Review Comment:
   I'd suggest to replace this w/ just: "Implements particular merging semantic natively for the Spark row representation wrapped in HoodieSparkRecord"



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -128,21 +201,70 @@ Following major components will be refactored:
 
 1. `HoodieWriteHandle`s will be  
    1. Accepting `HoodieRecord` instead of raw Avro payload (avoiding Avro conversion)
-   2. Using Combining API engine to merge records (when necessary) 
+   2. Using Record Merge API to merge records (when necessary) 
    3. Passes `HoodieRecord` as is to `FileWriter`
 2. `HoodieFileWriter`s will be 
    1. Accepting `HoodieRecord`
    2. Will be engine-specific (so that they're able to handle internal record representation)
 3. `HoodieRealtimeRecordReader`s 
    1. API will be returning opaque `HoodieRecord` instead of raw Avro payload
 
+### Public Api in HoodieRecord
+Because we implement different types of records, we need to implement functionality similar to AvroUtils in HoodieRecord for different engine-specific payload representations (GenericRecord, InternalRow, RowData).
+Its public API will look like the following:
+
+```java
+import java.util.Properties;

Review Comment:
   No need for imports



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -74,49 +74,122 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload) 
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction
 
 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).
 
-#### Combine API Engine
+#### Record Merge API
 
-Stateless component interface providing for API Combining Records will look like following:
+`combineAndGetUpdateValue` and `precombine` will converge into one API. The stateless component interface providing the record-merging API will look like the following:
 
 ```java
-interface HoodieRecordCombiningEngine {
-  
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieRecordMerger {
 
    /**
-    * Spark-specific implementation 
+    * The merging strategy this RecordMerger implements, identified by a UUID.
     */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
+   String getMergingStrategyId();
   
-  // ...
+   // This method converges combineAndGetUpdateValue and precombine from HoodiePayload. 
+   // It'd be an associative operation: f(a, f(b, c)) = f(f(a, b), c) (i.e., given three versions A, B, C of a single record, both orders of application have to yield the same result)
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+   
+   // The record type handled by the current merger
+   // SPARK, AVRO, FLINK
+   HoodieRecordType getRecordType();
+}
+
+/**
+ * Spark-specific implementation 
+ */
+class HoodieSparkRecordMerger implements HoodieRecordMerger {
+
+  @Override
+  public String getMergingStrategyId() {
+    return LATEST_RECORD_MERGING_STRATEGY;
+  }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+     // HoodieSparkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.
+   }
+
+   @Override
+   HoodieRecordType getRecordType() {
+     return HoodieRecordType.SPARK;
+   }
+}
+   
+/**
+ * Flink-specific implementation 
+ */
+class HoodieFlinkRecordMerger implements HoodieRecordMerger {
+
+   @Override
+   public String getMergingStrategyId() {
+      return LATEST_RECORD_MERGING_STRATEGY;
+   }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+      // HoodieFlinkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.

Review Comment:
   Same as above



##########
rfc/rfc-46/rfc-46.md:
##########
@@ -74,49 +74,122 @@ Following (high-level) steps are proposed:
    2. Split into interface and engine-specific implementations (holding internal engine-specific representation of the payload) 
    3. Implementing new standardized record-level APIs (like `getPartitionKey` , `getRecordKey`, etc)
    4. Staying **internal** component, that will **NOT** contain any user-defined semantic (like merging)
-2. Extract Record Combining (Merge) API from `HoodieRecordPayload` into a standalone, stateless component (engine). Such component will be
+2. Extract Record Merge API from `HoodieRecordPayload` into a standalone, stateless component. Such component will be
    1. Abstracted as stateless object providing API to combine records (according to predefined semantics) for engines (Spark, Flink) of interest
    2. Plug-in point for user-defined combination semantics
 3. Gradually deprecate, phase-out and eventually remove `HoodieRecordPayload` abstraction
 
 Phasing out usage of `HoodieRecordPayload` will also bring the benefit of avoiding to use Java reflection in the hot-path, which
 is known to have poor performance (compared to non-reflection based instantiation).
 
-#### Combine API Engine
+#### Record Merge API
 
-Stateless component interface providing for API Combining Records will look like following:
+`combineAndGetUpdateValue` and `precombine` will converge into one API. The stateless component interface providing the record-merging API will look like the following:
 
 ```java
-interface HoodieRecordCombiningEngine {
-  
-  default HoodieRecord precombine(HoodieRecord older, HoodieRecord newer) {
-    if (spark) {
-      precombineSpark((SparkHoodieRecord) older, (SparkHoodieRecord) newer);
-    } else if (flink) {
-      // precombine for Flink
-    }
-  }
+interface HoodieRecordMerger {
 
    /**
-    * Spark-specific implementation 
+    * The merging strategy this RecordMerger implements, identified by a UUID.
     */
-  SparkHoodieRecord precombineSpark(SparkHoodieRecord older, SparkHoodieRecord newer);
+   String getMergingStrategyId();
   
-  // ...
+   // This method converges combineAndGetUpdateValue and precombine from HoodiePayload. 
+   // It'd be an associative operation: f(a, f(b, c)) = f(f(a, b), c) (i.e., given three versions A, B, C of a single record, both orders of application have to yield the same result)
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException;
+   
+   // The record type handled by the current merger
+   // SPARK, AVRO, FLINK
+   HoodieRecordType getRecordType();
+}
+
+/**
+ * Spark-specific implementation 
+ */
+class HoodieSparkRecordMerger implements HoodieRecordMerger {
+
+  @Override
+  public String getMergingStrategyId() {
+    return LATEST_RECORD_MERGING_STRATEGY;
+  }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+     // HoodieSparkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.
+   }
+
+   @Override
+   HoodieRecordType getRecordType() {
+     return HoodieRecordType.SPARK;
+   }
+}
+   
+/**
+ * Flink-specific implementation 
+ */
+class HoodieFlinkRecordMerger implements HoodieRecordMerger {
+
+   @Override
+   public String getMergingStrategyId() {
+      return LATEST_RECORD_MERGING_STRATEGY;
+   }
+  
+   @Override
+   Option<HoodieRecord> merge(HoodieRecord older, HoodieRecord newer, Schema schema, Properties props) throws IOException {
+      // HoodieFlinkRecord precombine and combineAndGetUpdateValue. It'd be associative operation.
+   }
+
+   @Override
+   HoodieRecordType getRecordType() {
+      return HoodieRecordType.FLINK;
+   }
 }
 ```
 Where user can provide their own subclass implementing such interface for the engines of interest.
 
-#### Migration from `HoodieRecordPayload` to `HoodieRecordCombiningEngine`
+### Merging Strategy
+The RecordMerger is engine-aware. A new config, `HoodieWriteConfig.MERGER_IMPLS`, accepts a list of RecordMerger class names, and `HoodieWriteConfig.MERGER_STRATEGY` holds the UUID of the desired merging strategy. At runtime, Hudi picks from `MERGER_IMPLS` the RecordMergers whose strategy id matches `MERGER_STRATEGY` and whose record type matches the engine.
+- Every RecordMerger implementation is engine-specific (referred to as an "implementation") and implements a particular merging semantic (referred to as a "merging strategy")
+- Such tiering allows us to be flexible and provide implementations of a merging strategy only for the engines of interest
+- The merging strategy is a table property that is set once during initialization
+- Merging implementations can be configured for each write individually
+
+#### Migration from `HoodieRecordPayload` to `HoodieRecordMerger`
 
 To warrant backward-compatibility (BWC) on the code-level with already created subclasses of `HoodieRecordPayload` currently
-already used in production by Hudi users, we will provide a BWC-bridge in the form of instance of `HoodieRecordCombiningEngine`, that will 
+already used in production by Hudi users, we will provide a BWC-bridge in the form of an instance of `HoodieRecordMerger` called `HoodieAvroRecordMerger`, which will 
 be using user-defined subclass of `HoodieRecordPayload` to combine the records.
 
-Leveraging such bridge will make provide for seamless BWC migration to the 0.11 release, however will be removing the performance 
+Leveraging such a bridge will provide for a seamless BWC migration to the 0.11 release; however, it will forfeit the performance 
 benefit of this refactoring, since it would unavoidably have to perform conversion to intermediate representation (Avro). To realize
 full-suite of benefits of this refactoring, users will have to migrate their merging logic out of `HoodieRecordPayload` subclass and into
-new `HoodieRecordCombiningEngine` implementation.
+new `HoodieRecordMerger` implementation.
+
+`precombine` is used to merge records coming from delta logs or to deduplicate incoming records; `combineAndGetUpdateValue` is used to merge a record from a log file with a record from the base file.

Review Comment:
   Let's rephrase this to make it more clear that we deprecate `preCombine` and `combineAndGetUpdateValue`:
   
   ```
   Previously, we used to have separate methods for merging:
   
     - `preCombine` was used to either deduplicate records in a batch or merge ones coming from delta-logs, while
     - `combineAndGetUpdateValue` was used to combine incoming record w/ the one persisted in storage
   
   Now the semantics of both methods are unified in a single `merge` API w/in the `RecordMerger`, which is required to be an _associative_ operation so that it can take on the semantics of both `preCombine` and `combineAndGetUpdateValue`
   ```
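
   To make the unification concrete, here is a toy sketch (hypothetical names, not Hudi code) in which one associative merge function serves both roles: first deduplicating an incoming batch (the former `preCombine`), then combining the survivor with the stored record (the former `combineAndGetUpdateValue`). Records are modeled as `long[] {orderingVal, value}` purely for brevity.

   ```java
   import java.util.List;
   import java.util.function.BinaryOperator;

   final class UnifiedMergeDemo {
       // One associative operation: keep the version with the higher ordering value.
       static final BinaryOperator<long[]> MERGE =
               (older, newer) -> newer[0] >= older[0] ? newer : older;

       // preCombine role: reduce an incoming batch down to one record per key.
       // reduce() is well-defined here precisely because MERGE is associative.
       static long[] precombine(List<long[]> batch) {
           return batch.stream().reduce(MERGE).orElseThrow(IllegalArgumentException::new);
       }

       // combineAndGetUpdateValue role: combine the survivor with the stored record.
       static long[] combineWithStored(long[] stored, long[] incoming) {
           return MERGE.apply(stored, incoming);
       }
   }
   ```

   Because both paths call the same associative `MERGE`, the outcome does not depend on whether records are first deduplicated in the batch or merged one-by-one against storage.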



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1310708904

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571",
       "triggerID" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574",
       "triggerID" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "66c71c9f7832fd6554e2cad9dc087ab64b187e3c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12928",
       "triggerID" : "66c71c9f7832fd6554e2cad9dc087ab64b187e3c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 66c71c9f7832fd6554e2cad9dc087ab64b187e3c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12928) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1289062369

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 788df1c2f85af37930fb568cdce494debaf85ab9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527) 
   * be5e0e00a0c270f8bad12b51a6df412aedf45290 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1289458545

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be5e0e00a0c270f8bad12b51a6df412aedf45290 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1288975488

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 788df1c2f85af37930fb568cdce494debaf85ab9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1288968103

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 788df1c2f85af37930fb568cdce494debaf85ab9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1290769738

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be5e0e00a0c270f8bad12b51a6df412aedf45290 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528) 
   * e17d096bc34d48040aff8e2d5c6c372e84ea44b5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1290970569

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571",
       "triggerID" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574",
       "triggerID" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e17d096bc34d48040aff8e2d5c6c372e84ea44b5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571) 
   * 4dfd13bb18547c8f05621db58758845b0aaa9d0e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1291442799

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571",
       "triggerID" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574",
       "triggerID" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4dfd13bb18547c8f05621db58758845b0aaa9d0e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7050: [MINOR] update rfc46 doc

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7050:
URL: https://github.com/apache/hudi/pull/7050#issuecomment-1310496364

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12527",
       "triggerID" : "788df1c2f85af37930fb568cdce494debaf85ab9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12528",
       "triggerID" : "be5e0e00a0c270f8bad12b51a6df412aedf45290",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12571",
       "triggerID" : "e17d096bc34d48040aff8e2d5c6c372e84ea44b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574",
       "triggerID" : "4dfd13bb18547c8f05621db58758845b0aaa9d0e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "66c71c9f7832fd6554e2cad9dc087ab64b187e3c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12928",
       "triggerID" : "66c71c9f7832fd6554e2cad9dc087ab64b187e3c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4dfd13bb18547c8f05621db58758845b0aaa9d0e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12574) 
   * 66c71c9f7832fd6554e2cad9dc087ab64b187e3c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12928) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

