You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/11 04:29:08 UTC

[GitHub] [hudi] voonhous opened a new pull request, #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

voonhous opened a new pull request, #6915:
URL: https://github.com/apache/hudi/pull/6915

   ### Change Logs
   
   Relax the validation in checking primary key and preCombineField to allow nested field.
   
   Ensure that primaryKey and preCombineField definition is consistent with that of Spark SQL.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1274098338

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r1037755466


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -120,6 +122,7 @@ public Set<ConfigOption<?>> optionalOptions() {
    */
   private void sanityCheck(Configuration conf, ResolvedSchema schema) {
     List<String> fields = schema.getColumnNames();
+    Schema inferredSchema = AvroSchemaConverter.convertToSchema(schema.toPhysicalRowDataType().notNull().getLogicalType());
 

Review Comment:
   There is no need to convert the `ResolvedSchema` as an avro schema for validation, the `ResolvedSchema#getColumnDataTypes` can fetch the data type of each field.
   
   Also we need to fix the `RowDataKeyGen#getRecordKey` for nested primary keys.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous closed pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by "voonhous (via GitHub)" <gi...@apache.org>.
voonhous closed pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql
URL: https://github.com/apache/hudi/pull/6915


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r1037771871


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -120,6 +122,7 @@ public Set<ConfigOption<?>> optionalOptions() {
    */
   private void sanityCheck(Configuration conf, ResolvedSchema schema) {
     List<String> fields = schema.getColumnNames();
+    Schema inferredSchema = AvroSchemaConverter.convertToSchema(schema.toPhysicalRowDataType().notNull().getLogicalType());
 

Review Comment:
   Sure, I will take a look. 
   
   The main reasons for doing this is:
   1. `AvroSchemaConverter.convertToSchema` was already imported and used somewhere else in the code so, just reuse
   2. Convert it to an AvroSchema so that the helper functions can be written in HoodieAvroUtils, where the validation for creation of tables using the Spark as entrypoint can be reused.
   
   Let me try to see if we can use the `ResolvedSchema` instead, will get back to you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991886945


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.java:
##########
@@ -165,6 +165,35 @@ void testRequiredOptionsForSource() {
 
     assertDoesNotThrow(() -> new HoodieTableFactory().createDynamicTableSource(sourceContext6));
     assertDoesNotThrow(() -> new HoodieTableFactory().createDynamicTableSink(sourceContext6));
+
+    // nested pk field is allowed
+    ResolvedSchema schema6 = SchemaBuilder.instance()
+        .field("f0",
+            DataTypes.ROW(DataTypes.FIELD("id", DataTypes.INT()), DataTypes.FIELD("date", DataTypes.VARCHAR(20))))

Review Comment:
   Hmmm, do you mean add IT tests? 
   
   Or remove the UT i included here and rewrite them as IT tests?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330332603

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * bc1999e36789fe7273456a3fac8b7ae067778b45 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307) 
   * f4336b732de18867f653e43c7dba7cf73dd79ed1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991881909


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.java:
##########
@@ -165,6 +165,35 @@ void testRequiredOptionsForSource() {
 
     assertDoesNotThrow(() -> new HoodieTableFactory().createDynamicTableSource(sourceContext6));
     assertDoesNotThrow(() -> new HoodieTableFactory().createDynamicTableSink(sourceContext6));
+
+    // nested pk field is allowed
+    ResolvedSchema schema6 = SchemaBuilder.instance()
+        .field("f0",
+            DataTypes.ROW(DataTypes.FIELD("id", DataTypes.INT()), DataTypes.FIELD("date", DataTypes.VARCHAR(20))))

Review Comment:
   Can we write a IT test in `ITTestHoodieDataSource`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330414591

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "300a0217d2e65a6ca250abc99f2f488d455cd84c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "300a0217d2e65a6ca250abc99f2f488d455cd84c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * bc1999e36789fe7273456a3fac8b7ae067778b45 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307) 
   * f4336b732de18867f653e43c7dba7cf73dd79ed1 UNKNOWN
   * 300a0217d2e65a6ca250abc99f2f488d455cd84c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330150865

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120) 
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * bc1999e36789fe7273456a3fac8b7ae067778b45 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330745582

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "300a0217d2e65a6ca250abc99f2f488d455cd84c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13313",
       "triggerID" : "300a0217d2e65a6ca250abc99f2f488d455cd84c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * f4336b732de18867f653e43c7dba7cf73dd79ed1 UNKNOWN
   * 300a0217d2e65a6ca250abc99f2f488d455cd84c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13313) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991886189


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -142,7 +143,7 @@ private void sanityCheck(Configuration conf, ResolvedSchema schema) {
 
     // validate pre_combine key
     String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
-    if (!fields.contains(preCombineField)) {
+    if (!fields.contains(getRootLevelFieldName(preCombineField))) {
       if (OptionsResolver.isDefaultHoodieRecordPayloadClazz(conf)) {

Review Comment:
   I agree. This change is to standardize the validations between Flink and Spark. As such, no checks on the nested field names were made.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1274095295

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330143346

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120) 
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330147096

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120) 
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * bc1999e36789fe7273456a3fac8b7ae067778b45 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r1034344167


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -142,7 +143,7 @@ private void sanityCheck(Configuration conf, ResolvedSchema schema) {
 
     // validate pre_combine key
     String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
-    if (!fields.contains(preCombineField)) {
+    if (!fields.contains(getRootLevelFieldName(preCombineField))) {
       if (OptionsResolver.isDefaultHoodieRecordPayloadClazz(conf)) {

Review Comment:
   @danny0405 I have added the feature to validate nested fields for Flink. 
   
   Can you please help to review this PR again? 
   
   Thank you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991885011


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -132,7 +133,7 @@ private void sanityCheck(Configuration conf, ResolvedSchema schema) {
       }
 
       Arrays.stream(recordKeys)
-          .filter(field -> !fields.contains(field))
+          .filter(field -> !fields.contains(getRootLevelFieldName(field)))
           .findAny()

Review Comment:
   Uhm, from what i understand, Spark-SQL supports nested primaryKey and precombineField.
   
   The changes I made here are to standardize the validations in Spark and Flink.
   https://issues.apache.org/jira/browse/HUDI-4051
   https://github.com/apache/hudi/pull/5517/files



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -132,7 +133,7 @@ private void sanityCheck(Configuration conf, ResolvedSchema schema) {
       }
 
       Arrays.stream(recordKeys)
-          .filter(field -> !fields.contains(field))
+          .filter(field -> !fields.contains(getRootLevelFieldName(field)))
           .findAny()

Review Comment:
   Uhm, from what i understand, Spark-SQL supports nested primaryKey and precombineField.
   
   The changes I made here is to standardize the validations in Spark and Flink.
   https://issues.apache.org/jira/browse/HUDI-4051
   https://github.com/apache/hudi/pull/5517/files



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991881455


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -142,7 +143,7 @@ private void sanityCheck(Configuration conf, ResolvedSchema schema) {
 
     // validate pre_combine key
     String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
-    if (!fields.contains(preCombineField)) {
+    if (!fields.contains(getRootLevelFieldName(preCombineField))) {
       if (OptionsResolver.isDefaultHoodieRecordPayloadClazz(conf)) {

Review Comment:
   We may also need to validate the nested field names.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by "voonhous (via GitHub)" <gi...@apache.org>.
voonhous commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1407950553

   Closing this PR as @hbgstc123 has already fixed this issue by disabling schema sanity checks prior to creating the source in this [PR](https://github.com/apache/hudi/pull/7608)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991880981


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -132,7 +133,7 @@ private void sanityCheck(Configuration conf, ResolvedSchema schema) {
       }
 
       Arrays.stream(recordKeys)
-          .filter(field -> !fields.contains(field))
+          .filter(field -> !fields.contains(getRootLevelFieldName(field)))
           .findAny()

Review Comment:
   Do we support nested primary key also in this patch ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r991886945


##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/TestHoodieTableFactory.java:
##########
@@ -165,6 +165,35 @@ void testRequiredOptionsForSource() {
 
     assertDoesNotThrow(() -> new HoodieTableFactory().createDynamicTableSource(sourceContext6));
     assertDoesNotThrow(() -> new HoodieTableFactory().createDynamicTableSink(sourceContext6));
+
+    // nested pk field is allowed
+    ResolvedSchema schema6 = SchemaBuilder.instance()
+        .field("f0",
+            DataTypes.ROW(DataTypes.FIELD("id", DataTypes.INT()), DataTypes.FIELD("date", DataTypes.VARCHAR(20))))

Review Comment:
   Hmmm, do you mean add IT tests? 
   
   Or remove the UT i included here and write them as IT tests?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1274233282

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330324492

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 063adc8eb9514f64747f98b3cd1f28ddd30f430f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120) 
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * bc1999e36789fe7273456a3fac8b7ae067778b45 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307) 
   * f4336b732de18867f653e43c7dba7cf73dd79ed1 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6915:
URL: https://github.com/apache/hudi/pull/6915#issuecomment-1330423203

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12120",
       "triggerID" : "063adc8eb9514f64747f98b3cd1f28ddd30f430f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7727723e5b09a86d950aaa20b1ac77187680e1f0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307",
       "triggerID" : "bc1999e36789fe7273456a3fac8b7ae067778b45",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f4336b732de18867f653e43c7dba7cf73dd79ed1",
       "triggerType" : "PUSH"
     }, {
       "hash" : "300a0217d2e65a6ca250abc99f2f488d455cd84c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13313",
       "triggerID" : "300a0217d2e65a6ca250abc99f2f488d455cd84c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7727723e5b09a86d950aaa20b1ac77187680e1f0 UNKNOWN
   * bc1999e36789fe7273456a3fac8b7ae067778b45 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13307) 
   * f4336b732de18867f653e43c7dba7cf73dd79ed1 UNKNOWN
   * 300a0217d2e65a6ca250abc99f2f488d455cd84c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=13313) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] voonhous commented on a diff in pull request #6915: [HUDI-5004] Allow nested field as primary key and preCombineField in flink sql

Posted by GitBox <gi...@apache.org>.
voonhous commented on code in PR #6915:
URL: https://github.com/apache/hudi/pull/6915#discussion_r1037771871


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##########
@@ -120,6 +122,7 @@ public Set<ConfigOption<?>> optionalOptions() {
    */
   private void sanityCheck(Configuration conf, ResolvedSchema schema) {
     List<String> fields = schema.getColumnNames();
+    Schema inferredSchema = AvroSchemaConverter.convertToSchema(schema.toPhysicalRowDataType().notNull().getLogicalType());
 

Review Comment:
   Sure, I will take a look. 
   
   The main reasons for doing this is:
   1. `AvroSchemaConverter.convertToSchema` was already imported and used somewhere else in the code so, just reuse
   2. Convert it to an AvroSchema so that the helper functions can be written in HoodieAvroUtils, where the validation for creation of tables using the Spark as entrypoint can be reused.
   
   Let me try to see if we can use the `ResolvedSchema` instead, will get back to you.
   
   >Also we need to fix the RowDataKeyGen#getRecordKey for nested primary keys.
   
   Got it!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org