Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/18 09:05:31 UTC

[GitHub] [hudi] lokeshj1703 opened a new pull request, #7695: Record key gen refactor

lokeshj1703 opened a new pull request, #7695:
URL: https://github.com/apache/hudi/pull/7695

   ### Change Logs
   
   [HUDI-5535](https://issues.apache.org/jira/browse/HUDI-5535) adds support for using any record key generation with any partition path generation. It also splits record key generation and partition path generation into separate interfaces.
   
   This Jira aims to add similar support for the row writer path in Spark.
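
   As a rough illustration of the separation described above, the sketch below composes a record key strategy with an independently chosen partition path strategy. All names here (`RecordKeyGenerator`, `PartitionPathGenerator`, `ComposedKeyGenerator`, `KeyGenSketch`) are hypothetical and are not Hudi's actual interfaces; records are modelled as plain maps to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: produces only the record key.
interface RecordKeyGenerator {
    String getRecordKey(Map<String, Object> record);
}

// Hypothetical interface: produces only the partition path.
interface PartitionPathGenerator {
    String getPartitionPath(Map<String, Object> record);
}

// Pairs any record key strategy with any partition path strategy.
final class ComposedKeyGenerator {
    private final RecordKeyGenerator keyGen;
    private final PartitionPathGenerator pathGen;

    ComposedKeyGenerator(RecordKeyGenerator keyGen, PartitionPathGenerator pathGen) {
        this.keyGen = keyGen;
        this.pathGen = pathGen;
    }

    String recordKey(Map<String, Object> record) {
        return keyGen.getRecordKey(record);
    }

    String partitionPath(Map<String, Object> record) {
        return pathGen.getPartitionPath(record);
    }
}

final class KeyGenSketch {
    // Single-field key ("id") combined with a single-field partition path ("date").
    static ComposedKeyGenerator simpleKeyWithDatePartition() {
        return new ComposedKeyGenerator(
            record -> String.valueOf(record.get("id")),
            record -> String.valueOf(record.get("date")));
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("id", 42);
        record.put("date", "2023-01-18");
        ComposedKeyGenerator gen = simpleKeyWithDatePartition();
        System.out.println(gen.partitionPath(record) + "/" + gen.recordKey(record));
    }
}
```

   Because the two strategies are passed in separately, swapping the key strategy does not require touching the partition path logic, which is the property the change log describes.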
   
   ### Impact
   
   Enables users to combine any record key generation strategy with any partition path generation strategy.
   
   ### Risk level (write none, low, medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1387121067

   ## CI report:
   
   * 9d044da669dca732e72268f576f42839ef4e4eab Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14392) 
   * 6a7c308cac45633c1025d8c951077d336f7102a3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14393) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan closed pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan closed pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer
URL: https://github.com/apache/hudi/pull/7695




[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1387108971

   ## CI report:
   
   * 84b21fb47463b92406b4e5e3256144b628d6f8c6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14385) 
   * 9d044da669dca732e72268f576f42839ef4e4eab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14392) 
   * 6a7c308cac45633c1025d8c951077d336f7102a3 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1386986642

   ## CI report:
   
   * 84b21fb47463b92406b4e5e3256144b628d6f8c6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14385) 
   * 9d044da669dca732e72268f576f42839ef4e4eab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14392) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #7695:
URL: https://github.com/apache/hudi/pull/7695#discussion_r1073789912


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java:
##########
@@ -333,11 +223,11 @@ static UTF8String toUTF8String(Object o) {
     }
   }
 
-  private static String toString(Object o) {
+  static String toString(Object o) {
     return o == null ? null : o.toString();
   }
 
-  private static String handleNullOrEmptyCompositeKeyPart(Object keyPart) {
+  static String handleNullOrEmptyCompositeKeyPart(Object keyPart) {

Review Comment:
   Shouldn't we move this to CompositeSparkRecordKeyGenerator,
   or add a separate Utils class where we move all such static methods and use it wherever required?
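
   A minimal sketch of what such a utility class might look like. The class name `KeyGenUtilsSketch` is hypothetical, and the placeholder strings and the null/empty handling are assumptions made for illustration rather than a copy of Hudi's implementation; only the body of `toString` is taken from the diff above.

```java
// Hypothetical holder for the static helpers; not an existing Hudi class.
final class KeyGenUtilsSketch {
    // Assumed placeholder values, chosen for this sketch.
    static final String NULL_PLACEHOLDER = "__null__";
    static final String EMPTY_PLACEHOLDER = "__empty__";

    private KeyGenUtilsSketch() {
        // Static helpers only; no instances.
    }

    // Same body as the helper shown in the diff: null stays null.
    static String toString(Object o) {
        return o == null ? null : o.toString();
    }

    // Assumed behaviour: substitute a placeholder for null or empty key parts.
    static String handleNullOrEmptyCompositeKeyPart(Object keyPart) {
        if (keyPart == null) {
            return NULL_PLACEHOLDER;
        }
        String value = keyPart.toString();
        return value.isEmpty() ? EMPTY_PLACEHOLDER : value;
    }
}
```

   With the helpers in one place, both the key generator base class and the new record key generator classes could call them without widening method visibility on `BuiltinKeyGenerator`.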



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java:
##########
@@ -423,7 +313,7 @@ GenericRecord convertToAvro(InternalRow row) {
     }
   }
 
-  protected class SparkRowAccessor {
+  public class SparkRowAccessor {

Review Comment:
   Let's move this to a separate class.



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java:
##########
@@ -127,6 +129,8 @@ protected void tryInitRowAccessor(StructType schema) {
       synchronized (this) {
         if (this.rowAccessor == null) {
           this.rowAccessor = new SparkRowAccessor(schema);
+          this.sparkRecordKeyGenerator = SparkRecordKeyGeneratorFactory.getSparkRecordKeyGenerator(config,
+              rowAccessor, recordKeyFields, this.getClass());

Review Comment:
   Does this correctly pass the impl class? Did you confirm?
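
   For reference, in Java `this.getClass()` returns the runtime class of the object, so a call written in an abstract base class yields the concrete subclass. The standalone sketch below uses hypothetical class names (`BaseKeyGen`, `SimpleKeyGenImpl`, `ComplexKeyGenImpl`), not Hudi's classes, and says nothing about whether the factory then handles every impl class correctly.

```java
// Hypothetical base class standing in for a key generator base.
abstract class BaseKeyGen {
    // Mirrors the pattern in the diff: the base class hands this.getClass() onward.
    Class<?> classPassedToFactory() {
        return this.getClass();
    }
}

// Hypothetical concrete implementations.
final class SimpleKeyGenImpl extends BaseKeyGen {
}

final class ComplexKeyGenImpl extends BaseKeyGen {
}

final class GetClassDemo {
    public static void main(String[] args) {
        // Prints the concrete subclass names, not "BaseKeyGen".
        System.out.println(new SimpleKeyGenImpl().classPassedToFactory().getSimpleName());
        System.out.println(new ComplexKeyGenImpl().classPassedToFactory().getSimpleName());
    }
}
```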



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java:
##########
@@ -127,6 +129,8 @@ protected void tryInitRowAccessor(StructType schema) {
       synchronized (this) {
         if (this.rowAccessor == null) {
           this.rowAccessor = new SparkRowAccessor(schema);
+          this.sparkRecordKeyGenerator = SparkRecordKeyGeneratorFactory.getSparkRecordKeyGenerator(config,
+              rowAccessor, recordKeyFields, this.getClass());

Review Comment:
   If not, we should move this to the impl classes.





[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1386759249

   ## CI report:
   
   * 84b21fb47463b92406b4e5e3256144b628d6f8c6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14385) 
   


[GitHub] [hudi] lokeshj1703 commented on pull request #7695: Record key gen refactor

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1386716389

   I have refactored the code and created new classes for SparkRecordKeyGeneration. `ComplexSparkRecordKeyGenerator` and `CompositeSparkRecordKeyGenerator` are quite similar in functionality, so it might be possible to simplify this code path and merge the two generators that exist today.
   cc @alexeykudinkin @nsivabalan 




[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1387505715

   ## CI report:
   
   * 6a7c308cac45633c1025d8c951077d336f7102a3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14393) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1386748888

   ## CI report:
   
   * 84b21fb47463b92406b4e5e3256144b628d6f8c6 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1386977241

   ## CI report:
   
   * 84b21fb47463b92406b4e5e3256144b628d6f8c6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14385) 
   * 9d044da669dca732e72268f576f42839ef4e4eab UNKNOWN
   


[GitHub] [hudi] nsivabalan commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1387728615

   I took a stab at this in
   https://github.com/apache/hudi/pull/7700
   I have yet to add AutoGenerateRecordKey, but I have fixed the abstractions in general, so any key gen class can generate any kind of record key for the Spark row methods as well.




[GitHub] [hudi] nsivabalan commented on pull request #7695: HUDI-5575. Support any record key generation along w/ any partition path generation for row writer

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on PR #7695:
URL: https://github.com/apache/hudi/pull/7695#issuecomment-1402400682

   Closing this in favor of https://github.com/apache/hudi/pull/7700
   

