Posted to commits@hudi.apache.org by "wuwenchi (via GitHub)" <gi...@apache.org> on 2023/03/27 09:55:56 UTC

[GitHub] [hudi] wuwenchi opened a new pull request, #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

wuwenchi opened a new pull request, #8300:
URL: https://github.com/apache/hudi/pull/8300

   ### Change Logs
   
   When multiple sort columns are specified, `getColumnValues` actually returns an array, but the code assigns the result to a plain `Object`:
   ``` Object recordValue = record.getColumnValues(...)```
   
   That value is later converted to a string with ```StringUtils.objToString(recordValue)```,
   which yields the array's address rather than the column values, so the records are sorted incorrectly.
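
The effect described above can be reproduced outside Hudi. The following is only a minimal sketch: `ArrayToStringDemo`, `identityKey`, and `contentKey` are hypothetical names, and `identityKey` merely stands in for the old key computation on the assumption that it ends up calling `toString()` on the array.

```java
import java.util.Arrays;

class ArrayToStringDemo {
  // Stand-in for the old sort key (assumption): the array is handled as a
  // plain Object and stringified with its default toString().
  static String identityKey(Object value) {
    return value == null ? "" : value.toString();
  }

  // A key that reflects the actual column values.
  static String contentKey(Object[] values) {
    return Arrays.toString(values);
  }

  public static void main(String[] args) {
    Object[] first = {"x", 1};
    Object[] second = {"x", 1};
    // Prints the type name plus an identity hash; the value varies per run
    // and says nothing about the contents of the array.
    System.out.println(identityKey(first));
    System.out.println(identityKey(second));
    // Prints the column values themselves.
    System.out.println(contentKey(first));
  }
}
```

Two arrays holding the same column values generally produce different identity strings, so sorting by that string orders records arbitrarily with respect to their columns.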
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   below
   
   ### Documentation Update
   
   none
   
   
   ### Contributor's checklist
   
   - [ x ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ x ] Change Logs and Impact were stated clearly
   - [ x ] Adequate tests were added if applicable
   - [ x ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] wuwenchi commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1162677890


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -61,13 +61,8 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);

Review Comment:
   done





[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1484905931

   ## CI report:
   
   * 470faf8d4ca82f3e0e915a4c7aa5bf62de11fed3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15942) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1163570948


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -61,13 +61,8 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);

Review Comment:
   We should fix `JavaCustomColumnsSortPartitioner` too.





[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503970614

   ## CI report:
   
   * b7ab237090a715521e580113486849489d1bf00c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16260) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1506902040

   ## CI report:
   
   * 01bb8ec6a3f1d0e359a9e1fccd328d1fe3461251 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16282) 
   * 059463c77c641929a07e9b9ebb9e369d746c157f UNKNOWN
   


[GitHub] [hudi] wuwenchi commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1498557891

   @danny0405 Can you help review it? Thanks!




[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1504806109

   ## CI report:
   
   * b7ab237090a715521e580113486849489d1bf00c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16260) 
   * 01bb8ec6a3f1d0e359a9e1fccd328d1fe3461251 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16282) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1505405791

   ## CI report:
   
   * 01bb8ec6a3f1d0e359a9e1fccd328d1fe3461251 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16282) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1507251814

   ## CI report:
   
   * 059463c77c641929a07e9b9ebb9e369d746c157f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16318) 
   


[GitHub] [hudi] wuwenchi commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1161198464


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -59,17 +62,17 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final String[] sortColumns = this.sortColumnNames;
     final SerializableSchema schema = this.serializableSchema;
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
-    return records.sortBy(
-        record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
-        },

Review Comment:
   
   If the user specifies multiple sort fields, the original code has two problems:
   1. 
    https://github.com/apache/hudi/blob/3cc6233b58773d45a8726f70a75c6d1edda7b313/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java#L761-L763
   The specified fields are extracted from the record and concatenated into a single string. (This is wrong because multi-field sorting should sort by the first field and then by the next one, rather than splicing the contents of multiple fields together and sorting the result.)
   
   2.
   
   https://github.com/apache/hudi/blob/3cc6233b58773d45a8726f70a75c6d1edda7b313/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java#L119-L121
   `getRecordColumnValues` returns an `Object` (actually a string), but `getColumnValues` forcibly casts it to `Object[]`. `repartitionRecords` then treats it as a plain `Object` again and calls `toString` on it directly, so the strings being compared are actually object addresses.
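
The first point can be illustrated with a small standalone comparison. This is a sketch under stated assumptions, not the Hudi code: `ConcatVsFieldwise` and both methods are hypothetical, and the fields are assumed to be joined without a separator.

```java
class ConcatVsFieldwise {
  // Splice all fields into one string, then compare the strings
  // (assumption: no separator between the fields).
  static int compareConcatenated(String[] x, String[] y) {
    return String.join("", x).compareTo(String.join("", y));
  }

  // Compare by the first field, and only fall through to the next field on a tie.
  static int compareFieldwise(String[] x, String[] y) {
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int c = x[i].compareTo(y[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(x.length, y.length);
  }

  public static void main(String[] args) {
    String[] r1 = {"a", "z"};
    String[] r2 = {"ab", "a"};
    // "az" vs "aba": positive, so r1 sorts after r2.
    System.out.println(compareConcatenated(r1, r2) > 0);
    // "a" vs "ab" on the first field: negative, so r1 sorts before r2.
    System.out.println(compareFieldwise(r1, r2) < 0);
    // {"ab", "c"} and {"a", "bc"} both concatenate to "abc" and become indistinguishable.
    System.out.println(compareConcatenated(new String[] {"ab", "c"}, new String[] {"a", "bc"}) == 0);
  }
}
```

All three lines print `true`: the concatenated key reverses the order of the first pair and treats the second pair as equal, while the field-by-field comparison keeps the first column as the primary sort key.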





[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1486150365

   ## CI report:
   
   * 470faf8d4ca82f3e0e915a4c7aa5bf62de11fed3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15942) 
   * 62b564e1d50ec5c5d3d25c8938fc050005589f8d UNKNOWN
   


[GitHub] [hudi] wuwenchi commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1502588400

   > Thanks for the contribution, I have reviewed and attached a patch which is based on the latest master: [5991.patch.zip](https://github.com/apache/hudi/files/11186917/5991.patch.zip)
   
   @danny0405 
   The patch concatenates the values of multiple fields and compares the result, so this may happen:
   ![image](https://user-images.githubusercontent.com/19755729/231037013-45858656-a4a4-4434-b0cf-22df61ffb7df.png)
   ![image](https://user-images.githubusercontent.com/19755729/231037039-aeab9426-c21d-4155-b603-949439c03702.png)
   
   So I think it is more reasonable to use a list: put each field value in the list, and then compare the lists item by item with each item's own comparison function.
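
A list-style key of this kind might look like the sketch below. It is not the `FlatLists` class used in the PR: `ColumnKey` is a hypothetical name, the nulls-first rule is an assumption carried over from the comment in the removed code, and the sample values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical composite sort key: holds one value per sort column and
// compares column by column with each value's own compareTo.
final class ColumnKey implements Comparable<ColumnKey> {
  private final Object[] values;

  ColumnKey(Object... values) {
    this.values = values.clone();
  }

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public int compareTo(ColumnKey other) {
    int n = Math.min(values.length, other.values.length);
    for (int i = 0; i < n; i++) {
      Object a = values[i];
      Object b = other.values[i];
      if (a == null || b == null) {
        if (a == b) {
          continue; // both null: tie on this column
        }
        return a == null ? -1 : 1; // assumption: nulls sort first
      }
      // Assumes both values in a column share one Comparable type.
      int c = ((Comparable) a).compareTo(b);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(values.length, other.values.length);
  }

  @Override
  public String toString() {
    return Arrays.toString(values);
  }

  public static void main(String[] args) {
    List<ColumnKey> keys = new ArrayList<>(Arrays.asList(
        new ColumnKey("a", 10), new ColumnKey("a", 9), new ColumnKey(null, 1)));
    Collections.sort(keys);
    System.out.println(keys); // prints [[null, 1], [a, 9], [a, 10]]
  }
}
```

With concatenated strings, `"a10"` would sort before `"a9"`; comparing the second column as integers keeps 9 ahead of 10. A key intended for wider use would also need `equals` and `hashCode` consistent with `compareTo`, which this sketch omits.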


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503220871

   ## CI report:
   
   * c5898a372ce32e120079fe297870137f31998620 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16251) 
   * f88b842a8dfb6c146ad9fd42ae1a74b2a38f92bb UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503231205

   ## CI report:
   
   * c5898a372ce32e120079fe297870137f31998620 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16251) 
   * f88b842a8dfb6c146ad9fd42ae1a74b2a38f92bb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16255) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 merged pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 merged PR #8300:
URL: https://github.com/apache/hudi/pull/8300




[GitHub] [hudi] wuwenchi commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1159425077


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -59,17 +62,17 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final String[] sortColumns = this.sortColumnNames;
     final SerializableSchema schema = this.serializableSchema;
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
-    return records.sortBy(
-        record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
-        },

Review Comment:
   Users may need to sort by more than one field, but line 69 here can return the contents of the first field at most.
   I changed this to a tuple so that we can use a custom sort function that extracts multiple field values from the record and then sorts on all of them.
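
For illustration, here is a minimal JDK-only sketch of the problem this PR addresses; it is not Hudi code. The class `MultiColumnSortKeySketch` and its helper methods are hypothetical, and the sketch assumes each record's sort columns arrive as an `Object[]` of non-null, mutually comparable values. Stringifying the array produces an identity-based string unrelated to its contents, while comparing column by column gives the intended multi-field order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hudi.
final class MultiColumnSortKeySketch {

  // Mimics the old sort key: toString() on an Object[] returns
  // "[Ljava.lang.Object;@<identity hash>", which ignores the array contents.
  static String brokenKey(Object[] columnValues) {
    return columnValues == null ? "" : columnValues.toString();
  }

  // Compares two rows column by column; the first differing column decides.
  // Assumes non-null values whose types are comparable with each other.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static int compareRows(Object[] left, Object[] right) {
    int n = Math.min(left.length, right.length);
    for (int i = 0; i < n; i++) {
      int c = ((Comparable) left[i]).compareTo(right[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(left.length, right.length);
  }

  // Returns a sorted copy of the rows, ordered by all columns.
  static List<Object[]> sortRows(List<Object[]> rows) {
    List<Object[]> sorted = new ArrayList<>(rows);
    sorted.sort(MultiColumnSortKeySketch::compareRows);
    return sorted;
  }

  public static void main(String[] args) {
    List<Object[]> rows = Arrays.asList(
        new Object[] {"b", 1}, new Object[] {"a", 2}, new Object[] {"a", 1});
    // The broken key depends on object identity, not on the column values.
    System.out.println(brokenKey(rows.get(0)));
    for (Object[] row : sortRows(rows)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Sorting on the string form would order records by whatever identity hash the JVM assigned to each array, which is why the result looked arbitrary rather than merely truncated to one column.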





[GitHub] [hudi] danny0405 commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1501290456

   Thanks for the contribution. I have reviewed it and attached a patch based on the latest master:
   [5991.patch.zip](https://github.com/apache/hudi/files/11186917/5991.patch.zip)
   




[GitHub] [hudi] wuwenchi commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1162817827


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -61,13 +61,8 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);

Review Comment:
   No exception will be thrown here, and the order is still null_first.
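
As a rough sketch of what a null_first, column-by-column ordering means, the following JDK-only example may help. It is an assumption-laden illustration, not the implementation of `FlatLists`: the class `NullFirstRowComparatorSketch` and its members are hypothetical, and non-null values in the same column are assumed to be mutually comparable.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; this is not how FlatLists is implemented.
final class NullFirstRowComparatorSketch {

  // Orders a single column value: null sorts before any non-null value,
  // and two nulls compare as equal, so no NullPointerException is thrown.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static final Comparator<Object> NULL_FIRST_COLUMN =
      Comparator.nullsFirst((a, b) -> ((Comparable) a).compareTo(b));

  // Compares two rows column by column; the first differing column decides.
  static int compareRows(List<?> left, List<?> right) {
    int n = Math.min(left.size(), right.size());
    for (int i = 0; i < n; i++) {
      int c = NULL_FIRST_COLUMN.compare(left.get(i), right.get(i));
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(left.size(), right.size());
  }

  public static void main(String[] args) {
    // A null in the second column sorts ahead of a non-null value.
    System.out.println(compareRows(Arrays.asList("a", null), Arrays.asList("a", 1)) < 0);
  }
}
```

`Comparator.nullsFirst` handles the null checks before delegating, so the wrapped comparator only ever sees non-null values.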





[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1159372025


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -59,17 +62,17 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final String[] sortColumns = this.sortColumnNames;
     final SerializableSchema schema = this.serializableSchema;
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
-    return records.sortBy(
-        record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
-        },

Review Comment:
   Would it also work if we just returned `FlatLists.ofComparableArray(recordValue)` in line 69? Why map the records into a tuple first?





[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503454134

   ## CI report:
   
   * f88b842a8dfb6c146ad9fd42ae1a74b2a38f92bb Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16255) 
   * b7ab237090a715521e580113486849489d1bf00c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16260) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1504796941

   ## CI report:
   
   * b7ab237090a715521e580113486849489d1bf00c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16260) 
   * 01bb8ec6a3f1d0e359a9e1fccd328d1fe3461251 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503143007

   ## CI report:
   
   * c5898a372ce32e120079fe297870137f31998620 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16251) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1486343375

   ## CI report:
   
   * 62b564e1d50ec5c5d3d25c8938fc050005589f8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15950) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1484896222

   ## CI report:
   
   * 470faf8d4ca82f3e0e915a4c7aa5bf62de11fed3 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1506963865

   ## CI report:
   
   * 01bb8ec6a3f1d0e359a9e1fccd328d1fe3461251 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16282) 
   * 059463c77c641929a07e9b9ebb9e369d746c157f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16318) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503368007

   ## CI report:
   
   * c5898a372ce32e120079fe297870137f31998620 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16251) 
   * f88b842a8dfb6c146ad9fd42ae1a74b2a38f92bb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16255) 
   * b7ab237090a715521e580113486849489d1bf00c UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1502902472

   ## CI report:
   
   * 62b564e1d50ec5c5d3d25c8938fc050005589f8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15950) 
   * c5898a372ce32e120079fe297870137f31998620 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1502913289

   ## CI report:
   
   * 62b564e1d50ec5c5d3d25c8938fc050005589f8d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15950) 
   * c5898a372ce32e120079fe297870137f31998620 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16251) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1162582638


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -61,13 +61,8 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);

Review Comment:
   Do NULLs still obey the null_first order? Could nulls throw an exception here?





[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1485118467

   ## CI report:
   
   * 470faf8d4ca82f3e0e915a4c7aa5bf62de11fed3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15942) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8300:
URL: https://github.com/apache/hudi/pull/8300#issuecomment-1486155173

   ## CI report:
   
   * 470faf8d4ca82f3e0e915a4c7aa5bf62de11fed3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15942) 
   * 62b564e1d50ec5c5d3d25c8938fc050005589f8d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15950) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1162750761


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -61,13 +61,8 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);

Review Comment:
   I'm not asking you to throw an exception for nulls here. The original code has NULL_FIRST semantics: a null is replaced with an empty string, which means it always sorts before any non-null value.





[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1159543670


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -59,17 +62,17 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final String[] sortColumns = this.sortColumnNames;
     final SerializableSchema schema = this.serializableSchema;
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
-    return records.sortBy(
-        record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
-        },

Review Comment:
   The whole record is passed into the comparator, so what do you mean by 'field'?





[GitHub] [hudi] wuwenchi commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "wuwenchi (via GitHub)" <gi...@apache.org>.
wuwenchi commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1165454235


##########
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaCustomColumnsSortPartitioner.java:
##########
@@ -51,9 +52,13 @@ public JavaCustomColumnsSortPartitioner(String[] columnNames, Schema schema, boo
   public List<HoodieRecord<T>> repartitionRecords(
       List<HoodieRecord<T>> records, int outputPartitions) {
     return records.stream().sorted((o1, o2) -> {
-      Object values1 = HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord)o1, sortColumnNames, schema, consistentLogicalTimestampEnabled);
-      Object values2 = HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord)o2, sortColumnNames, schema, consistentLogicalTimestampEnabled);
-      return values1.toString().compareTo(values2.toString());
+      FlatLists.ComparableList<Comparable> cmp1 = FlatLists.ofComparableArray(
+          HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord) o1, sortColumnNames, schema, consistentLogicalTimestampEnabled)
+      );

Review Comment:
   done



##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestBulkInsertInternalPartitioner.java:
##########
@@ -238,16 +239,19 @@ private Comparator<HoodieRecord<? extends HoodieRecordPayload>> getCustomColumnC
     Comparator<HoodieRecord<? extends HoodieRecordPayload>> comparator = Comparator.comparing(record -> {
       try {
         GenericRecord genericRecord = (GenericRecord) record.getData().getInsertValue(schema).get();
-        StringBuilder sb = new StringBuilder();
+        List<Object> keys = new ArrayList<>();
         for (String col : sortColumns) {
-          sb.append(genericRecord.get(col));
+          keys.add(genericRecord.get(col));
         }
-
-        return sb.toString();
+        return keys;
       } catch (IOException e) {
         throw new HoodieIOException("unable to read value for " + sortColumns);
       }
-    });
+    }, (o1, o2) -> {
+        FlatLists.ComparableList obj1 = FlatLists.ofComparableArray(o1.toArray());
+        FlatLists.ComparableList obj2 = FlatLists.ofComparableArray(o2.toArray());
+        return obj1.compareTo(obj2);

Review Comment:
   done
   





[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1163569337


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
##########
@@ -61,13 +61,8 @@ public JavaRDD<HoodieRecord<T>> repartitionRecords(JavaRDD<HoodieRecord<T>> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);

Review Comment:
   The default behavior is null_last; the original comment is wrong. It returned an empty string for nulls, and an empty string should always be smaller than non-empty strings.
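
To make the null-ordering point concrete, here is a minimal Java sketch. It is not Hudi code: the class and method names are hypothetical and it only uses `java.util.Comparator`. It contrasts the removed code path (a null key replaced with an empty string before comparison) with an explicit nulls-last comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullOrderingSketch {

  // Mirrors the removed logic: a null sort key is compared as the empty string.
  static List<String> sortWithEmptyStringForNull(List<String> keys) {
    List<String> sorted = new ArrayList<>(keys);
    sorted.sort(Comparator.comparing((String k) -> k == null ? "" : k));
    return sorted;
  }

  // Explicit nulls-last ordering over the natural String order.
  static List<String> sortNullsLast(List<String> keys) {
    List<String> sorted = new ArrayList<>(keys);
    sorted.sort(Comparator.nullsLast(Comparator.<String>naturalOrder()));
    return sorted;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("b", null, "a");
    System.out.println(sortWithEmptyStringForNull(keys)); // [null, a, b]
    System.out.println(sortNullsLast(keys));              // [a, b, null]
  }
}
```

Under these assumptions, substituting an empty string puts nulls ahead of every non-empty key in ascending order, whereas `Comparator.nullsLast` puts them at the end.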





[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8300:
URL: https://github.com/apache/hudi/pull/8300#discussion_r1165101064


##########
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaCustomColumnsSortPartitioner.java:
##########
@@ -51,9 +52,13 @@ public JavaCustomColumnsSortPartitioner(String[] columnNames, Schema schema, boo
   public List<HoodieRecord<T>> repartitionRecords(
       List<HoodieRecord<T>> records, int outputPartitions) {
     return records.stream().sorted((o1, o2) -> {
-      Object values1 = HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord)o1, sortColumnNames, schema, consistentLogicalTimestampEnabled);
-      Object values2 = HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord)o2, sortColumnNames, schema, consistentLogicalTimestampEnabled);
-      return values1.toString().compareTo(values2.toString());
+      FlatLists.ComparableList<Comparable> cmp1 = FlatLists.ofComparableArray(
+          HoodieAvroUtils.getRecordColumnValues((HoodieAvroRecord) o1, sortColumnNames, schema, consistentLogicalTimestampEnabled)
+      );

Review Comment:
   Rename cmp1 to values1 and cmp2 to values2.
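
For context on why the removed `values1.toString().compareTo(values2.toString())` lines are a problem when the value is an array, here is a small Java sketch. `compareColumnValues` is a hypothetical stand-in for an element-wise comparison, not Hudi's `FlatLists` implementation, and it assumes every column value is non-null and mutually comparable.

```java
public class ArrayKeySketch {

  // Hypothetical helper: lexicographic, element-by-element comparison of two sort keys.
  // Assumes non-null values whose types are comparable with each other.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static int compareColumnValues(Object[] left, Object[] right) {
    int common = Math.min(left.length, right.length);
    for (int i = 0; i < common; i++) {
      int c = ((Comparable) left[i]).compareTo(right[i]);
      if (c != 0) {
        return c;
      }
    }
    // All shared positions are equal, so the shorter key sorts first.
    return Integer.compare(left.length, right.length);
  }

  public static void main(String[] args) {
    Object[] a = {"x", 2};
    Object[] b = {"x", 10};
    // Object[] does not override toString(), so the result is a type tag plus an
    // identity hash code and says nothing about the array contents.
    System.out.println(a.toString().startsWith("[Ljava.lang.Object;@")); // true
    // Element-wise comparison orders ("x", 2) before ("x", 10).
    System.out.println(compareColumnValues(a, b) < 0); // true
  }
}
```

Sorting by the `toString()` value of such arrays therefore orders records by an arbitrary hash rather than by the column values.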



##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestBulkInsertInternalPartitioner.java:
##########
@@ -238,16 +239,19 @@ private Comparator<HoodieRecord<? extends HoodieRecordPayload>> getCustomColumnC
     Comparator<HoodieRecord<? extends HoodieRecordPayload>> comparator = Comparator.comparing(record -> {
       try {
         GenericRecord genericRecord = (GenericRecord) record.getData().getInsertValue(schema).get();
-        StringBuilder sb = new StringBuilder();
+        List<Object> keys = new ArrayList<>();
         for (String col : sortColumns) {
-          sb.append(genericRecord.get(col));
+          keys.add(genericRecord.get(col));
         }
-
-        return sb.toString();
+        return keys;
       } catch (IOException e) {
         throw new HoodieIOException("unable to read value for " + sortColumns);
       }
-    });
+    }, (o1, o2) -> {
+        FlatLists.ComparableList obj1 = FlatLists.ofComparableArray(o1.toArray());
+        FlatLists.ComparableList obj2 = FlatLists.ofComparableArray(o2.toArray());
+        return obj1.compareTo(obj2);

Review Comment:
   Rename obj1 to values1 and obj2 to values2.
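
The test change above replaces a concatenated `StringBuilder` key with a list of per-column values. A short Java sketch of why that matters, using made-up values and a hypothetical `concatKey` helper that mimics the removed test logic:

```java
import java.util.Arrays;
import java.util.List;

public class ConcatKeySketch {

  // Mimics the removed test logic: append every column value into one string.
  static String concatKey(List<Object> values) {
    StringBuilder sb = new StringBuilder();
    for (Object v : values) {
      sb.append(v);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Object> first = Arrays.asList(1, 23);
    List<Object> second = Arrays.asList(12, 3);
    // Different column values collapse into the same concatenated key "123".
    System.out.println(concatKey(first).equals(concatKey(second))); // true
    // Kept as separate elements, the two keys remain distinct.
    System.out.println(first.equals(second)); // false
    // String order also disagrees with numeric order: "10" sorts before "9".
    System.out.println("10".compareTo("9") < 0); // true
    System.out.println(Integer.compare(10, 9) > 0); // true
  }
}
```

Keeping the values as separate, typed elements avoids both the key collision and the string-versus-numeric ordering mismatch.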


