Posted to commits@hudi.apache.org by "renshangtao (via GitHub)" <gi...@apache.org> on 2023/03/16 08:37:26 UTC

[GitHub] [hudi] renshangtao opened a new pull request, #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

renshangtao opened a new pull request, #8200:
URL: https://github.com/apache/hudi/pull/8200

   ### Change Logs
   
   Fix the mismatch between the fallback value used for hoodie.datasource.write.row.writer.enable in the clustering code and the default stated in the documentation.
   
   ### Impact
   
   If users are not aware of this configuration and do not set it explicitly when sorting within a partition, they will not get the desired result.
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   none
   
   ### Documentation Update
   
   no
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1471800338

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747",
       "triggerID" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e0317dc8630e3b6a059d73d9110aea178d96409c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] bvaradar commented on a diff in pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #8200:
URL: https://github.com/apache/hudi/pull/8200#discussion_r1169440451


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -108,7 +108,7 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> performClustering(final Hood
     Stream<HoodieData<WriteStatus>> writeStatusesStream = FutureUtils.allOf(
             clusteringPlan.getInputGroups().stream()
                 .map(inputGroup -> {
-                  if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", false)) {
+                  if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", true)) {

Review Comment:
   cc @nsivabalan.
   
   Good catch. It looks like we cannot use the ConfigProperty directly due to a circular dependency. Can you comb the codebase to see if there are similar cases?
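
One way to look for similar cases is to scan source text for calls that pass both a literal key and a literal fallback. The following is only a minimal sketch under stated assumptions: the class `HardCodedDefaultScan` and its method are hypothetical helpers, not part of Hudi, and the regular expression only catches the exact call shape shown in the diff above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi.
final class HardCodedDefaultScan {
    // Matches calls such as getBooleanOrDefault("some.key", false),
    // where both the key and the fallback are literals.
    static final Pattern CALL = Pattern.compile(
        "getBooleanOrDefault\\(\\s*\"([^\"]+)\"\\s*,\\s*(true|false)\\s*\\)");

    // Returns "key=fallback" for every literal call found in the given source text.
    static List<String> findLiteralDefaults(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = CALL.matcher(source);
        while (m.find()) {
            hits.add(m.group(1) + "=" + m.group(2));
        }
        return hits;
    }

    public static void main(String[] args) {
        // The removed line from the diff above, used as sample input.
        String line = "if (getWriteConfig().getBooleanOrDefault("
            + "\"hoodie.datasource.write.row.writer.enable\", false)) {";
        // prints [hoodie.datasource.write.row.writer.enable=false]
        System.out.println(findLiteralDefaults(line));
    }
}
```

Each hit would still need to be compared by hand against the default declared for that key, since the sketch does not know the declared defaults.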





[GitHub] [hudi] xushiyan commented on pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1555020095

   As row-writing for clustering was introduced recently and we want to keep it defaulting to false for longer to fully stabilize it, I will close this for now.




[GitHub] [hudi] renshangtao commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "renshangtao (via GitHub)" <gi...@apache.org>.
renshangtao commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1471849601

   > Does it cause incorrect results?
   
   Yes. When I tested clustering, I found that after sorting, each file under the same partition is ordered internally, but there is still no order across the files. While tracing through the code, the only thing I found was that this configuration is inconsistent with the documented setting.
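
The symptom described above (each file sorted, but no order across files) can be illustrated with a small sketch. It is not Hudi code and makes no claim about what either partitioner actually does: `SortScopeSketch` is a hypothetical class, and plain lists of integers stand in for files of records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; lists of integers stand in for files of records.
final class SortScopeSketch {
    // Sorts every file on its own, leaving records in the file they started in.
    static List<List<Integer>> sortWithinFiles(List<List<Integer>> files) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> file : files) {
            List<Integer> copy = new ArrayList<>(file);
            Collections.sort(copy);
            out.add(copy);
        }
        return out;
    }

    // Sorts all records together, then cuts them back into files of the original sizes.
    static List<List<Integer>> sortGlobally(List<List<Integer>> files) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> file : files) {
            all.addAll(file);
        }
        Collections.sort(all);
        List<List<Integer>> out = new ArrayList<>();
        int start = 0;
        for (List<Integer> file : files) {
            int end = start + file.size();
            out.add(new ArrayList<>(all.subList(start, end)));
            start = end;
        }
        return out;
    }

    // True only if reading the files one after another yields a non-decreasing sequence.
    static boolean isGloballyOrdered(List<List<Integer>> files) {
        Integer prev = null;
        for (List<Integer> file : files) {
            for (Integer value : file) {
                if (prev != null && prev > value) {
                    return false;
                }
                prev = value;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> files = Arrays.asList(
            Arrays.asList(5, 1, 9), Arrays.asList(4, 8, 2));
        System.out.println(sortWithinFiles(files));                    // [[1, 5, 9], [2, 4, 8]]
        System.out.println(isGloballyOrdered(sortWithinFiles(files))); // false
        System.out.println(sortGlobally(files));                       // [[1, 2, 4], [5, 8, 9]]
    }
}
```

A check like `isGloballyOrdered` over the min and max sort-key values of the clustered files is one way to confirm whether the reported behavior reproduces.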




[GitHub] [hudi] KnightChess commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "KnightChess (via GitHub)" <gi...@apache.org>.
KnightChess commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1471908321

   Oh, I got it, the default value in config is true. But I don't think it will lead to different sorting results.




[GitHub] [hudi] hudi-bot commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1471535285

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e0317dc8630e3b6a059d73d9110aea178d96409c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on a diff in pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #8200:
URL: https://github.com/apache/hudi/pull/8200#discussion_r1187662523


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -108,7 +108,7 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> performClustering(final Hood
     Stream<HoodieData<WriteStatus>> writeStatusesStream = FutureUtils.allOf(
             clusteringPlan.getInputGroups().stream()
                 .map(inputGroup -> {
-                  if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", false)) {
+                  if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", true)) {

Review Comment:
   I guess this was intentional. Unless the user explicitly enables this config, we don't want to enable it for clustering.
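
The opt-in behavior described above can be sketched as follows. This is a hedged illustration, not Hudi code: `ConfigDefaultSketch` and its `getBooleanOrDefault` helper are hypothetical stand-ins built on `java.util.Properties`. Only the key name and the literal fallbacks come from the diff, and the declared default of true is as reported in this thread.

```java
import java.util.Properties;

// Hypothetical stand-in for a write config; not Hudi's HoodieWriteConfig.
final class ConfigDefaultSketch {
    // Key taken from the diff above.
    static final String ROW_WRITER_KEY = "hoodie.datasource.write.row.writer.enable";

    // Default declared for the key elsewhere, as reported in this thread.
    static final boolean DECLARED_DEFAULT = true;

    // The fallback is consulted only when the key has not been set at all.
    static boolean getBooleanOrDefault(Properties props, String key, boolean fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        // Clustering call site with the literal fallback: row writing stays off.
        System.out.println(getBooleanOrDefault(unset, ROW_WRITER_KEY, false));            // false
        // The same lookup with the declared default would turn it on.
        System.out.println(getBooleanOrDefault(unset, ROW_WRITER_KEY, DECLARED_DEFAULT)); // true

        Properties optIn = new Properties();
        optIn.setProperty(ROW_WRITER_KEY, "true");
        // An explicit setting wins regardless of the fallback at the call site.
        System.out.println(getBooleanOrDefault(optIn, ROW_WRITER_KEY, false));            // true
    }
}
```

With a literal `false` at the call site, only users who set the key explicitly get row writing for clustering, which matches the intent stated above.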





[GitHub] [hudi] hudi-bot commented on pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1529618638

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747",
       "triggerID" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6758894423e01caeda58b6fcb632919d92962133",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6758894423e01caeda58b6fcb632919d92962133",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e0317dc8630e3b6a059d73d9110aea178d96409c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747) 
   * 6758894423e01caeda58b6fcb632919d92962133 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan closed pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan closed pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.
URL: https://github.com/apache/hudi/pull/8200




[GitHub] [hudi] xushiyan commented on pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1554024122

   > > Oh, I got it, the default value in config is true. But I don't think it will lead to different sorting results.
   > 
   > You can test it: if the value is false, it creates an RDDCustomColumnsSortPartitioner, whose class description is "A partitioner that does sorting based on specified column values for each RDD partition."
   
   Both RDDCustomColumnsSortPartitioner and RowCustomColumnsSortPartitioner should sort globally. If you observe a sorting issue, then it's a different bug to be fixed. Flipping this default value here is irrelevant to the sorting issue.




[GitHub] [hudi] bvaradar commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1513984319

   @renshangtao: Can you create a Jira and add it to the PR description so that PR validation can succeed?




[GitHub] [hudi] hudi-bot commented on pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1529786646

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747",
       "triggerID" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6758894423e01caeda58b6fcb632919d92962133",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16776",
       "triggerID" : "6758894423e01caeda58b6fcb632919d92962133",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6758894423e01caeda58b6fcb632919d92962133 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16776) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1529674918

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747",
       "triggerID" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6758894423e01caeda58b6fcb632919d92962133",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16776",
       "triggerID" : "6758894423e01caeda58b6fcb632919d92962133",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e0317dc8630e3b6a059d73d9110aea178d96409c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747) 
   * 6758894423e01caeda58b6fcb632919d92962133 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16776) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1471544933

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747",
       "triggerID" : "e0317dc8630e3b6a059d73d9110aea178d96409c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e0317dc8630e3b6a059d73d9110aea178d96409c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15747) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on a diff in pull request #8200: [MINOR] hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #8200:
URL: https://github.com/apache/hudi/pull/8200#discussion_r1198551786


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java:
##########
@@ -108,7 +108,7 @@ public HoodieWriteMetadata<HoodieData<WriteStatus>> performClustering(final Hood
     Stream<HoodieData<WriteStatus>> writeStatusesStream = FutureUtils.allOf(
             clusteringPlan.getInputGroups().stream()
                 .map(inputGroup -> {
-                  if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", false)) {
+                  if (getWriteConfig().getBooleanOrDefault("hoodie.datasource.write.row.writer.enable", true)) {

Review Comment:
   Let's also consider issues like https://github.com/apache/hudi/issues/8259 before we make it the default.





[GitHub] [hudi] renshangtao commented on pull request #8200: The hoodie.datasource.write.row.writer.enable should set to be true.

Posted by "renshangtao (via GitHub)" <gi...@apache.org>.
renshangtao commented on PR #8200:
URL: https://github.com/apache/hudi/pull/8200#issuecomment-1474584244

   > Oh, I got it, the default value in config is true. But I don't think it will lead to different sorting results.
   
   You can test it: if the value is false, it creates an RDDCustomColumnsSortPartitioner, whose class description is "A partitioner that does sorting based on specified column values for each RDD partition."
   

