Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/23 03:14:41 UTC

[GitHub] [hudi] xicm opened a new pull request, #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

xicm opened a new pull request, #6759:
URL: https://github.com/apache/hudi/pull/6759

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1255777888

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256247406

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1256194045",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256968558

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "1256194045",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "74953bcb6c064234c22808fed8d155e75fa226d5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11691",
       "triggerID" : "74953bcb6c064234c22808fed8d155e75fa226d5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 74953bcb6c064234c22808fed8d155e75fa226d5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11691) 
   




[GitHub] [hudi] nsivabalan merged pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #6759:
URL: https://github.com/apache/hudi/pull/6759




[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1255780370

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256040387

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256253970

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "1256194045",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256926641

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "1256194045",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "74953bcb6c064234c22808fed8d155e75fa226d5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "74953bcb6c064234c22808fed8d155e75fa226d5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606) 
   * 74953bcb6c064234c22808fed8d155e75fa226d5 UNKNOWN
   




[GitHub] [hudi] yihua commented on a diff in pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6759:
URL: https://github.com/apache/hudi/pull/6759#discussion_r979164209


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLayoutConfig.java:
##########
@@ -80,8 +83,15 @@ public HoodieLayoutConfig build() {
     }
 
     private void setDefault() {
-      if (layoutConfig.contains(HoodieIndexConfig.INDEX_TYPE.key()) && layoutConfig.getString(HoodieIndexConfig.INDEX_TYPE.key()).equals(HoodieIndex.IndexType.BUCKET.name())) {
+      if (layoutConfig.contains(HoodieIndexConfig.INDEX_TYPE.key())
+          && layoutConfig.getString(HoodieIndexConfig.INDEX_TYPE.key()).equals(HoodieIndex.IndexType.BUCKET.name())) {
         layoutConfig.setDefaultValue(LAYOUT_TYPE, HoodieStorageLayout.LayoutType.BUCKET.name());
+
+        // Currently, the partitioner of the SIMPLE bucket index is supported by SparkBucketIndexPartitioner only.
+        if (layoutConfig.contains(HoodieIndexConfig.BUCKET_INDEX_ENGINE_TYPE)
+            && layoutConfig.getString(HoodieIndexConfig.BUCKET_INDEX_ENGINE_TYPE).equals("SIMPLE")) {

Review Comment:
   nit: can be simplified as `"SIMPLE".equals(layoutConfig.getString(HoodieIndexConfig.BUCKET_INDEX_ENGINE_TYPE))` without the `contains` check.
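
   The suggestion relies on a general Java idiom: calling `equals` on the string literal is safe even when the looked-up value is `null`, so the separate presence check becomes redundant. Below is a minimal sketch of the two forms, using `java.util.Properties` as a stand-in for the config object. The class name, the key string, and the assumption that the lookup returns `null` for an unset key are illustrative only; whether `layoutConfig.getString(...)` behaves the same way in Hudi should be confirmed against `HoodieConfig`.

```java
import java.util.Properties;

public class NullSafeEqualsSketch {
    // Hypothetical key used only for this sketch; not taken from the PR.
    static final String ENGINE_KEY = "example.bucket.engine";

    // Shape used in the diff: presence check first, then equals on the looked-up value.
    static boolean isSimpleWithContains(Properties props) {
        return props.containsKey(ENGINE_KEY)
            && props.getProperty(ENGINE_KEY).equals("SIMPLE");
    }

    // Shape suggested in the review: equals is called on the literal, which is never null,
    // so a null lookup result simply yields false.
    static boolean isSimpleLiteralFirst(Properties props) {
        return "SIMPLE".equals(props.getProperty(ENGINE_KEY));
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        Properties simple = new Properties();
        simple.setProperty(ENGINE_KEY, "SIMPLE");

        // Both forms agree whether or not the key is set.
        System.out.println(isSimpleWithContains(unset) == isSimpleLiteralFirst(unset));
        System.out.println(isSimpleWithContains(simple) == isSimpleLiteralFirst(simple));
    }
}
```

   Under these assumptions the two methods return the same result for every input, which is why the shorter form is a pure simplification rather than a behavior change.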



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieLayoutConfig.java:
##########
@@ -80,8 +83,15 @@ public HoodieLayoutConfig build() {
     }
 
     private void setDefault() {
-      if (layoutConfig.contains(HoodieIndexConfig.INDEX_TYPE.key()) && layoutConfig.getString(HoodieIndexConfig.INDEX_TYPE.key()).equals(HoodieIndex.IndexType.BUCKET.name())) {
+      if (layoutConfig.contains(HoodieIndexConfig.INDEX_TYPE.key())
+          && layoutConfig.getString(HoodieIndexConfig.INDEX_TYPE.key()).equals(HoodieIndex.IndexType.BUCKET.name())) {
         layoutConfig.setDefaultValue(LAYOUT_TYPE, HoodieStorageLayout.LayoutType.BUCKET.name());
+
+        // Currently, the partitioner of the SIMPLE bucket index is supported by SparkBucketIndexPartitioner only.
+        if (layoutConfig.contains(HoodieIndexConfig.BUCKET_INDEX_ENGINE_TYPE)
+            && layoutConfig.getString(HoodieIndexConfig.BUCKET_INDEX_ENGINE_TYPE).equals("SIMPLE")) {
+          layoutConfig.setDefaultValue(LAYOUT_PARTITIONER_CLASS_NAME, SIMPLE_BUCKET_LAYOUT_PARTITIONER_CLASS_NAME);

Review Comment:
   Could you add a unit test for this default value overwrite?  You can check similar tests in `TestHoodieWriteConfig`.  
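
   As a rough illustration of what such a test could cover, the sketch below models the `setDefault()` logic from the diff with a plain `Map` and checks three cases: the partitioner default is applied for BUCKET plus SIMPLE, it is not applied for another engine type, and an explicitly configured partitioner is not overwritten. Everything here is a stand-in: the class name, the key strings, the `CustomPartitioner` value, and the short partitioner name are hypothetical, `putIfAbsent` is assumed to approximate `setDefaultValue`, and a real test would use `HoodieLayoutConfig` with the project's JUnit setup instead of a `main` method.

```java
import java.util.HashMap;
import java.util.Map;

public class LayoutDefaultsSketch {
    // Illustrative keys and values only; the real constants live in
    // HoodieIndexConfig and HoodieLayoutConfig.
    static final String INDEX_TYPE = "example.index.type";
    static final String BUCKET_ENGINE = "example.index.bucket.engine";
    static final String LAYOUT_TYPE = "example.layout.type";
    static final String PARTITIONER_CLASS = "example.layout.partitioner.class";
    static final String SIMPLE_PARTITIONER = "SparkBucketIndexPartitioner";

    // Mirrors the shape of setDefault() in the diff: a default is filled in
    // only when the user has not set the key already.
    static Map<String, String> withDefaults(Map<String, String> userConfig) {
        Map<String, String> config = new HashMap<>(userConfig);
        if ("BUCKET".equals(config.get(INDEX_TYPE))) {
            config.putIfAbsent(LAYOUT_TYPE, "BUCKET");
            if ("SIMPLE".equals(config.get(BUCKET_ENGINE))) {
                config.putIfAbsent(PARTITIONER_CLASS, SIMPLE_PARTITIONER);
            }
        }
        return config;
    }

    public static void main(String[] args) {
        // Case 1: BUCKET index with the SIMPLE engine picks up the default partitioner.
        Map<String, String> simple = new HashMap<>();
        simple.put(INDEX_TYPE, "BUCKET");
        simple.put(BUCKET_ENGINE, "SIMPLE");
        System.out.println(withDefaults(simple).get(PARTITIONER_CLASS));

        // Case 2: a different engine type leaves the partitioner unset.
        Map<String, String> other = new HashMap<>();
        other.put(INDEX_TYPE, "BUCKET");
        other.put(BUCKET_ENGINE, "CONSISTENT_HASHING");
        System.out.println(withDefaults(other).containsKey(PARTITIONER_CLASS));

        // Case 3: an explicit user setting is preserved.
        Map<String, String> explicit = new HashMap<>(simple);
        explicit.put(PARTITIONER_CLASS, "CustomPartitioner");
        System.out.println(withDefaults(explicit).get(PARTITIONER_CLASS));
    }
}
```

   The third case is the one most worth pinning down in a real test, since it guards against the new default silently replacing a partitioner the user chose.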





[GitHub] [hudi] hudi-bot commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256927396

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "triggerType" : "PUSH"
     }, {
       "hash" : "f91c177cb72f63f4f3bc0ebffab95221d9df1bca",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606",
       "triggerID" : "1256194045",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "74953bcb6c064234c22808fed8d155e75fa226d5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11691",
       "triggerID" : "74953bcb6c064234c22808fed8d155e75fa226d5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f91c177cb72f63f4f3bc0ebffab95221d9df1bca Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11606) 
   * 74953bcb6c064234c22808fed8d155e75fa226d5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11691) 
   




[GitHub] [hudi] xicm commented on pull request #6759: [HUDI-4902] Set default partitioner for SIMPLE BUCKET index

Posted by GitBox <gi...@apache.org>.
xicm commented on PR #6759:
URL: https://github.com/apache/hudi/pull/6759#issuecomment-1256194045

   @hudi-bot run azure

