Posted to commits@hudi.apache.org by "gaoshihang (via GitHub)" <gi...@apache.org> on 2023/02/14 09:27:59 UTC

[GitHub] [hudi] gaoshihang opened a new pull request, #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

gaoshihang opened a new pull request, #7941:
URL: https://github.com/apache/hudi/pull/7941

   ### Change Logs
   
   In BaseSparkCommitActionExecutor.java, the input RDD is persisted with a hardcoded storage level, MEMORY_AND_DISK_SER:
   `    // TODO: Consistent contract in HoodieWriteClient regarding preppedRecord storage level handling
       JavaRDD<HoodieRecord<T>> inputRDD = HoodieJavaRDD.getJavaRDD(inputRecords);
       if (inputRDD.getStorageLevel() == StorageLevel.NONE()) {
         inputRDD.persist(StorageLevel.MEMORY_AND_DISK_SER());
       } else {
         LOG.info("RDD PreppedRecords was persisted at: " + inputRDD.getStorageLevel());
       }`
   However, there is currently no configuration parameter for changing this storage level.
   
   ### Impact
   Adds a new config, hoodie.spark.write.storage.level, that specifies the storage level of this RDD.
   1. The config's default value is MEMORY_AND_DISK_SER.
   ![image](https://user-images.githubusercontent.com/20013931/218693697-6cd84e79-31ed-4bd2-bf4a-42add13fd9dd.png)
   
   2. The config can be set to another storage level, for example:
   "hoodie.spark.write.storage.level": "DISK_ONLY"
   ![image](https://user-images.githubusercontent.com/20013931/218694143-b38042df-e089-47c1-97fa-ea07aa911ab7.png)
   
   3. If the storage level is invalid, an exception is thrown. For example, setting
   "hoodie.spark.write.storage.level": "DISKE_ONLY" fails with:
   Caused by: java.lang.IllegalArgumentException: Invalid StorageLevel: DISKE_ONLY
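
The validation behaviour in item 3 can be pictured with a small sketch. It assumes the configured string is checked against the names of Spark's predefined storage levels. The class `StorageLevelCheck` and its `validate` method are hypothetical stand-ins written so the example runs without Spark on the classpath; they are not the code added by this PR, and the list of names is a subset of what Spark defines.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a Spark storage-level lookup; not part of Hudi or Spark.
final class StorageLevelCheck {
    // Names of Spark's predefined storage levels (a subset, assumed for illustration).
    private static final Set<String> KNOWN_LEVELS = new TreeSet<>(Arrays.asList(
        "NONE", "DISK_ONLY", "DISK_ONLY_2", "MEMORY_ONLY", "MEMORY_ONLY_2",
        "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2", "MEMORY_AND_DISK", "MEMORY_AND_DISK_2",
        "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2", "OFF_HEAP"));

    // Returns the name unchanged when it is known; otherwise throws with the
    // same message format as the error quoted in the PR description.
    static String validate(String name) {
        if (name == null || !KNOWN_LEVELS.contains(name)) {
            throw new IllegalArgumentException("Invalid StorageLevel: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(validate("DISK_ONLY"));
        try {
            validate("DISKE_ONLY"); // typo from the PR description
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast on an unknown name means a typo in the config surfaces as an exception rather than silently falling back to some other level.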
   
   ### Risk level (write none, low medium or high below)
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1429771976

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] gaoshihang commented on a diff in pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "gaoshihang (via GitHub)" <gi...@apache.org>.
gaoshihang commented on code in PR #7941:
URL: https://github.com/apache/hudi/pull/7941#discussion_r1108035002


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -127,6 +127,12 @@ public class HoodieWriteConfig extends HoodieConfig {
       .noDefaultValue()
       .withDocumentation("Table name that will be used for registering with metastores like HMS. Needs to be same across runs.");
 
+  public static final ConfigProperty<String> SPARK_WRITE_STORAGE_LEVEL_VALUE = ConfigProperty
+       .key("hoodie.spark.write.storage.level")
+       .defaultValue("MEMORY_AND_DISK_SER")
+       .withDocumentation("Determine what level of persistence is used to cache write RDDs. "

Review Comment:
   updated



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -127,6 +127,12 @@ public class HoodieWriteConfig extends HoodieConfig {
       .noDefaultValue()
       .withDocumentation("Table name that will be used for registering with metastores like HMS. Needs to be same across runs.");
 
+  public static final ConfigProperty<String> SPARK_WRITE_STORAGE_LEVEL_VALUE = ConfigProperty
+       .key("hoodie.spark.write.storage.level")

Review Comment:
   updated





[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1430789562

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155) 
   * 21d0208f08bcf33ff9c37871c73f395d9571170e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1429418042

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1430785448

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155) 
   * 21d0208f08bcf33ff9c37871c73f395d9571170e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1430886544

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155) 
   * 21d0208f08bcf33ff9c37871c73f395d9571170e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192) 
   * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1432562490

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232",
       "triggerID" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197) 
   * a6978e803fbf835ae8cc3180fe0a8c361bc179b6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1434016159

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232",
       "triggerID" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15264",
       "triggerID" : "1434009159",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * a6978e803fbf835ae8cc3180fe0a8c361bc179b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15264) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] zhangyue19921010 merged pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "zhangyue19921010 (via GitHub)" <gi...@apache.org>.
zhangyue19921010 merged PR #7941:
URL: https://github.com/apache/hudi/pull/7941




[GitHub] [hudi] gaoshihang commented on a diff in pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "gaoshihang (via GitHub)" <gi...@apache.org>.
gaoshihang commented on code in PR #7941:
URL: https://github.com/apache/hudi/pull/7941#discussion_r1108034909


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -127,6 +127,12 @@ public class HoodieWriteConfig extends HoodieConfig {
       .noDefaultValue()
       .withDocumentation("Table name that will be used for registering with metastores like HMS. Needs to be same across runs.");
 
+  public static final ConfigProperty<String> SPARK_WRITE_STORAGE_LEVEL_VALUE = ConfigProperty

Review Comment:
   updated
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -1069,6 +1075,10 @@ public String getWriteSchema() {
     return getSchema();
   }
 
+  public String getSparkWriteStorageLevel() {

Review Comment:
   updated





[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1429430076

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1433026657

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232",
       "triggerID" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a6978e803fbf835ae8cc3180fe0a8c361bc179b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1431464017

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1434768944

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232",
       "triggerID" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a6978e803fbf835ae8cc3180fe0a8c361bc179b6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15264",
       "triggerID" : "1434009159",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * a6978e803fbf835ae8cc3180fe0a8c361bc179b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15232) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15264) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1430893023

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15155",
       "triggerID" : "42e12ad1b6bebdcc3dc9d985e5be661b198f3f5c",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192",
       "triggerID" : "21d0208f08bcf33ff9c37871c73f395d9571170e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197",
       "triggerID" : "21b97776670a8bcf75eaacaa5933fbddc1c9eb00",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 21d0208f08bcf33ff9c37871c73f395d9571170e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15192) 
   * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] zhangyue19921010 commented on a diff in pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "zhangyue19921010 (via GitHub)" <gi...@apache.org>.
zhangyue19921010 commented on code in PR #7941:
URL: https://github.com/apache/hudi/pull/7941#discussion_r1107957756


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -127,6 +127,12 @@ public class HoodieWriteConfig extends HoodieConfig {
       .noDefaultValue()
       .withDocumentation("Table name that will be used for registering with metastores like HMS. Needs to be same across runs.");
 
+  public static final ConfigProperty<String> SPARK_WRITE_STORAGE_LEVEL_VALUE = ConfigProperty
+       .key("hoodie.spark.write.storage.level")

Review Comment:
   nit: could we use `hoodie.write.tagged.record.storage.level`?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -127,6 +127,12 @@ public class HoodieWriteConfig extends HoodieConfig {
       .noDefaultValue()
       .withDocumentation("Table name that will be used for registering with metastores like HMS. Needs to be same across runs.");
 
+  public static final ConfigProperty<String> SPARK_WRITE_STORAGE_LEVEL_VALUE = ConfigProperty
+       .key("hoodie.spark.write.storage.level")
+       .defaultValue("MEMORY_AND_DISK_SER")
+       .withDocumentation("Determine what level of persistence is used to cache write RDDs. "

Review Comment:
   Suggested wording: "... is used to cache the tagged records."



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -127,6 +127,12 @@ public class HoodieWriteConfig extends HoodieConfig {
       .noDefaultValue()
       .withDocumentation("Table name that will be used for registering with metastores like HMS. Needs to be same across runs.");
 
+  public static final ConfigProperty<String> SPARK_WRITE_STORAGE_LEVEL_VALUE = ConfigProperty

Review Comment:
   Suggest renaming this to `TAGGED_RECORD_STORAGE_LEVEL_VALUE`.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -1069,6 +1075,10 @@ public String getWriteSchema() {
     return getSchema();
   }
 
+  public String getSparkWriteStorageLevel() {

Review Comment:
   Suggest renaming this to `getTaggedRecordStorageLevel()`.
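
To make the renames suggested above concrete, here is a minimal, dependency-free sketch of how the config lookup might read with those names. It is an illustration only, not the code in this PR: the class `TaggedRecordStorageLevelSketch`, the use of `java.util.Properties`, and the `KNOWN_LEVELS` set (a subset of Spark's predefined storage level names, standing in for Spark's own parsing) are all assumptions. The key and getter names are the reviewer's proposals, and the default and error message mirror what the PR description reports.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class TaggedRecordStorageLevelSketch {
  // Key name as proposed in the review; hypothetical until confirmed by the merged code.
  static final String KEY = "hoodie.write.tagged.record.storage.level";
  // Default taken from the PR description (the previously hardcoded level).
  static final String DEFAULT_LEVEL = "MEMORY_AND_DISK_SER";

  // Assumption: a subset of Spark's predefined storage level names,
  // used here as a stand-in for Spark's own string-to-level parsing.
  static final Set<String> KNOWN_LEVELS = new HashSet<>(Arrays.asList(
      "NONE", "DISK_ONLY", "DISK_ONLY_2", "MEMORY_ONLY", "MEMORY_ONLY_2",
      "MEMORY_ONLY_SER", "MEMORY_ONLY_SER_2", "MEMORY_AND_DISK",
      "MEMORY_AND_DISK_2", "MEMORY_AND_DISK_SER", "MEMORY_AND_DISK_SER_2",
      "OFF_HEAP"));

  // Returns the configured level name, falling back to the default,
  // and rejects names that are not recognized.
  static String getTaggedRecordStorageLevel(Properties props) {
    String level = props.getProperty(KEY, DEFAULT_LEVEL);
    if (!KNOWN_LEVELS.contains(level)) {
      throw new IllegalArgumentException("Invalid StorageLevel: " + level);
    }
    return level;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // No override set: the default applies.
    System.out.println(getTaggedRecordStorageLevel(props));
    // Explicit override, as in the PR description's second scenario.
    props.setProperty(KEY, "DISK_ONLY");
    System.out.println(getTaggedRecordStorageLevel(props));
  }
}
```

Validating the name at lookup time means a typo such as `DISKE_ONLY` fails fast with a clear message rather than surfacing later during the write.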





[GitHub] [hudi] hudi-bot commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1432556681

   ## CI report:
   
   * 21b97776670a8bcf75eaacaa5933fbddc1c9eb00 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15197) 
   * a6978e803fbf835ae8cc3180fe0a8c361bc179b6 UNKNOWN
   




[GitHub] [hudi] zhangyue19921010 commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "zhangyue19921010 (via GitHub)" <gi...@apache.org>.
zhangyue19921010 commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1434009159

   @hudi-bot run azure




[GitHub] [hudi] zhangyue19921010 commented on pull request #7941: [HUDI-5786] Add a new config to specific spark write rdd storage level

Posted by "zhangyue19921010 (via GitHub)" <gi...@apache.org>.
zhangyue19921010 commented on PR #7941:
URL: https://github.com/apache/hudi/pull/7941#issuecomment-1436195721

   LGTM.
   1. Only a config was added, and its default value matches the previously hardcoded storage level.
   2. All UT/IT passed.
   
   Low impact. Merged now.
   

