Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/02/15 00:00:12 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #7951: [HUDI-5796] Adding auto inferring partition from incoming df

nsivabalan opened a new pull request, #7951:
URL: https://github.com/apache/hudi/pull/7951

   ### Change Logs
   
   If someone writes to Hudi using the following syntax, we should automatically infer the partition columns when Hudi's partition path field is not explicitly set:
   ```
   df.write.partitionBy("col1").format("hudi").options(...).save()
   ```
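To illustrate the intended behavior, here is a minimal sketch of the option translation in Python (purely illustrative, not the actual Scala implementation in `DataSourceOptions.scala`; the config key mirrors the real Hudi option, but the function name is hypothetical):

```python
# Hypothetical sketch (not Hudi's Scala code) of the translation this PR
# adds: partitionBy() columns become the Hudi partition path field only
# when the user has not already set it explicitly.

PARTITIONPATH_FIELD = "hoodie.datasource.write.partitionpath.field"

def derive_partition_path_if_needed(opts, partition_columns):
    """Infer the partition path field from Spark's partitionBy() columns."""
    if not partition_columns or PARTITIONPATH_FIELD in opts:
        return opts  # nothing to infer, or the user's explicit setting wins
    return {**opts, PARTITIONPATH_FIELD: ",".join(partition_columns)}

# partitionBy("col1") with no explicit setting -> inferred:
print(derive_partition_path_if_needed({}, ["col1"]))
# explicit user setting is left untouched:
print(derive_partition_path_if_needed({PARTITIONPATH_FIELD: "other"}, ["col1"]))
```

The key point is the precedence: an explicit `hoodie.datasource.write.partitionpath.field` always wins over the `partitionBy()` columns.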
   
   ### Impact
   
   Improves the usability of Hudi.
   
   ### Risk level (write none, low, medium or high below)
   
   Low.
   
   ### Documentation Update
   
   We might need to enhance our quick start to call it out. 
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1430586014

   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1454064985

   ## CI report:
   
   * bbf05d39a470149af7259e2ea0a69b76ebb660df Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15555) 
   * 9ae7b06b3f38d34875349f98d5e64390ab6d60db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15558) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1449036535

   ## CI report:
   
   * 9b678c5be48c132ee8ff047093ea58febd20a3ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450) 
   * 22a797678bab6fe57256d35d24ecbee6b92338e5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491) 
   * 9874aa6b73f32857f248c0d4fadbc8a7291e287b UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1449141448

   ## CI report:
   
   * 9874aa6b73f32857f248c0d4fadbc8a7291e287b Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15492) 
   * e5ed02b3c18025fc3b0c5a135be64991fb43417b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1440728509

   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183) 
   * 1bc2fd1e84e07cff545656df5eb0d7163223a8b4 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1449044316

   ## CI report:
   
   * 22a797678bab6fe57256d35d24ecbee6b92338e5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491) 
   * 9874aa6b73f32857f248c0d4fadbc8a7291e287b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15492) 
   




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120583549


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##########
@@ -643,8 +736,9 @@ class TestCOWDataSource extends HoodieSparkClientTestBase with ScalaAssertionSup
   def testSparkPartitionByWithCustomKeyGenerator(recordType: HoodieRecordType): Unit = {
     val (writeOpts, readOpts) = getWriterReaderOpts(recordType)
 
+    val updatedWriteOpts = writeOpts.-(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())

Review Comment:
   In all the tests exercising df.partitionBy, we don't want to override the partition field, because the main purpose of these tests is to verify that Hudi infers the df.partitionBy columns when the partition path config is not explicitly set by the user. getWriterReaderOpts() unfortunately already has the partition path set, so for these tests I had to remove it. I can add a comment here explaining why we are doing this.





[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120954001


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -288,44 +287,42 @@ object DataSourceWriteOptions {
     .withDocumentation("The table type for the underlying data, for this write. This can’t change between writes.")
 
   /**
-    * Translate spark parameters to hudi parameters
-    *
-    * @param optParams Parameters to be translated
-    * @return Parameters after translation
-    */
-  def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
-    var translatedOptParams = optParams
-    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    // we should set hoodie's partition path only if its not set by the user.
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
-      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-        .map(SparkDataSourceUtils.decodePartitioningColumns)
-        .getOrElse(Nil)
-      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
-        DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
-
-      keyGeneratorClass match {
-        // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
-        // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
-        // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
-        case c if (c.nonEmpty && c == classOf[CustomKeyGenerator].getName) =>
-          val partitionPathField = partitionColumns.map(e => {
-            if (e.contains(":")) {
-              e
-            } else {
-              s"$e:SIMPLE"
-            }
-          }).mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case c if (c.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) =>
-          // for any key gen other than NonPartitioned key gen, we can override the partition field config.
-          val partitionPathField = partitionColumns.mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case _ => // no op incase of NonPartitioned Key gen.
-      }
+   * Derive [[PARTITIONPATH_FIELD]] based on [[SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY]]
+   * if [[PARTITIONPATH_FIELD]] is not set explicitly.
+   */
+  def derivePartitionPathFieldsIfNeeded(optParams: Map[String, String]): Map[String, String] = {
+    if (!optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      || optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
+      return optParams
+    }
+
+    val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      .map(SparkDataSourceUtils.decodePartitioningColumns)
+      .getOrElse(Nil)
+    val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
+
+    keyGeneratorClass match {
+      // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
+      // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
+      // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
+      case c if Array(classOf[CustomKeyGenerator].getName, classOf[CustomAvroKeyGenerator].getName).contains(c) =>
+        val partitionPathField = partitionColumns.map(e => {
+          if (e.contains(":")) {
+            e
+          } else {
+            s"$e:SIMPLE"
+          }
+        }).mkString(",")
+        optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
+      case c if !Array(classOf[NonpartitionedKeyGenerator].getName, classOf[NonpartitionedAvroKeyGenerator].getName).contains(c) =>

Review Comment:
   Not the ideal way of checking the key gen type; the key gen class itself should tell us whether it's non-partitioned, and the same goes for the custom keygen classes. The key gen class factory can provide a util for this check.
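As a rough sketch of that suggestion (Python, purely illustrative; the helper is hypothetical and stands in for what a key generator factory utility could expose, though the class names below are the real Hudi ones):

```python
# Illustrative sketch only: centralize the "is this key generator
# non-partitioned?" question in one helper instead of comparing class
# names at every call site.

NON_PARTITIONED_KEYGENS = frozenset({
    "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
    "org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator",
})

def is_non_partitioned(keygen_class: str) -> bool:
    # In the real codebase this knowledge would live with the key
    # generator factory rather than being duplicated by callers.
    return keygen_class in NON_PARTITIONED_KEYGENS

print(is_non_partitioned("org.apache.hudi.keygen.NonpartitionedKeyGenerator"))  # True
print(is_non_partitioned("org.apache.hudi.keygen.SimpleKeyGenerator"))          # False
```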





[GitHub] [hudi] nsivabalan commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1440783829

   @xushiyan : ready for review.




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1442900932

   ## CI report:
   
   * cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371) 
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1118224236


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -49,6 +49,7 @@ import org.apache.hudi.internal.schema.InternalSchema
 import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter
 import org.apache.hudi.internal.schema.utils.AvroSchemaEvolutionUtils.reconcileNullability
 import org.apache.hudi.internal.schema.utils.{AvroSchemaEvolutionUtils, SerDeHelper}
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions

Review Comment:
   unused import



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)

Review Comment:
   Maybe we should fail the write if these 2 options are not the same? At the least we should avoid unintended writes.



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)

Review Comment:
   Let's track this behavior change for the next release.



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
       val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
         .map(SparkDataSourceUtils.decodePartitioningColumns)
         .getOrElse(Nil)
       val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
         DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
 
-      val partitionPathField =
+      // if not nonpartitioned key gen
+      if (keyGeneratorClass.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) {
         keyGeneratorClass match {
           // Only CustomKeyGenerator needs special treatment, because it needs to be specified in a way
           // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
           // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
           case c if c == classOf[CustomKeyGenerator].getName =>
-            partitionColumns.map(e => {
+            val partitionPathField = partitionColumns.map(e => {
               if (e.contains(":")) {
                 e
               } else {
                 s"$e:SIMPLE"
               }
             }).mkString(",")
+            translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
           case _ =>
-            partitionColumns.mkString(",")
+            val partitionPathField = partitionColumns.mkString(",")
+            translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
         }
-      translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)

Review Comment:
   When the keygen is not set, or is NonpartitionedKeyGenerator, the partition path field should be an empty string, so that case is already fine, right? What does this change fix? Besides, we should do config validation instead of patching the logic here; e.g., when the keygen is non-partitioned, the partition field should not be non-empty.
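The CustomKeyGenerator translation in the diff above can be restated as a small pure function. This is a simplified sketch of the behavior under review, not the actual Hudi implementation: `partitionBy()` columns may carry an explicit key type (e.g. `"p2:SIMPLE"`, `"p3:TIMESTAMP"`), and columns without one default to `SIMPLE`.

```scala
// Minimal restatement of the CustomKeyGenerator partition-spec translation:
// append ":SIMPLE" to any column that does not already declare a key type,
// then join the columns into the "field1:Type1,field2:Type2" form.
def toCustomKeyGenSpec(partitionColumns: Seq[String]): String =
  partitionColumns
    .map(c => if (c.contains(":")) c else s"$c:SIMPLE")
    .mkString(",")
```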



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120980660


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -288,44 +287,42 @@ object DataSourceWriteOptions {
     .withDocumentation("The table type for the underlying data, for this write. This can’t change between writes.")
 
   /**
-    * Translate spark parameters to hudi parameters
-    *
-    * @param optParams Parameters to be translated
-    * @return Parameters after translation
-    */
-  def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
-    var translatedOptParams = optParams
-    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    // we should set hoodie's partition path only if its not set by the user.
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
-      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-        .map(SparkDataSourceUtils.decodePartitioningColumns)
-        .getOrElse(Nil)
-      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
-        DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
-
-      keyGeneratorClass match {
-        // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
-        // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
-        // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
-        case c if (c.nonEmpty && c == classOf[CustomKeyGenerator].getName) =>
-          val partitionPathField = partitionColumns.map(e => {
-            if (e.contains(":")) {
-              e
-            } else {
-              s"$e:SIMPLE"
-            }
-          }).mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case c if (c.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) =>
-          // for any key gen other than NonPartitioned key gen, we can override the partition field config.
-          val partitionPathField = partitionColumns.mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case _ => // no op incase of NonPartitioned Key gen.
-      }
+   * Derive [[PARTITIONPATH_FIELD]] based on [[SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY]]
+   * if [[PARTITIONPATH_FIELD]] is not set explicitly.
+   */
+  def derivePartitionPathFieldsIfNeeded(optParams: Map[String, String]): Map[String, String] = {
+    if (!optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      || optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
+      return optParams
+    }
+
+    val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      .map(SparkDataSourceUtils.decodePartitioningColumns)
+      .getOrElse(Nil)
+    val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)

Review Comment:
   I don't think we have standardized or supported the key gen type config on all code paths.
   ```
   NOTE: Please use hoodie.datasource.write.keygenerator.class instead of hoodie.datasource.write.keygenerator.type. The second config was introduced more recently. and will internally instantiate the correct KeyGenerator class based on the type name. The second one is intended for ease of use and is being actively worked on. We still recommend using the first config until it is marked as deprecated.
   ```
   https://hudi.apache.org/blog/2021/02/13/hudi-key-generators/#key-generators
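The `derivePartitionPathFieldsIfNeeded` logic being reviewed can be sketched in isolation as below. The key names are taken from the diff, but the decode step is simplified to a plain comma split (the real `SparkDataSourceUtils.decodePartitioningColumns` handles Spark's encoding), so treat this as an approximation.

```scala
// Self-contained sketch of the derivation under review: derive the Hudi
// partition path field from Spark's partitionBy() columns, but only when the
// user has not already set the field explicitly.
object DerivePartitionPath {
  val PartitioningColumnsKey = "__partition_columns" // Spark DataFrameWriter key (assumed)
  val PartitionPathFieldKey  = "hoodie.datasource.write.partitionpath.field"

  def derive(opts: Map[String, String]): Map[String, String] = {
    if (!opts.contains(PartitioningColumnsKey) || opts.contains(PartitionPathFieldKey)) {
      opts // no partitionBy() columns, or the field was set explicitly
    } else {
      val cols = opts(PartitioningColumnsKey).split(",").toSeq
      opts + (PartitionPathFieldKey -> cols.mkString(","))
    }
  }
}
```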





[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120551966


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##########
@@ -643,8 +736,9 @@ class TestCOWDataSource extends HoodieSparkClientTestBase with ScalaAssertionSup
   def testSparkPartitionByWithCustomKeyGenerator(recordType: HoodieRecordType): Unit = {
     val (writeOpts, readOpts) = getWriterReaderOpts(recordType)
 
+    val updatedWriteOpts = writeOpts.-(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())

Review Comment:
   In some cases we need to remove PARTITIONPATH_FIELD_NAME and in some cases we don't; this test logic will be hard to maintain. Can we avoid special handling like this? Since CI passed, we can land once the test is fixed and passes locally.





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1448944921

   ## CI report:
   
   * 9b678c5be48c132ee8ff047093ea58febd20a3ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450) 
   * 22a797678bab6fe57256d35d24ecbee6b92338e5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120838073


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)

Review Comment:
   The reasoning is: if people currently use partitionBy() and also set PARTITIONPATH_FIELD_NAME, the two are likely to match. So by giving PARTITIONPATH_FIELD_NAME higher precedence now, the change stays backward compatible.
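The precedence rule described above reduces to a one-liner; this is a toy illustration of the intended behavior, not the actual Hudi code path.

```scala
// An explicitly configured partition path field always wins over
// partitionBy()-derived columns; partitionBy() is only a fallback.
def resolvePartitionField(explicitField: Option[String],
                          partitionByCols: Seq[String]): Option[String] =
  explicitField.orElse(
    if (partitionByCols.nonEmpty) Some(partitionByCols.mkString(",")) else None)
```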





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1449135357

   ## CI report:
   
   * 22a797678bab6fe57256d35d24ecbee6b92338e5 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491) 
   * 9874aa6b73f32857f248c0d4fadbc8a7291e287b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15492) 
   * e5ed02b3c18025fc3b0c5a135be64991fb43417b UNKNOWN
   




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1119377486


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
       val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
         .map(SparkDataSourceUtils.decodePartitioningColumns)
         .getOrElse(Nil)
       val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
         DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
 
-      val partitionPathField =
+      // if not nonpartitioned key gen
+      if (keyGeneratorClass.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) {
         keyGeneratorClass match {
           // Only CustomKeyGenerator needs special treatment, because it needs to be specified in a way
           // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
           // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
           case c if c == classOf[CustomKeyGenerator].getName =>
-            partitionColumns.map(e => {
+            val partitionPathField = partitionColumns.map(e => {
               if (e.contains(":")) {
                 e
               } else {
                 s"$e:SIMPLE"
               }
             }).mkString(",")
+            translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
           case _ =>
-            partitionColumns.mkString(",")
+            val partitionPathField = partitionColumns.mkString(",")
+            translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
         }
-      translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)

Review Comment:
   will fix



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)

Review Comment:
   Synced up directly; this should be OK behavior. Maybe we can add an FAQ entry on how we deduce the partitioning columns.





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1453744033

   ## CI report:
   
   * e5ed02b3c18025fc3b0c5a135be64991fb43417b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494) 
   * bbf05d39a470149af7259e2ea0a69b76ebb660df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15555) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1454173661

   ## CI report:
   
   * 9ae7b06b3f38d34875349f98d5e64390ab6d60db Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15558) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1430592172

   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1447482614

   ## CI report:
   
   * 9b678c5be48c132ee8ff047093ea58febd20a3ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450) 
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120959200


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -288,44 +287,42 @@ object DataSourceWriteOptions {
     .withDocumentation("The table type for the underlying data, for this write. This can’t change between writes.")
 
   /**
-    * Translate spark parameters to hudi parameters
-    *
-    * @param optParams Parameters to be translated
-    * @return Parameters after translation
-    */
-  def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
-    var translatedOptParams = optParams
-    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    // we should set hoodie's partition path only if its not set by the user.
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
-      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-        .map(SparkDataSourceUtils.decodePartitioningColumns)
-        .getOrElse(Nil)
-      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
-        DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
-
-      keyGeneratorClass match {
-        // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
-        // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
-        // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
-        case c if (c.nonEmpty && c == classOf[CustomKeyGenerator].getName) =>
-          val partitionPathField = partitionColumns.map(e => {
-            if (e.contains(":")) {
-              e
-            } else {
-              s"$e:SIMPLE"
-            }
-          }).mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case c if (c.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) =>
-          // for any key gen other than NonPartitioned key gen, we can override the partition field config.
-          val partitionPathField = partitionColumns.mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case _ => // no op incase of NonPartitioned Key gen.
-      }
+   * Derive [[PARTITIONPATH_FIELD]] based on [[SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY]]
+   * if [[PARTITIONPATH_FIELD]] is not set explicitly.
+   */
+  def derivePartitionPathFieldsIfNeeded(optParams: Map[String, String]): Map[String, String] = {
+    if (!optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      || optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
+      return optParams
+    }
+
+    val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      .map(SparkDataSourceUtils.decodePartitioningColumns)
+      .getOrElse(Nil)
+    val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)

Review Comment:
   @nsivabalan how should `hoodie.datasource.write.keygenerator.type` be weighed against the key gen class in terms of precedence? Should validation fail if they don't match, or do we ignore the key gen type?
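
   To make the precedence question concrete, a minimal hypothetical sketch (not code from this PR; the type-to-class mapping, config keys' handling, and default are illustrative only) of how a class/type conflict could be validated:

   ```scala
   // Hypothetical sketch: resolve the effective key generator class from the
   // "class" and "type" options, failing fast when they contradict each other.
   object KeyGenPrecedenceSketch {
     // Illustrative mapping; real Hudi resolves types via KeyGeneratorType.
     private val typeToClass = Map(
       "SIMPLE"        -> "org.apache.hudi.keygen.SimpleKeyGenerator",
       "NON_PARTITION" -> "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
     )

     def resolve(opts: Map[String, String]): String = {
       val clazz = opts.get("hoodie.datasource.write.keygenerator.class")
       val tpe   = opts.get("hoodie.datasource.write.keygenerator.type")
                       .flatMap(typeToClass.get)
       (clazz, tpe) match {
         case (Some(c), Some(t)) if c != t =>
           throw new IllegalArgumentException(
             s"key generator class $c conflicts with type-derived class $t")
         case (Some(c), _)    => c // explicit class wins
         case (None, Some(t)) => t // fall back to the type mapping
         case (None, None)    => "org.apache.hudi.keygen.SimpleKeyGenerator" // illustrative default
       }
     }
   }
   ```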





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1441022595

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ffc4e3d7fb447cb72feaeaa4a1aec866c519e561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1448410464

   CI is green
   <img width="1223" alt="image" src="https://user-images.githubusercontent.com/513218/221904204-246c7156-b1ba-4a43-a729-bc16efa205fa.png">
   




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120979273


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -288,44 +287,42 @@ object DataSourceWriteOptions {
     .withDocumentation("The table type for the underlying data, for this write. This can’t change between writes.")
 
   /**
-    * Translate spark parameters to hudi parameters
-    *
-    * @param optParams Parameters to be translated
-    * @return Parameters after translation
-    */
-  def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
-    var translatedOptParams = optParams
-    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    // we should set hoodie's partition path only if its not set by the user.
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
-      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-        .map(SparkDataSourceUtils.decodePartitioningColumns)
-        .getOrElse(Nil)
-      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
-        DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
-
-      keyGeneratorClass match {
-        // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
-        // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
-        // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
-        case c if (c.nonEmpty && c == classOf[CustomKeyGenerator].getName) =>
-          val partitionPathField = partitionColumns.map(e => {
-            if (e.contains(":")) {
-              e
-            } else {
-              s"$e:SIMPLE"
-            }
-          }).mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case c if (c.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) =>
-          // for any key gen other than NonPartitioned key gen, we can override the partition field config.
-          val partitionPathField = partitionColumns.mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case _ => // no op incase of NonPartitioned Key gen.
-      }
+   * Derive [[PARTITIONPATH_FIELD]] based on [[SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY]]
+   * if [[PARTITIONPATH_FIELD]] is not set explicitly.
+   */
+  def derivePartitionPathFieldsIfNeeded(optParams: Map[String, String]): Map[String, String] = {
+    if (!optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      || optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
+      return optParams
+    }
+
+    val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      .map(SparkDataSourceUtils.decodePartitioningColumns)
+      .getOrElse(Nil)
+    val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
+
+    keyGeneratorClass match {
+      // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
+      // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
+      // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
+      case c if Array(classOf[CustomKeyGenerator].getName, classOf[CustomAvroKeyGenerator].getName).contains(c) =>
+        val partitionPathField = partitionColumns.map(e => {
+          if (e.contains(":")) {
+            e
+          } else {
+            s"$e:SIMPLE"
+          }
+        }).mkString(",")
+        optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
+      case c if !Array(classOf[NonpartitionedKeyGenerator].getName, classOf[NonpartitionedAvroKeyGenerator].getName).contains(c) =>

Review Comment:
   yeah. don't think we can do any better here. 





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1442604881

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ffc4e3d7fb447cb72feaeaa4a1aec866c519e561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344) 
   * cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1114880387


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1018,6 +1023,26 @@ object HoodieSparkSqlWriter {
     }
   }
 
+  private def mayBeInferPartition(rawParams: Map[String, String]): Map[String, String] = {
+    var optParams = rawParams
+    // if hoodie's partition path field is not set and incoming df's partition is set, infer from it.
+    if (!rawParams.containsKey(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key()) && optParams.containsKey(SPARK_DF_PARTITION_COLUMN_NAME)){
+      val partitionCols : String =  optParams.get(SPARK_DF_PARTITION_COLUMN_NAME).get
+      val partitionFieldValue : String = if (partitionCols.startsWith("[")) {
+        val parts : Array[String] = partitionCols.substring(1, partitionCols.length-1).split(",")
+        var partitionFieldStr = ""
+        parts.foreach(part => {
+          partitionFieldStr += part.substring(1, part.length-1) + ","
+        })
+        partitionFieldStr.substring(0, partitionFieldStr.length - 1)
+      } else {
+        partitionCols
+      }

Review Comment:
   Let's jam on this. I am not sure we can match an array of items; I know we can match a group, and we have to split it anyway. Let's see if we can simplify this, or keep it as is for now.





[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1113598296


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1018,6 +1023,26 @@ object HoodieSparkSqlWriter {
     }
   }
 
+  private def mayBeInferPartition(rawParams: Map[String, String]): Map[String, String] = {

Review Comment:
   /nit `maybeInferPartition()`



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -95,14 +96,16 @@ object HoodieSparkSqlWriter {
     ConfigProperty.key("hoodie.internal.sql.merge.into.writes")
       .defaultValue(false)
 
+  val SPARK_DF_PARTITION_COLUMN_NAME = "__partition_columns"

Review Comment:
   Can you add a doc comment to explain where this special constant comes from?
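
   For reference, the constant mirrors the option key Spark's `DataFrameWriter` uses internally to forward `partitionBy(...)` columns to the data source (as understood from Spark's `DataSourceUtils`; the exact encoding should be verified against the Spark version in use). A sketch of the shape the option is believed to take:

   ```scala
   // Sketch only (not from this PR): df.write.partitionBy("col1", "col2")
   // is believed to surface to the data source as an option like this,
   // with the columns encoded in a JSON-array-style string.
   val writerOptions = Map("__partition_columns" -> """["col1","col2"]""")
   ```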



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -111,15 +114,17 @@ object HoodieSparkSqlWriter {
             extraPreCommitFn: Option[BiConsumer[HoodieTableMetaClient, HoodieCommitMetadata]] = Option.empty):
   (Boolean, HOption[String], HOption[String], HOption[String], SparkRDDWriteClient[_], HoodieTableConfig) = {
 
-    assert(optParams.get("path").exists(!StringUtils.isNullOrEmpty(_)), "'path' must be set")
-    val path = optParams("path")
+    assert(rawParams.get("path").exists(!StringUtils.isNullOrEmpty(_)), "'path' must be set")
+    val path = rawParams("path")
     val basePath = new Path(path)
 
     val spark = sqlContext.sparkSession
     val sparkContext = sqlContext.sparkContext
 
     val fs = basePath.getFileSystem(sparkContext.hadoopConfiguration)
     tableExists = fs.exists(new Path(basePath, HoodieTableMetaClient.METAFOLDER_NAME))
+    var optParams = mayBeInferPartition(rawParams)

Review Comment:
   minor: `optParams` sounds more intuitive, indicating it comes from `option()`. You could call the result `finalOptParams` after making the inferences.



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1018,6 +1023,26 @@ object HoodieSparkSqlWriter {
     }
   }
 
+  private def mayBeInferPartition(rawParams: Map[String, String]): Map[String, String] = {
+    var optParams = rawParams
+    // if hoodie's partition path field is not set and incoming df's partition is set, infer from it.
+    if (!rawParams.containsKey(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key()) && optParams.containsKey(SPARK_DF_PARTITION_COLUMN_NAME)){
+      val partitionCols : String =  optParams.get(SPARK_DF_PARTITION_COLUMN_NAME).get
+      val partitionFieldValue : String = if (partitionCols.startsWith("[")) {
+        val parts : Array[String] = partitionCols.substring(1, partitionCols.length-1).split(",")
+        var partitionFieldStr = ""
+        parts.foreach(part => {
+          partitionFieldStr += part.substring(1, part.length-1) + ","
+        })
+        partitionFieldStr.substring(0, partitionFieldStr.length - 1)
+      } else {
+        partitionCols
+      }

Review Comment:
   should use pattern matching to handle all cases, including invalid ones
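
   As a rough illustration of the suggestion (a sketch under the assumption that the value is either a bare column name or a `["col1","col2"]`-style encoded list; the function name and cases are hypothetical, not the PR's final code):

   ```scala
   // Sketch: parse the __partition_columns value with pattern matching,
   // covering the bracketed-list, plain-column, empty, and malformed cases.
   def parsePartitionCols(raw: String): Seq[String] = raw.trim match {
     case s if s.startsWith("[") && s.endsWith("]") =>
       s.substring(1, s.length - 1)                      // drop the brackets
         .split(",")
         .map(_.trim.stripPrefix("\"").stripSuffix("\"")) // unquote each entry
         .filter(_.nonEmpty)
         .toSeq
     case ""                     => Seq.empty             // nothing to infer
     case s if s.startsWith("[") => Seq.empty             // malformed list
     case s                      => Seq(s)                // plain column name
   }
   ```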



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##########
@@ -142,6 +145,61 @@ class TestCOWDataSource extends HoodieClientTestBase with ScalaAssertionSupport
     spark.read.format("org.apache.hudi").options(readOpts).load(basePath).count()
   }
 
+  @Test
+  def testInferPartitionBy(): Unit = {
+    val (writeOpts, readOpts) = getWriterReaderOpts(HoodieRecordType.AVRO, Map())
+      // Insert Operation
+      val records = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+      val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
+
+      val commonOptsNoPreCombine = Map(
+        "hoodie.insert.shuffle.parallelism" -> "4",
+        "hoodie.upsert.shuffle.parallelism" -> "4",
+        DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+        HoodieWriteConfig.TBL_NAME.key -> "hoodie_test"
+      ) ++ writeOpts
+
+      inputDF.write.partitionBy("partition").format("hudi")
+        .options(commonOptsNoPreCombine)
+        .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+        .mode(SaveMode.Overwrite)
+        .save(basePath)
+
+    val snapshot0 = spark.read.format("org.apache.hudi").options(readOpts).load(basePath)
+    snapshot0.cache()
+    assertEquals(100, snapshot0.count())
+
+    // verify partition cols
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH + "'").count() > 0)
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH + "'").count() > 0)
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_THIRD_PARTITION_PATH + "'").count() > 0)

Review Comment:
   assert the physical partition path too?
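
   A minimal sketch of what such an extra assertion could look like (the helper name is made up for illustration; the real test would check directories through the table's Hadoop `FileSystem` rather than `java.nio`):

   ```scala
   import java.nio.file.{Files, Paths}

   // Sketch: after the write, also verify that the partition directory
   // physically exists under the table base path.
   def partitionDirExists(basePath: String, partition: String): Boolean =
     Files.isDirectory(Paths.get(basePath, partition))
   ```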



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##########
@@ -142,6 +145,61 @@ class TestCOWDataSource extends HoodieClientTestBase with ScalaAssertionSupport
     spark.read.format("org.apache.hudi").options(readOpts).load(basePath).count()
   }
 
+  @Test
+  def testInferPartitionBy(): Unit = {
+    val (writeOpts, readOpts) = getWriterReaderOpts(HoodieRecordType.AVRO, Map())
+      // Insert Operation
+      val records = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+      val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
+
+      val commonOptsNoPreCombine = Map(
+        "hoodie.insert.shuffle.parallelism" -> "4",
+        "hoodie.upsert.shuffle.parallelism" -> "4",
+        DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+        HoodieWriteConfig.TBL_NAME.key -> "hoodie_test"
+      ) ++ writeOpts
+
+      inputDF.write.partitionBy("partition").format("hudi")
+        .options(commonOptsNoPreCombine)
+        .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+        .mode(SaveMode.Overwrite)
+        .save(basePath)
+
+    val snapshot0 = spark.read.format("org.apache.hudi").options(readOpts).load(basePath)
+    snapshot0.cache()
+    assertEquals(100, snapshot0.count())
+
+    // verify partition cols
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH + "'").count() > 0)
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH + "'").count() > 0)
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_THIRD_PARTITION_PATH + "'").count() > 0)
+
+    // try w/ multi field partition paths
+    // generate two batches of df w/ diff partition path values.
+    val records1 = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+    val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+    val records2 = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+    val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+    // hard code the value for rider and fare so that we can verify the partitions paths with hudi
+    val toInsertDf = inputDF1.withColumn("fare",lit(100)).withColumn("rider",lit("rider-123"))
+      .union(inputDF2.withColumn("fare",lit(200)).withColumn("rider",lit("rider-456")))
+
+    toInsertDf.write.partitionBy("fare","rider").format("hudi")
+      .options(commonOptsNoPreCombine)
+      .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val snapshot1 = spark.read.format("org.apache.hudi").options(readOpts).load(basePath)
+    snapshot1.cache()
+    assertEquals(200, snapshot1.count())
+
+    val partitionPaths = FSUtils.getAllPartitionPaths(new HoodieSparkEngineContext(new JavaSparkContext(spark.sparkContext)), HoodieMetadataConfig.newBuilder().build(), basePath)
+    partitionPaths.foreach(entry => println("partition path :: " + entry))
+    assertTrue(partitionPaths.contains("100/rider-123"))
+    assertTrue(partitionPaths.contains("200/rider-456"))

Review Comment:
   assert the partition col value too?



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala:
##########
@@ -142,6 +145,61 @@ class TestCOWDataSource extends HoodieClientTestBase with ScalaAssertionSupport
     spark.read.format("org.apache.hudi").options(readOpts).load(basePath).count()
   }
 
+  @Test
+  def testInferPartitionBy(): Unit = {
+    val (writeOpts, readOpts) = getWriterReaderOpts(HoodieRecordType.AVRO, Map())
+      // Insert Operation
+      val records = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+      val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
+
+      val commonOptsNoPreCombine = Map(
+        "hoodie.insert.shuffle.parallelism" -> "4",
+        "hoodie.upsert.shuffle.parallelism" -> "4",
+        DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+        HoodieWriteConfig.TBL_NAME.key -> "hoodie_test"
+      ) ++ writeOpts
+
+      inputDF.write.partitionBy("partition").format("hudi")
+        .options(commonOptsNoPreCombine)
+        .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+        .mode(SaveMode.Overwrite)
+        .save(basePath)
+
+    val snapshot0 = spark.read.format("org.apache.hudi").options(readOpts).load(basePath)
+    snapshot0.cache()
+    assertEquals(100, snapshot0.count())
+
+    // verify partition cols
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH + "'").count() > 0)
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH + "'").count() > 0)
+    assertTrue(snapshot0.filter("_hoodie_partition_path = '" + HoodieTestDataGenerator.DEFAULT_THIRD_PARTITION_PATH + "'").count() > 0)
+
+    // try w/ multi field partition paths
+    // generate two batches of df w/ diff partition path values.
+    val records1 = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+    val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+    val records2 = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+    val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+    // hard code the value for rider and fare so that we can verify the partitions paths with hudi
+    val toInsertDf = inputDF1.withColumn("fare",lit(100)).withColumn("rider",lit("rider-123"))
+      .union(inputDF2.withColumn("fare",lit(200)).withColumn("rider",lit("rider-456")))
+
+    toInsertDf.write.partitionBy("fare","rider").format("hudi")
+      .options(commonOptsNoPreCombine)
+      .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val snapshot1 = spark.read.format("org.apache.hudi").options(readOpts).load(basePath)
+    snapshot1.cache()
+    assertEquals(200, snapshot1.count())
+
+    val partitionPaths = FSUtils.getAllPartitionPaths(new HoodieSparkEngineContext(new JavaSparkContext(spark.sparkContext)), HoodieMetadataConfig.newBuilder().build(), basePath)

Review Comment:
   The parent test harness class already provides the jsc and engine context, right?





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1430939405

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1454046221

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450",
       "triggerID" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491",
       "triggerID" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9874aa6b73f32857f248c0d4fadbc8a7291e287b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15492",
       "triggerID" : "9874aa6b73f32857f248c0d4fadbc8a7291e287b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e5ed02b3c18025fc3b0c5a135be64991fb43417b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494",
       "triggerID" : "e5ed02b3c18025fc3b0c5a135be64991fb43417b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbf05d39a470149af7259e2ea0a69b76ebb660df",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15555",
       "triggerID" : "bbf05d39a470149af7259e2ea0a69b76ebb660df",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bbf05d39a470149af7259e2ea0a69b76ebb660df Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15555) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1453732980

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450",
       "triggerID" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491",
       "triggerID" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9874aa6b73f32857f248c0d4fadbc8a7291e287b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15492",
       "triggerID" : "9874aa6b73f32857f248c0d4fadbc8a7291e287b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e5ed02b3c18025fc3b0c5a135be64991fb43417b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494",
       "triggerID" : "e5ed02b3c18025fc3b0c5a135be64991fb43417b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbf05d39a470149af7259e2ea0a69b76ebb660df",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bbf05d39a470149af7259e2ea0a69b76ebb660df",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e5ed02b3c18025fc3b0c5a135be64991fb43417b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494) 
   * bbf05d39a470149af7259e2ea0a69b76ebb660df UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1447278078

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371) 
   * 9b678c5be48c132ee8ff047093ea58febd20a3ed UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1448954585

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450",
       "triggerID" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491",
       "triggerID" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9b678c5be48c132ee8ff047093ea58febd20a3ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450) 
   * 22a797678bab6fe57256d35d24ecbee6b92338e5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1440800610

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183) 
   * 1bc2fd1e84e07cff545656df5eb0d7163223a8b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342) 
   * ffc4e3d7fb447cb72feaeaa4a1aec866c519e561 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1114942532


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)

Review Comment:
   this might be a backwards-incompatible change, though I'm not sure the previous behavior was intended rather than supported by mistake.
   For example, if someone sets hoodie's partition path field to col1 but the incoming df uses partitionBy("col2"), prior to this patch col2 would be taken as the partitioning column for hudi; after this patch it will be col1.
   Only if the user did not explicitly set the hoodie partition path config will we use col2.
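
   The precedence rule described above can be sketched in plain Scala. This is a minimal illustration, not Hudi's actual implementation; `resolvePartitionPath` is a hypothetical helper, though the config key follows Hudi's naming:

```scala
object PartitionPathInference {
  // Hudi's partition path config key (hoodie.datasource.write.partitionpath.field).
  val PartitionPathFieldKey = "hoodie.datasource.write.partitionpath.field"

  // Use the DataFrameWriter.partitionBy columns only when the user has not
  // explicitly set Hudi's partition path field; an explicit setting always wins.
  def resolvePartitionPath(opts: Map[String, String],
                           partitionByCols: Seq[String]): Map[String, String] = {
    if (opts.contains(PartitionPathFieldKey) || partitionByCols.isEmpty) {
      opts // explicit user config (or no partitionBy at all) wins
    } else {
      opts + (PartitionPathFieldKey -> partitionByCols.mkString(","))
    }
  }
}
```

   With this rule, `partitionBy("col2")` alone yields col2 as the partition path, while setting the config to col1 keeps col1 regardless of `partitionBy`.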
   





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1440737610

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7209efd0df54978907b937f1a2aaef0e6b1f74b0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183) 
   * 1bc2fd1e84e07cff545656df5eb0d7163223a8b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan merged pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan merged PR #7951:
URL: https://github.com/apache/hudi/pull/7951




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1447284289

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450",
       "triggerID" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371) 
   * 9b678c5be48c132ee8ff047093ea58febd20a3ed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1120975062


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -288,44 +287,42 @@ object DataSourceWriteOptions {
     .withDocumentation("The table type for the underlying data, for this write. This can’t change between writes.")
 
   /**
-    * Translate spark parameters to hudi parameters
-    *
-    * @param optParams Parameters to be translated
-    * @return Parameters after translation
-    */
-  def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
-    var translatedOptParams = optParams
-    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    // we should set hoodie's partition path only if its not set by the user.
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
-      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
-        .map(SparkDataSourceUtils.decodePartitioningColumns)
-        .getOrElse(Nil)
-      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
-        DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
-
-      keyGeneratorClass match {
-        // CustomKeyGenerator needs special treatment, because it needs to be specified in a way
-        // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
-        // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
-        case c if (c.nonEmpty && c == classOf[CustomKeyGenerator].getName) =>
-          val partitionPathField = partitionColumns.map(e => {
-            if (e.contains(":")) {
-              e
-            } else {
-              s"$e:SIMPLE"
-            }
-          }).mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case c if (c.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) =>
-          // for any key gen other than NonPartitioned key gen, we can override the partition field config.
-          val partitionPathField = partitionColumns.mkString(",")
-          translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
-        case _ => // no op incase of NonPartitioned Key gen.
-      }
+   * Derive [[PARTITIONPATH_FIELD]] based on [[SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY]]
+   * if [[PARTITIONPATH_FIELD]] is not set explicitly.
+   */
+  def derivePartitionPathFieldsIfNeeded(optParams: Map[String, String]): Map[String, String] = {
+    if (!optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      || optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
+      return optParams
+    }
+
+    val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      .map(SparkDataSourceUtils.decodePartitioningColumns)
+      .getOrElse(Nil)
+    val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)

Review Comment:
   I don't think the key gen type config is recommended; there are some flows where it's not honored. So we fixed our quick start to call that out and ask users to use the key gen class config only.





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1449340221

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371",
       "triggerID" : "cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15450",
       "triggerID" : "9b678c5be48c132ee8ff047093ea58febd20a3ed",
       "triggerType" : "PUSH"
     }, {
       "hash" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15491",
       "triggerID" : "22a797678bab6fe57256d35d24ecbee6b92338e5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9874aa6b73f32857f248c0d4fadbc8a7291e287b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15492",
       "triggerID" : "9874aa6b73f32857f248c0d4fadbc8a7291e287b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e5ed02b3c18025fc3b0c5a135be64991fb43417b",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494",
       "triggerID" : "e5ed02b3c18025fc3b0c5a135be64991fb43417b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e5ed02b3c18025fc3b0c5a135be64991fb43417b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15494) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7951:
URL: https://github.com/apache/hudi/pull/7951#discussion_r1118232901


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -295,30 +295,35 @@ object DataSourceWriteOptions {
   def translateSqlOptions(optParams: Map[String, String]): Map[String, String] = {
     var translatedOptParams = optParams
     // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD
-    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)) {
+    // we should set hoodie's partition path only if its not set by the user.
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+      && !optParams.contains(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())) {
       val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
         .map(SparkDataSourceUtils.decodePartitioningColumns)
         .getOrElse(Nil)
       val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key(),
         DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.defaultValue)
 
-      val partitionPathField =
+      // if not nonpartitioned key gen
+      if (keyGeneratorClass.isEmpty || !keyGeneratorClass.equals(classOf[NonpartitionedKeyGenerator].getName)) {
         keyGeneratorClass match {
           // Only CustomKeyGenerator needs special treatment, because it needs to be specified in a way
           // such as "field1:PartitionKeyType1,field2:PartitionKeyType2".
           // partitionBy can specify the partition like this: partitionBy("p1", "p2:SIMPLE", "p3:TIMESTAMP")
           case c if c == classOf[CustomKeyGenerator].getName =>
-            partitionColumns.map(e => {
+            val partitionPathField = partitionColumns.map(e => {
               if (e.contains(":")) {
                 e
               } else {
                 s"$e:SIMPLE"
               }
             }).mkString(",")
+            translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
           case _ =>
-            partitionColumns.mkString(",")
+            val partitionPathField = partitionColumns.mkString(",")
+            translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)
         }
-      translatedOptParams = optParams ++ Map(PARTITIONPATH_FIELD.key -> partitionPathField)

Review Comment:
   in the case of keygen not set or NonpartitionedKeyGenerator, `partitionColumns.mkString(",")` still makes sense. What does this change fix? Besides, we should do config validation instead of fixing the logic here, e.g., when the keygen is non-partitioned, the partition field should be empty.
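
   The validation suggested here could look roughly like the sketch below. This is a hedged illustration, not code from the PR; the object and method names are assumptions, though the config keys and key generator class name follow Hudi's naming:

```scala
object WriteConfigValidator {
  val NonpartitionedKeyGenClass = "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
  val KeyGenClassKey = "hoodie.datasource.write.keygenerator.class"
  val PartitionPathFieldKey = "hoodie.datasource.write.partitionpath.field"

  // Fail fast when a non-partitioned key generator is combined with a
  // non-empty partition path field, instead of silently ignoring one of them.
  def validate(opts: Map[String, String]): Unit = {
    val keyGen = opts.getOrElse(KeyGenClassKey, "")
    val partitionField = opts.getOrElse(PartitionPathFieldKey, "")
    if (keyGen == NonpartitionedKeyGenClass && partitionField.nonEmpty) {
      throw new IllegalArgumentException(
        s"$PartitionPathFieldKey must be empty when using $NonpartitionedKeyGenClass, got: $partitionField")
    }
  }
}
```

   Validating up front like this surfaces the conflicting configuration to the user rather than resolving it with translation logic.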





[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1440808948

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15183",
       "triggerID" : "7209efd0df54978907b937f1a2aaef0e6b1f74b0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342",
       "triggerID" : "1bc2fd1e84e07cff545656df5eb0d7163223a8b4",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344",
       "triggerID" : "ffc4e3d7fb447cb72feaeaa4a1aec866c519e561",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1bc2fd1e84e07cff545656df5eb0d7163223a8b4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15342) 
   * ffc4e3d7fb447cb72feaeaa4a1aec866c519e561 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1442610576

   ## CI report:
   
   * ffc4e3d7fb447cb72feaeaa4a1aec866c519e561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15344) 
   * cbb0a8c7b89b90b134b7ad41442cfaf59b3654a5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15371) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7951:
URL: https://github.com/apache/hudi/pull/7951#issuecomment-1454056652

   ## CI report:
   
   * bbf05d39a470149af7259e2ea0a69b76ebb660df Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15555) 
   * 9ae7b06b3f38d34875349f98d5e64390ab6d60db UNKNOWN
   

