Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/02/06 01:18:41 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #7859: [HUDI-2175] Adding support for dynamic schemas

nsivabalan opened a new pull request, #7859:
URL: https://github.com/apache/hudi/pull/7859

   ### Change Logs
   
   This patch introduces a configurable schema reconcile strategy and adds a dynamic schema strategy. The existing reconcile behavior is now treated as the "legacy" strategy. 
   
   Legacy reconcile strategy:
   If the incoming batch has more columns than the table schema, the incoming schema is chosen as the new table schema. 
   If the incoming batch has fewer columns than the table schema, the table schema remains as is. 
   No other flows are supported. 
   
   Dynamic schema reconcile strategy:
   This is a superset of legacy. With this strategy, the incoming batch can drop some columns and add new columns relative to the table schema. The new table schema is the last known table schema plus the new columns in the incoming batch (for columns the incoming batch dropped, Hudi auto-fills nulls). 
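
The two strategies described above can be illustrated with a minimal sketch. This is not the code in the patch: it models a schema as an ordered list of column names only (types and nullability are ignored), the function names `reconcile_legacy`, `reconcile_dynamic`, and `fill_missing` are hypothetical, and raising an error for the unsupported legacy flow is an assumption about what "not supported" means.

```python
from typing import Dict, List, Optional

# Assumption: a schema is just an ordered list of column names.
Schema = List[str]


def reconcile_legacy(table: Schema, incoming: Schema) -> Schema:
    """Legacy: a wider incoming schema wins; a narrower one keeps the table schema."""
    table_cols, incoming_cols = set(table), set(incoming)
    if incoming_cols >= table_cols:
        # Incoming has every table column (and possibly more): adopt it.
        return list(incoming)
    if incoming_cols <= table_cols:
        # Incoming only has a subset of the table columns: table schema stays.
        return list(table)
    # Assumption: columns both dropped and added is rejected in this sketch.
    raise ValueError("legacy strategy: mixed added and dropped columns not supported")


def reconcile_dynamic(table: Schema, incoming: Schema) -> Schema:
    """Dynamic: last known table schema plus any new columns from the incoming batch."""
    new_cols = [c for c in incoming if c not in table]
    return list(table) + new_cols


def fill_missing(record: Dict[str, object], schema: Schema) -> Dict[str, Optional[object]]:
    """Project a record onto the reconciled schema, filling absent columns with None."""
    return {col: record.get(col) for col in schema}


table_schema = ["id", "name", "ts"]
incoming_schema = ["id", "ts", "city"]  # drops "name", adds "city"

reconciled = reconcile_dynamic(table_schema, incoming_schema)
row = fill_missing({"id": 1, "ts": 5, "city": "x"}, reconciled)
```

In this sketch the dropped column `name` survives in the reconciled schema and is null-filled in the incoming row, while the same pair of schemas would be rejected by `reconcile_legacy`.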
   
   ### Impact
   
   More flexibility in evolving schemas with Hudi. 
   
   ### Risk level (write none, low medium or high below)
   
   low. 
   
   ### Documentation Update
   
   Introduces a new config named `hoodie.datasource.write.reconcile.schema.strategy`. The default value is `legacy_reconcile_strategy`; to use dynamic schema, set the value to `dynamic_schema_reconcile_strategy`. 
   Users also have to set `hoodie.datasource.write.reconcile.schema` to true for the strategy to take effect.
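
As a rough illustration of the two settings above, the write options could be assembled as below. The config keys and values are the ones named in this description; the helper `reconcile_options` and the constant names are hypothetical, and the strategy key is the one proposed in this pull request, so it may not be recognized by a given Hudi release.

```python
from typing import Dict

# Keys and values as named in this PR description.
RECONCILE_FLAG = "hoodie.datasource.write.reconcile.schema"
RECONCILE_STRATEGY = "hoodie.datasource.write.reconcile.schema.strategy"
LEGACY = "legacy_reconcile_strategy"
DYNAMIC = "dynamic_schema_reconcile_strategy"


def reconcile_options(dynamic: bool = False) -> Dict[str, str]:
    """Hypothetical helper: build the reconcile-related write options."""
    return {
        # Reconciliation must be switched on for the strategy to apply.
        RECONCILE_FLAG: "true",
        # Legacy is the stated default; dynamic is opt-in.
        RECONCILE_STRATEGY: DYNAMIC if dynamic else LEGACY,
    }


opts = reconcile_options(dynamic=True)
```

With PySpark, a dictionary like this would typically be passed to the writer through `.options(**opts)` alongside the usual table options.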
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7859: [HUDI-2175] Adding support for dynamic schemas

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7859:
URL: https://github.com/apache/hudi/pull/7859#issuecomment-1418725509

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6df8030d8daacb65904ea8c48442d1f5fcab0eb",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14959",
       "triggerID" : "f6df8030d8daacb65904ea8c48442d1f5fcab0eb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6df8030d8daacb65904ea8c48442d1f5fcab0eb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14959) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7859: [HUDI-2175] Adding support for dynamic schemas

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7859:
URL: https://github.com/apache/hudi/pull/7859#issuecomment-1418673483

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6df8030d8daacb65904ea8c48442d1f5fcab0eb",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f6df8030d8daacb65904ea8c48442d1f5fcab0eb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6df8030d8daacb65904ea8c48442d1f5fcab0eb UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7859: [HUDI-2175] Adding support for dynamic schemas

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7859:
URL: https://github.com/apache/hudi/pull/7859#issuecomment-1419238010

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6df8030d8daacb65904ea8c48442d1f5fcab0eb",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14959",
       "triggerID" : "f6df8030d8daacb65904ea8c48442d1f5fcab0eb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6df8030d8daacb65904ea8c48442d1f5fcab0eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14959) 
   




[GitHub] [hudi] kazdy commented on pull request #7859: [HUDI-2175] Adding support for dynamic schemas

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7859:
URL: https://github.com/apache/hudi/pull/7859#issuecomment-1418885678

   Hi @nsivabalan, if I enable full schema evolution and add a column in the middle, will dynamic schema reconciliation handle it? Or is this only for out-of-the-box schema evolution?
   There was a PR https://github.com/apache/hudi/pull/6017 implementing similar behaviour when both full schema evolution and reconciliation were enabled.
   I'm interested in preserving similar behavior if possible.
   

