Posted to commits@hudi.apache.org by "jhchee (via GitHub)" <gi...@apache.org> on 2023/04/19 14:38:10 UTC
[GitHub] [hudi] jhchee opened a new issue, #8502: [SUPPORT] Does merge into Spark SQL supports schema evolution
jhchee opened a new issue, #8502:
URL: https://github.com/apache/hudi/issues/8502
**Describe the problem you faced**
I have created a table with 2 columns, namely `userId` and `updatedAt`. I'm passing a new column in the `MERGE INTO` command but got an exception.
```scala
spark.sql(
  "MERGE INTO target USING source ON target.userId = source.userId " +
  "WHEN MATCHED THEN UPDATE SET target.nested = struct(source.colA), target.updatedAt = source.updatedAt " +
  "WHEN NOT MATCHED THEN INSERT (userId, nested, updatedAt) " +
  "VALUES (source.userId, struct(source.colA), source.updatedAt)")
```
```
Cannot resolve 'target.nested
```
**To Reproduce**
Steps to reproduce the behavior:
1. Run a MERGE INTO command with a new column specified.
2. Setting `.config("hoodie.schema.on.read.enable", "true")` does not help.
**Expected behavior**
The schema should evolve and detect that this is a new column.
**Environment Description**
* Hudi version : 0.12.2
* Spark version : 3.3.1
* Hive version : -
* Hadoop version : -
* Storage (HDFS/S3/GCS..) : -
* Running on Docker? (yes/no) : -
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] kazdy commented on issue #8502: [SUPPORT] Does spark.sql("MERGE INTO") supports schema evolution write option
Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on issue #8502:
URL: https://github.com/apache/hudi/issues/8502#issuecomment-1523888006
@ad1happy2go
Not sure if this is really something blocked by the Spark SQL parser; as an example, Delta Lake supports schema evolution in MERGE INTO (both for partial updates and for `UPDATE *` / `INSERT *`):
https://docs.delta.io/latest/delta-update.html#-merge-schema-evolution
Would be great to have something similar in Hudi. Currently, Hudi uses the target table schema during MERGE INTO (and drops incoming columns if the source schema is wider, for example).
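For comparison, Delta Lake gates this behavior behind the session config `spark.databricks.delta.schema.autoMerge.enabled` (see the docs linked above). A minimal sketch of that flow, shown only to illustrate the feature being requested for Hudi (this is Delta syntax, not something Hudi supports today):

```sql
-- Delta Lake (for comparison): enable automatic schema evolution for MERGE,
-- then merge with a source whose schema is wider than the target's.
-- Delta adds the new source columns to the target table's schema.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO target USING source
ON target.userId = source.userId
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```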
[GitHub] [hudi] ad1happy2go commented on issue #8502: [SUPPORT] Does spark.sql("MERGE INTO") supports schema evolution write option
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8502:
URL: https://github.com/apache/hudi/issues/8502#issuecomment-1621375702
@kazdy @jhchee You are correct, this should be supported for MERGE INTO. I confirmed that master also doesn't support it either. Attaching the code which should work once it does.
```sql
CREATE TABLE test_insert3 (
  id int,
  name string,
  updated_at timestamp
) USING hudi
OPTIONS (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'updated_at'
) LOCATION 'file:///tmp/test_insert3';

MERGE INTO test_insert3 AS target
USING (
  SELECT 1 AS id, 'c' AS name, 1 AS new_col, current_timestamp AS updated_at
  UNION SELECT 1 AS id, 'd' AS name, 1 AS new_col, current_timestamp AS updated_at
  UNION SELECT 1 AS id, 'e' AS name, 1 AS new_col, current_timestamp AS updated_at
) source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET target.new_col = source.new_col
WHEN NOT MATCHED THEN INSERT *;
```
Created a JIRA to track this - https://issues.apache.org/jira/browse/HUDI-6483
Feel free to contribute.
[GitHub] [hudi] ad1happy2go commented on issue #8502: [SUPPORT] Does spark.sql("MERGE INTO") supports schema evolution write option
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8502:
URL: https://github.com/apache/hudi/issues/8502#issuecomment-1522911733
@jhchee The Spark SQL parser doesn't support this, so I'm not sure we can do anything on our end. All configs only come into play during the execution of the SQL.
You can run ALTER TABLE first to add the column before calling the merge.
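A minimal sketch of that workaround, assuming a Hudi table `target` with columns `userId` and `updatedAt` as in the original report (`new_col` and its type are hypothetical placeholders for the incoming column):

```sql
-- Workaround: widen the target schema first, then run the merge.
-- `new_col int` is an assumed name/type; match it to your source column.
ALTER TABLE target ADD COLUMNS (new_col int);

-- Now target.new_col resolves, so the merge can set it from the source.
MERGE INTO target USING source
ON target.userId = source.userId
WHEN MATCHED THEN UPDATE SET target.new_col = source.new_col,
                             target.updatedAt = source.updatedAt
WHEN NOT MATCHED THEN INSERT (userId, new_col, updatedAt)
VALUES (source.userId, source.new_col, source.updatedAt);
```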