You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "董可伦 (Jira)" <ji...@apache.org> on 2021/08/05 13:51:00 UTC
[jira] [Updated] (HUDI-2279) Support column name matching for insert * and update set * when sourceTable's columns contains all targetTable's columns

     [ https://issues.apache.org/jira/browse/HUDI-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

董可伦 updated HUDI-2279:
----------------------
    Description: 
Example:
{code:java}
val tableName = generateTableName
// Create table
spark.sql(
 s"""
 |create table $tableName (
 | id int,
 | name string,
 | price double,
 | ts long,
 | dt string
 |) using hudi
 | location '${tmp.getCanonicalPath}/$tableName'
 | options (
 | primaryKey ='id',
 | preCombineField = 'ts'
 | )
 """.stripMargin)
spark.sql(
  s"""
     |merge into ${tableName} as t0
     |using (
     |  select 1 as id, '2021-05-05' as dt, 1002 as ts, 97 as price, 'a1' as name union all
     |  select 1 as id, '2021-05-05' as dt, 1003 as ts, 98 as price, 'a2' as name union all
     |  select 2 as id, '2021-05-05' as dt, 1001 as ts, 99 as price, 'a3' as name
     | ) as s0
     |on t0.id = s0.id
     |when matched then update set *
     |when not matched  then insert *
     |""".stripMargin)
spark.sql(s"select id,name,price,dt from ${tableName}").show()
 
 {code}
Fow now,the result is:

+---+----------+-----+---+
| id| name|price| dt|
+---+----------+-----+---+
| 1|2021-05-05| 98.0| a2|
| 2|2021-05-05| 99.0| a3|
+---+----------+-----+---+

When the order of the column types of souceTable is different from that of the column types of targetTable

 
{code:java}
spark.sql(
  s"""
     |merge into ${tableName} as t0
     |using (
     |  select 1 as id, 'a1' as name, 1002 as ts, '2021-05-05' as dt, 97 as price union all
     |  select 1 as id, 'a2' as name, 1003 as ts, '2021-05-05' as dt, 98 as price union all
     |  select 2 as id, 'a3' as name, 1001 as ts, '2021-05-05' as dt, 99 as price
     | ) as s0
     |on t0.id = s0.id
     |when matched then update set *
     |when not matched  then insert *
     |""".stripMargin){code}
 

It will throw an exception:
{code:java}
[ERROR] 2021-08-05 21:48:53,941 org.apache.hudi.io.HoodieWriteHandle  - Error writing record HoodieRecord{key=HoodieKey { recordKey=id:2 partitionPath=}, currentLocation='null', newLocation='null'}
java.lang.RuntimeException: Error in execute expression: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer.
Expressions is: [boundreference() AS `id`  boundreference() AS `name`  CAST(boundreference() AS `price` AS DOUBLE)  CAST(boundreference() AS `ts` AS BIGINT)  CAST(boundreference() AS `dt` AS STRING)]
CodeBody is: {
......
Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.IntegerCaused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer at org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_366797ae_4c30_4862_8222_7be486ede4f8.eval(Unknown Source) at org.apache.spark.sql.hudi.command.payload.ExpressionPayload.org$apache$spark$sql$hudi$command$payload$ExpressionPayload$$evaluate(ExpressionPayload.scala:258) ... 18 more{code}
 

> Support column name matching for insert * and update set * when sourceTable's columns contains all targetTable's columns
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2279
>                 URL: https://issues.apache.org/jira/browse/HUDI-2279
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Spark Integration
>            Reporter: 董可伦
>            Assignee: 董可伦
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Example:
> {code:java}
> val tableName = generateTableName
> // Create table
> spark.sql(
>  s"""
>  |create table $tableName (
>  | id int,
>  | name string,
>  | price double,
>  | ts long,
>  | dt string
>  |) using hudi
>  | location '${tmp.getCanonicalPath}/$tableName'
>  | options (
>  | primaryKey ='id',
>  | preCombineField = 'ts'
>  | )
>  """.stripMargin)
> spark.sql(
>   s"""
>      |merge into ${tableName} as t0
>      |using (
>      |  select 1 as id, '2021-05-05' as dt, 1002 as ts, 97 as price, 'a1' as name union all
>      |  select 1 as id, '2021-05-05' as dt, 1003 as ts, 98 as price, 'a2' as name union all
>      |  select 2 as id, '2021-05-05' as dt, 1001 as ts, 99 as price, 'a3' as name
>      | ) as s0
>      |on t0.id = s0.id
>      |when matched then update set *
>      |when not matched  then insert *
>      |""".stripMargin)
> spark.sql(s"select id,name,price,dt from ${tableName}").show()
>  
>  {code}
> Fow now,the result is:
> +---+----------+-----+---+
> | id| name|price| dt|
> +---+----------+-----+---+
> | 1|2021-05-05| 98.0| a2|
> | 2|2021-05-05| 99.0| a3|
> +---+----------+-----+---+
> When the order of the column types of souceTable is different from that of the column types of targetTable
>  
> {code:java}
> spark.sql(
>   s"""
>      |merge into ${tableName} as t0
>      |using (
>      |  select 1 as id, 'a1' as name, 1002 as ts, '2021-05-05' as dt, 97 as price union all
>      |  select 1 as id, 'a2' as name, 1003 as ts, '2021-05-05' as dt, 98 as price union all
>      |  select 2 as id, 'a3' as name, 1001 as ts, '2021-05-05' as dt, 99 as price
>      | ) as s0
>      |on t0.id = s0.id
>      |when matched then update set *
>      |when not matched  then insert *
>      |""".stripMargin){code}
>  
> It will throw an exception:
> {code:java}
> [ERROR] 2021-08-05 21:48:53,941 org.apache.hudi.io.HoodieWriteHandle  - Error writing record HoodieRecord{key=HoodieKey { recordKey=id:2 partitionPath=}, currentLocation='null', newLocation='null'}
> java.lang.RuntimeException: Error in execute expression: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer.
> Expressions is: [boundreference() AS `id`  boundreference() AS `name`  CAST(boundreference() AS `price` AS DOUBLE)  CAST(boundreference() AS `ts` AS BIGINT)  CAST(boundreference() AS `dt` AS STRING)]
> CodeBody is: {
> ......
> Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.IntegerCaused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer at org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_366797ae_4c30_4862_8222_7be486ede4f8.eval(Unknown Source) at org.apache.spark.sql.hudi.command.payload.ExpressionPayload.org$apache$spark$sql$hudi$command$payload$ExpressionPayload$$evaluate(ExpressionPayload.scala:258) ... 18 more{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)