Posted to issues@spark.apache.org by "jingxiong zhong (Jira)" <ji...@apache.org> on 2021/12/17 10:46:00 UTC

[jira] [Comment Edited] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed

    [ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452521#comment-17452521 ] 

jingxiong zhong edited comment on SPARK-37521 at 12/17/21, 10:45 AM:
---------------------------------------------------------------------

The updated partition schema stored in the metastore is not picked up by Hive.

To reproduce, execute:


{code:sql}
create table updata_col_test1(a int) partitioned by (dt string);
insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
alter table updata_col_test1 add columns (b int);
insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');
{code}

Results from the two engines:

HIVE:

hive> select * from bigdata_qa.updata_col_test1;
OK
updata_col_test1.a    updata_col_test1.b    updata_col_test1.dt
1    NULL    20200101
1    NULL    20200102
1    NULL    20200103
Time taken: 2.985 seconds, Fetched: 3 row(s)
hive>  desc bigdata_qa.updata_col_test1 partition(dt='20200101');
OK
col_name    data_type    comment
a                       int
dt                      string

# Partition Information
# col_name                data_type               comment

dt                      string
Time taken: 6.469 seconds, Fetched: 7 row(s)

SPARK:

spark-sql> select * from bigdata_qa.updata_col_test1;

a    b    dt
1    2    20200101
1    NULL    20200102
1    NULL    20200103
Time taken: 0.357 seconds, Fetched 3 row(s)
spark-sql> desc bigdata_qa.updata_col_test1 partition(dt='20200101');
col_name    data_type    comment
a                       int
b                       int
dt                      string
# Partition Information
# col_name              data_type               comment
dt                      string
Time taken: 0.196 seconds, Fetched 6 row(s)
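
To make the difference between the two outputs easier to reason about, here is a minimal toy model in Python. It is not Hive or Spark code. It assumes that the Hive-side read resolves columns from the per-partition column list (the one shown by desc ... partition in Hive, which still lacks b), while the Spark-side read resolves them from the table-level column list. All names in it (TABLE_COLUMNS, PARTITION_COLUMNS, PARTITION_ROWS, read_partition, select_all) are hypothetical and exist only for illustration.

```python
# Toy model (not Hive or Spark code): a metastore that keeps one column list
# for the table and a separate column list for each partition.
TABLE_COLUMNS = ["a", "b"]  # state after: alter table ... add columns (b int)

# Assumption: the partition-level column lists were never refreshed.
PARTITION_COLUMNS = {
    "20200101": ["a"],
    "20200102": ["a"],
    "20200103": ["a"],
}

# Rows as physically stored; dt=20200101 was overwritten with (1, 2).
PARTITION_ROWS = {
    "20200101": [(1, 2)],
    "20200102": [(1,)],
    "20200103": [(1,)],
}


def read_partition(dt, reader_columns):
    """Project stored rows onto the table schema using only reader_columns.

    Stored values beyond the columns the reader knows about are ignored, and
    table columns the reader does not know about come back as None (NULL).
    """
    result = []
    for row in PARTITION_ROWS[dt]:
        known = dict(zip(reader_columns, row))
        result.append(tuple(known.get(col) for col in TABLE_COLUMNS) + (dt,))
    return result


def select_all(use_partition_schema):
    """Read every partition with either the partition or the table schema."""
    rows = []
    for dt in sorted(PARTITION_ROWS):
        cols = PARTITION_COLUMNS[dt] if use_partition_schema else TABLE_COLUMNS
        rows.extend(read_partition(dt, cols))
    return rows


hive_like = select_all(use_partition_schema=True)
spark_like = select_all(use_partition_schema=False)
print(hive_like)
print(spark_like)
```

Under these assumptions the partition-schema read returns None for b in every partition, including dt=20200101, while the table-schema read returns 2 for b in dt=20200101. That matches the two outputs above.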

 


> insert overwrite table but the partition information stored in Metastore was not changed
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-37521
>                 URL: https://issues.apache.org/jira/browse/SPARK-37521
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>         Environment: spark3.2.0
> hive2.3.9
> metastore2.3.9
>            Reporter: jingxiong zhong
>            Priority: Major
>
> I create a partitioned table in Spark SQL, insert a row, add a regular column, and finally insert a new row into the partition. The query result is correct in Spark SQL, but the newly added column returns NULL in Hive 2.3.9.
> For example:
> create table updata_col_test1(a int) partitioned by (dt string); 
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1); 
> insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
> alter table updata_col_test1 add columns (b int);
> insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101'); -- fails
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2); -- fails
> insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2); -- succeeds


