Posted to issues@spark.apache.org by "jingxiong zhong (Jira)" <ji...@apache.org> on 2021/12/17 10:46:00 UTC
[jira] [Comment Edited] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452521#comment-17452521 ]
jingxiong zhong edited comment on SPARK-37521 at 12/17/21, 10:45 AM:
---------------------------------------------------------------------
Hive does not find the updated partition schema in the metastore
when you execute:
{code:sql}
create table updata_col_test1(a int) partitioned by (dt string);
insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
alter table updata_col_test1 add columns (b int);
insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');
{code}
Results from the two engines:
HIVE:
hive> select * from bigdata_qa.updata_col_test1;
OK
updata_col_test1.a updata_col_test1.b updata_col_test1.dt
1 NULL 20200101
1 NULL 20200102
1 NULL 20200103
Time taken: 2.985 seconds, Fetched: 3 row(s)
hive> desc bigdata_qa.updata_col_test1 partition(dt='20200101');
OK
col_name data_type comment
a int
dt string
# Partition Information
# col_name data_type comment
dt string
Time taken: 6.469 seconds, Fetched: 7 row(s)
SPARK:
spark-sql> select * from bigdata_qa.updata_col_test1;
a b dt
1 2 20200101
1 NULL 20200102
1 NULL 20200103
Time taken: 0.357 seconds, Fetched 3 row(s)
spark-sql> desc bigdata_qa.updata_col_test1 partition(dt='20200101');
col_name data_type comment
a int
b int
dt string
# Partition Information
# col_name data_type comment
dt string
Time taken: 0.196 seconds, Fetched 6 row(s)
> insert overwrite table but the partition information stored in Metastore was not changed
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-37521
> URL: https://issues.apache.org/jira/browse/SPARK-37521
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Environment: spark3.2.0
> hive2.3.9
> metastore2.3.9
> Reporter: jingxiong zhong
> Priority: Major
>
> I create a partitioned table in Spark SQL, insert data, add a regular column, and finally insert a new row into an existing partition. The query result is correct in Spark SQL, but the newly added column is returned as NULL in Hive 2.3.9.
> For example:
> create table updata_col_test1(a int) partitioned by (dt string);
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
> alter table updata_col_test1 add columns (b int);
> insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101'); -- fails
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2); -- fails
> insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2); -- succeeds