Posted to issues@hive.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2020/04/20 11:38:00 UTC
[jira] [Updated] (HIVE-23253) Synchronization between external SerDe schemas and Metastore
[ https://issues.apache.org/jira/browse/HIVE-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar updated HIVE-23253:
---------------------------------
Fix Version/s: (was: 3.0.0)
> Synchronization between external SerDe schemas and Metastore
> ------------------------------------------------------------
>
> Key: HIVE-23253
> URL: https://issues.apache.org/jira/browse/HIVE-23253
> Project: Hive
> Issue Type: Bug
> Components: Hive, Metastore
> Affects Versions: 3.1.2
> Reporter: Attila Magyar
> Priority: Major
>
> In HIVE-15995 an ALTER TABLE <table> UPDATE COLUMNS statement was introduced to sync external SerDe schema changes with the metastore. This command can only be invoked manually.
> See the documentation:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns
>
> It might make sense to run UPDATE COLUMNS automatically in certain cases, to prevent the problems that arise when the user forgets to run it manually.
>
> One way to reproduce the issue is to change the schema URL via an ALTER TABLE statement.
> {code:java}
> [root@c7401 vagrant]# cat test_schema1.avsc
> {
>   "type":"record",
>   "name":"test_schema",
>   "namespace":"gdc_datascience_qa",
>   "fields":[
>     {
>       "name":"name",
>       "type":[
>         "null",
>         "string"
>       ],
>       "default":null
>     }
>   ]
> }
> [root@c7401 vagrant]# cat test_schema2.avsc
> {
>   "type":"record",
>   "name":"test_schema",
>   "namespace":"gdc_datascience_qa",
>   "fields":[
>     {
>       "name":"name",
>       "type":[
>         "null",
>         "string"
>       ],
>       "default":null
>     },
>     {
>       "name":"last_name",
>       "type":[
>         "null",
>         "string"
>       ],
>       "default":null
>     }
>   ]
> }
> {code}
> {code:java}
> $ hadoop fs -copyFromLocal *.avsc /tmp/
> [beeline] create external table t1 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema1.avsc');
> [beeline] alter table t1 set tblproperties('avro.schema.url'='/tmp/test_schema2.avsc');
> [beeline] insert into t1 values ('n1', 'l1');
> [beeline] create external table t2 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema2.avsc');
> [beeline] insert into t2 values ('n2', 'l2');
> [beeline] insert overwrite table t1 select * from t2;
> {code}
> Error:
> {code:java}
> MetaException(message:Column last_name doesn't exist in table t1 in database default)
> at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652)
> at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602)
> at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416)
> at org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446)
> {code}
> Running ALTER TABLE t1 UPDATE COLUMNS fixes the problem.
>
> cc: [~szita]
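
The failure above arises because the metastore column list for t1 still reflects test_schema1.avsc after avro.schema.url was pointed at test_schema2.avsc. As a rough sketch of the kind of check that could trigger an automatic UPDATE COLUMNS, the Python below compares a table's stored column names against the top-level fields of an Avro schema. The function names, and the idea of passing the metastore columns in as a plain list, are assumptions made for illustration; this is not how Hive performs or plans to perform the check.

```python
import json


def avro_field_names(schema_json):
    """Return the top-level field names of an Avro record schema, in order."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema.get("fields", [])]


def columns_out_of_sync(metastore_columns, schema_json):
    """Compare stored column names with the schema's fields.

    Returns (missing, stale): fields present in the schema but not in the
    metastore, and metastore columns no longer present in the schema.
    Names are compared case-insensitively, since Hive column names are
    case-insensitive.
    """
    schema_fields = avro_field_names(schema_json)
    known = {c.lower() for c in metastore_columns}
    wanted = {f.lower() for f in schema_fields}
    missing = [f for f in schema_fields if f.lower() not in known]
    stale = [c for c in metastore_columns if c.lower() not in wanted]
    return missing, stale


# test_schema2.avsc from the reproduction steps.
SCHEMA_2 = """
{
  "type": "record",
  "name": "test_schema",
  "namespace": "gdc_datascience_qa",
  "fields": [
    {"name": "name", "type": ["null", "string"], "default": null},
    {"name": "last_name", "type": ["null", "string"], "default": null}
  ]
}
"""

# t1 was created from test_schema1.avsc, so the metastore only knows "name".
missing, stale = columns_out_of_sync(["name"], SCHEMA_2)
if missing or stale:
    # A real integration would issue this statement instead of printing it.
    print("ALTER TABLE t1 UPDATE COLUMNS;")
```

With the inputs from the reproduction, `missing` contains `last_name`, the same column named in the MetaException. The sketch compares names only; a fuller check would presumably also compare column types and order.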
--
This message was sent by Atlassian Jira
(v8.3.4#803005)