Posted to issues@hive.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2020/04/20 11:38:00 UTC

[jira] [Updated] (HIVE-23253) Synchronization between external SerDe schemas and Metastore

     [ https://issues.apache.org/jira/browse/HIVE-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Magyar updated HIVE-23253:
---------------------------------
    Fix Version/s:     (was: 3.0.0)

> Synchronization between external SerDe schemas and Metastore
> ------------------------------------------------------------
>
>                 Key: HIVE-23253
>                 URL: https://issues.apache.org/jira/browse/HIVE-23253
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>    Affects Versions: 3.1.2
>            Reporter: Attila Magyar
>            Priority: Major
>
> In HIVE-15995 an ALTER TABLE <table> UPDATE COLUMNS statement was introduced to sync external SerDe schema changes with the metastore. This command can only be invoked manually.
> See the documentation:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionUpdatecolumns
>  
> It might make sense to run UPDATE COLUMNS automatically in certain cases, to prevent the problems that arise when the user forgets to run it manually.
>  
> One way to reproduce the issue is to change the schema URL via an ALTER TABLE statement.
> {code:java}
> [root@c7401 vagrant]# cat test_schema1.avsc
> {
> "type":"record",
> "name":"test_schema",
> "namespace":"gdc_datascience_qa",
> "fields":[
> {
> "name":"name",
> "type":[
> "null",
> "string"
> ],
> "default":null
> }
> ]
> }
> [root@c7401 vagrant]# cat test_schema2.avsc
> {
> "type":"record",
> "name":"test_schema",
> "namespace":"gdc_datascience_qa",
> "fields":[
> {
> "name":"name",
> "type":[
> "null",
> "string"
> ],
> "default":null
> },
> {
> "name":"last_name",
> "type":[
> "null",
> "string"
> ],
> "default":null
> }
> ]
> }
>  {code}
> {code:java}
>  $ hadoop fs -copyFromLocal *.avsc /tmp/
>   [beeline] create external table t1 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema1.avsc');
>   [beeline] alter table t1 set tblproperties('avro.schema.url'='/tmp/test_schema2.avsc'); 
>   [beeline] insert into t1 values ('n1', 'l1');
>   [beeline] create external table t2 stored as avro tblproperties ('avro.schema.url'='/tmp/test_schema2.avsc');
>   [beeline] insert into t2 values ('n2', 'l2');
>   [beeline] insert overwrite table t1 select * from t2; {code}
> Error:
> {code:java}
>  MetaException(message:Column last_name doesn't exist in table t1 in database default)
>         at org.apache.hadoop.hive.metastore.ObjectStore.validateTableCols(ObjectStore.java:8652)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:8602)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionColStats(ObjectStore.java:8416)
>         at org.apache.hadoop.hive.metastore.ObjectStore.updateTableColumnStatistics(ObjectStore.java:8446) {code}
> Running ALTER TABLE t1 UPDATE COLUMNS fixes the problem.
>  
> cc: [~szita]
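
The mismatch described in the report can be illustrated with a small sketch: compare the field names in the Avro schema file with the column names the metastore currently holds, and flag the table when they differ. This is not how Hive implements anything. The helper names (avro_field_names, schema_drift, update_columns_statement) are hypothetical, the metastore column list is passed in by hand rather than fetched from a real metastore, and the lowercase comparison assumes that Hive treats column names case-insensitively.

```python
import json

# The second schema from the report (test_schema2.avsc).
TEST_SCHEMA2 = """
{
  "type": "record",
  "name": "test_schema",
  "namespace": "gdc_datascience_qa",
  "fields": [
    {"name": "name", "type": ["null", "string"], "default": null},
    {"name": "last_name", "type": ["null", "string"], "default": null}
  ]
}
"""


def avro_field_names(schema_json):
    """Return the top-level field names of an Avro record schema."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema.get("fields", [])]


def schema_drift(schema_json, metastore_columns):
    """Hypothetical helper: compare Avro fields with metastore columns.

    Returns (missing, extra): columns present in the Avro schema but not
    in the metastore, and columns present in the metastore but not in the
    Avro schema. Names are compared in lowercase (an assumption).
    """
    avro_cols = [name.lower() for name in avro_field_names(schema_json)]
    known = [col.lower() for col in metastore_columns]
    missing = [col for col in avro_cols if col not in known]
    extra = [col for col in known if col not in avro_cols]
    return missing, extra


def update_columns_statement(table):
    """Build the manual fix mentioned in the report for the given table."""
    return "ALTER TABLE {} UPDATE COLUMNS".format(table)


if __name__ == "__main__":
    # After the ALTER TABLE ... SET TBLPROPERTIES step, the metastore
    # still only knows the column from test_schema1.avsc.
    missing, extra = schema_drift(TEST_SCHEMA2, ["name"])
    if missing or extra:
        print("metastore out of sync; missing:", missing, "extra:", extra)
        print("suggested fix:", update_columns_statement("t1"))
```

For the tables in the reproduction, t1 would be flagged because last_name exists only in the Avro schema, which matches the column named in the MetaException.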



--
This message was sent by Atlassian Jira
(v8.3.4#803005)