Posted to user@hive.apache.org by Yussuf Burke <yb...@gmail.com> on 2021/11/04 13:05:00 UTC

IMetaStoreClient.alter_table does not support dropping columns?

Hello,
We recently ran into an issue where calls to IMetaStoreClient.alter_table() were failing with an error message like the following:
"The following columns have types incompatible with the existing columns in their respective positions : foo, bar".
 
For some background, we use Confluent's Kafka Connect to consume data from Kafka. It writes the consumed data to HDFS as Parquet files and then uses IMetaStoreClient to update the schema of an external Hive table that points to those Parquet files.
 
After some investigation, we found that the MetaStore error was caused by our attempt to remove some columns from our external table when updating its schema (by calling IMetaStoreClient.alter_table).
The code that validates these changes in MetaStore is here: https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DefaultIncompatibleTableChangeHandler.java#L82
 
It appears to validate the changes made by IMetaStoreClient.alter_table by matching columns in the current schema to columns in the new schema by index, and then checking that any type changes are compatible.
The problem is that if my new schema drops a column, every column after it shifts one position, so this logic can end up comparing the types of completely different columns.
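To make the failure mode concrete, here is a minimal, self-contained Java model of a positional check. It is not Hive's code: `Column`, `compatible` and `incompatibleByPosition` are hypothetical names, and the compatibility rule (identical types, or `int` widening to `bigint`) is a deliberately simplified assumption; Hive's real rules are broader. The schemas in `main` are invented to mirror the error message above.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionalCheck {
    // Hypothetical stand-in for a column definition (name + Hive type string).
    record Column(String name, String type) {}

    // Hypothetical, deliberately simplified rule: identical types, or int -> bigint.
    static boolean compatible(String oldType, String newType) {
        return oldType.equals(newType)
                || (oldType.equals("int") && newType.equals("bigint"));
    }

    // Compares old and new schemas index by index and returns the names of
    // new columns whose type is incompatible with the old column at the same index.
    static List<String> incompatibleByPosition(List<Column> oldCols, List<Column> newCols) {
        List<String> bad = new ArrayList<>();
        int n = Math.min(oldCols.size(), newCols.size());
        for (int i = 0; i < n; i++) {
            if (!compatible(oldCols.get(i).type(), newCols.get(i).type())) {
                bad.add(newCols.get(i).name());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Column> oldCols = List.of(
                new Column("id", "bigint"),
                new Column("legacy", "string"),
                new Column("foo", "int"),
                new Column("bar", "double"));
        // Drop "legacy": every later column shifts one position to the left.
        List<Column> newCols = List.of(
                new Column("id", "bigint"),
                new Column("foo", "int"),
                new Column("bar", "double"));
        System.out.println(incompatibleByPosition(oldCols, newCols)); // prints [foo, bar]
    }
}
```

Neither `foo` nor `bar` changed type, yet both are flagged, because each is compared against whichever column previously occupied its index.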
 
I am aware that we can skip the type validation entirely by setting "metastore.disallow.incompatible.col.type.changes" to false, but this is undesirable because we want some safeguards in place against bad schema updates.
 
My question is: is this expected behaviour? We are fairly new to Hive, so we are unsure whether dropping columns via IMetaStoreClient is unsupported by design.
If it is not expected behaviour, could the logic be changed to match columns by name before checking that type changes are compatible?
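A sketch of name-based matching, under the same assumptions as a positional model would use (a hypothetical `Column` record and a simplified `compatible` rule), plus the further assumption that column names compare case-insensitively. `incompatibleByName` is a hypothetical helper, not a Hive API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NameBasedCheck {
    // Hypothetical stand-in for a column definition (name + Hive type string).
    record Column(String name, String type) {}

    // Hypothetical, deliberately simplified rule: identical types, or int -> bigint.
    static boolean compatible(String oldType, String newType) {
        return oldType.equals(newType)
                || (oldType.equals("int") && newType.equals("bigint"));
    }

    // Pairs old and new columns by lower-cased name and reports new columns whose
    // type is incompatible with the old column of the same name. Dropped columns
    // never appear in newCols and added columns have no old counterpart, so
    // neither is compared.
    static List<String> incompatibleByName(List<Column> oldCols, List<Column> newCols) {
        Map<String, String> oldTypes = new HashMap<>();
        for (Column c : oldCols) {
            oldTypes.put(c.name().toLowerCase(Locale.ROOT), c.type());
        }
        List<String> bad = new ArrayList<>();
        for (Column c : newCols) {
            String oldType = oldTypes.get(c.name().toLowerCase(Locale.ROOT));
            if (oldType != null && !compatible(oldType, c.type())) {
                bad.add(c.name());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Column> oldCols = List.of(
                new Column("id", "bigint"),
                new Column("legacy", "string"),
                new Column("foo", "int"),
                new Column("bar", "double"));
        // Dropping "legacy" no longer misaligns the remaining columns.
        List<Column> dropped = List.of(
                new Column("id", "bigint"),
                new Column("foo", "int"),
                new Column("bar", "double"));
        System.out.println(incompatibleByName(oldCols, dropped)); // prints []
        // A genuine type change on an existing column is still caught.
        List<Column> retyped = List.of(
                new Column("id", "bigint"),
                new Column("foo", "string"),
                new Column("bar", "double"));
        System.out.println(incompatibleByName(oldCols, retyped)); // prints [foo]
    }
}
```

One design caveat: name-based matching only fits file formats that resolve columns by name, such as Parquet, ORC or Avro. With positional formats like delimited text, dropping a middle column would still misalign existing data files, so a positional check remains meaningful there.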
 
 
Kind regards,
Yussuf

Re: IMetaStoreClient.alter_table does not support dropping columns?

Posted by Patrick Duin <pa...@gmail.com>.
Hi Yussuf

A Hive user here, having the same issue.
I think the interface method just follows the same code path as an ALTER
TABLE query would.
My current thinking is that this safeguard was probably more useful in the
olden days of CSV files. With more modern file formats like Avro, ORC
and Parquet, columns are looked up by name (instead of by position), so
much more schema manipulation could be supported.
Our use case involves complex struct type changes: adding a field to a
struct effectively creates a new type, so Hive disallows it with the flag
on. So we're turning the safeguard off; in the worst case a bad schema
update has to be rolled back, but at least that's also allowed :). It would
be nice to have protection, but I am not aware of anything besides what
you mention: the 'metastore.disallow.incompatible.col.type.changes' setting.
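The struct case can be modelled the same way. The sketch below is hypothetical throughout: `HiveType`, `Primitive`, `Field` and `Struct` are not Hive classes, and the append-only rule is an illustrative alternative, not a Hive option. A strict equality check rejects a struct that gains a field, while the append-only rule accepts new trailing fields but still rejects removed, renamed or retyped ones. It needs Java 16+ (records and pattern matching for `instanceof`).

```java
import java.util.List;

public class StructEvolution {
    // Hypothetical, minimal model of Hive column types: a primitive or a struct.
    interface HiveType {}
    record Primitive(String name) implements HiveType {}
    record Field(String name, HiveType type) {}
    record Struct(List<Field> fields) implements HiveType {}

    // Strict model: any change to the type, including a new struct field, is rejected.
    static boolean strictCompatible(HiveType oldType, HiveType newType) {
        return oldType.equals(newType);
    }

    // Illustrative lenient rule: a struct may gain trailing fields as long as every
    // existing field keeps its name, position and a compatible type; nested
    // structs are checked recursively, other types must be identical.
    static boolean appendOnlyCompatible(HiveType oldType, HiveType newType) {
        if (oldType instanceof Struct oldStruct && newType instanceof Struct newStruct) {
            List<Field> oldFields = oldStruct.fields();
            List<Field> newFields = newStruct.fields();
            if (newFields.size() < oldFields.size()) {
                return false; // a field was removed
            }
            for (int i = 0; i < oldFields.size(); i++) {
                Field o = oldFields.get(i);
                Field n = newFields.get(i);
                if (!o.name().equals(n.name()) || !appendOnlyCompatible(o.type(), n.type())) {
                    return false;
                }
            }
            return true;
        }
        return oldType.equals(newType);
    }

    public static void main(String[] args) {
        HiveType before = new Struct(List.of(new Field("a", new Primitive("int"))));
        HiveType after = new Struct(List.of(
                new Field("a", new Primitive("int")),
                new Field("b", new Primitive("string"))));
        System.out.println(strictCompatible(before, after));     // false
        System.out.println(appendOnlyCompatible(before, after)); // true
        System.out.println(appendOnlyCompatible(after, before)); // false
    }
}
```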

Kind regards,
 Patrick
