Posted to user@hive.apache.org by Hemanth Yamijala <yh...@gmail.com> on 2013/03/13 06:03:56 UTC

Hive metastore schema question related to ColumnDescriptors

Hi folks,

In some recent work, I've been looking at some issues related to the Hive
metastore schema. This seems more dev-related, so the user list may not be
the ideal place to ask, but I will ask anyway. It would help me if someone
could clarify the questions below:

First, I would like to know the relationship between a Table, a
StorageDescriptor and a ColumnDescriptor, primarily from a cardinality point
of view. Is it 1-1 or many-1 between them?

One specific reason I ask relates to the Hive upgrade scripts of
https://issues.apache.org/jira/browse/HIVE-2246, which introduced the CDS
table. In the upgrade scripts, AFAICS, we pick up non-null SD_IDs from the
TBLS table and (ultimately) insert them into the CDS table. However, in
these tables the IDs are primary keys and hence need to be unique. Does this
imply that we expect the values in the SD_ID column of the TBLS table to be
unique as well?

If yes, there doesn't seem to be a constraint on the TBLS table to enforce
this.
If no, the upgrade script is doing the wrong thing.
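To make the concern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-column TBLS and one-column CDS tables, and the helpers make_db, copy_sd_ids_to_cds and find_shared_sd_ids, are hypothetical simplifications, not the actual HIVE-2246 scripts. The point is only that copying SD_IDs into a primary-key column fails as soon as two TBLS rows share an SD_ID, and that a GROUP BY query of this shape can tell which case a given metastore is in.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the metastore tables; the real
# schema has many more columns and differs per backing database.
SCHEMA = [
    "CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER)",  # no UNIQUE on SD_ID
    "CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY)",
]

# Lists every non-null SD_ID that is referenced by more than one row in TBLS.
SHARED_SD_IDS = """
SELECT SD_ID, COUNT(*) FROM TBLS
WHERE SD_ID IS NOT NULL
GROUP BY SD_ID
HAVING COUNT(*) > 1
ORDER BY SD_ID
"""


def make_db(tbls_rows):
    """Build an in-memory database holding the given (TBL_ID, SD_ID) rows."""
    conn = sqlite3.connect(":memory:")
    for stmt in SCHEMA:
        conn.execute(stmt)
    conn.executemany("INSERT INTO TBLS (TBL_ID, SD_ID) VALUES (?, ?)", tbls_rows)
    conn.commit()  # keep the TBLS rows even if a later step rolls back
    return conn


def copy_sd_ids_to_cds(conn):
    """Hypothetical upgrade step: reuse each non-null TBLS.SD_ID as a CDS.CD_ID.

    Returns False when two tables share an SD_ID, because the duplicate
    value violates the CDS primary key.
    """
    try:
        with conn:  # commit on success, roll back the INSERT on failure
            conn.execute(
                "INSERT INTO CDS (CD_ID) SELECT SD_ID FROM TBLS WHERE SD_ID IS NOT NULL"
            )
        return True
    except sqlite3.IntegrityError:
        return False


def find_shared_sd_ids(conn):
    """Return (SD_ID, count) pairs for SD_IDs used by more than one table."""
    return conn.execute(SHARED_SD_IDS).fetchall()
```

With make_db([(1, 10), (2, 10), (3, 11)]), find_shared_sd_ids returns [(10, 2)] and copy_sd_ids_to_cds returns False, leaving CDS empty; with distinct SD_IDs the copy succeeds.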

Could someone please comment on this?

Thanks
Hemanth