Posted to dev@hive.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2013/01/09 11:26:41 UTC

[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

    [ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547981#comment-13547981 ] 

Hudson commented on HIVE-2246:
------------------------------

Integrated in Hive-trunk-hadoop2 #54 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/54/])
    HIVE-3424. Error by upgrading a Hive 0.7.0 database to 0.8.0 (008-HIVE-2246.mysql.sql) (Alexander Alten-Lorenz via cws) (Revision 1380483)

     Result = ABORTED
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380483
Files : 
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql

                
> Dedupe tables' column schemas from partitions in the metastore db
> -----------------------------------------------------------------
>
>                 Key: HIVE-2246
>                 URL: https://issues.apache.org/jira/browse/HIVE-2246
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Sohan Jain
>            Assignee: Sohan Jain
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, HIVE-2246.8.patch
>
>
> Note: this patch proposes a schema change, and is therefore incompatible with the current metastore.
> We can reorganize the JDO models to reduce space usage and keep the metastore scalable for the future.  Currently, partitions are the fastest-growing objects in the metastore, and the metastore keeps a separate copy of the column list for each partition.  We can normalize the metastore db by decoupling Columns from Storage Descriptors so that duplicate column lists are no longer stored for each partition.
> An idea is to create an additional level of indirection with a "Column Descriptor" that has a list of columns.  A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution).  Partitions and Indexes can reference the same Column Descriptors as their parent table.
> Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows.  We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors.
> Please see the latest review board for additional implementation details.
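
The indirection and the row-count estimate in the quoted description can be illustrated with a small sketch. This is not the metastore's JDO model or the patch itself: the class names, fields, and helper functions below are hypothetical and chosen only to show partitions sharing their table's Column Descriptor, and how the two row-count formulas compare.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ColumnDescriptor:
    """Hypothetical stand-in for the proposed "Column Descriptor"."""
    columns: tuple  # tuple of (column_name, column_type) pairs


@dataclass
class Partition:
    values: tuple
    cd: ColumnDescriptor  # a reference to a shared descriptor, not a copy


@dataclass
class Table:
    name: str
    cd: ColumnDescriptor  # the table's latest Column Descriptor
    partitions: list = field(default_factory=list)

    def add_partition(self, values):
        # A new partition points at the table's current descriptor.
        part = Partition(values, self.cd)
        self.partitions.append(part)
        return part

    def evolve_schema(self, new_columns):
        # Schema evolution: the table gets a new descriptor, while
        # existing partitions keep referencing the old one.
        self.cd = ColumnDescriptor(tuple(new_columns))


def column_rows_before(num_tables, num_partitions, avg_cols):
    # One copy of the column list per table and per partition.
    return (num_partitions + num_tables) * avg_cols


def column_rows_after(num_tables, avg_cols):
    # One copy per table, assuming a single descriptor per table.
    return num_tables * avg_cols
```

With made-up numbers (10 tables, 1000 partitions, 20 columns on average), column_rows_before(10, 1000, 20) gives 20200 rows and column_rows_after(10, 20) gives 200. Tables that go through schema evolution would hold more than one descriptor, so the second figure is a lower bound in that case.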
