You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Phabricator (JIRA)" <ji...@apache.org> on 2012/05/04 19:42:50 UTC

[jira] [Commented] (HIVE-2529) metastore 0.8 upgrade script for PostgreSQL

    [ https://issues.apache.org/jira/browse/HIVE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268567#comment-13268567 ] 

Phabricator commented on HIVE-2529:
-----------------------------------

zhenxiao has commented on the revision "HIVE-2529 [jira] metastore 0.8 upgrade script for PostgreSQL".

  Manually tested Hive Metastore upgrade from 0.7 to 0.8, and from 0.8 to 0.9

  Before the upgrade:

  [~/Notes/metastore/postgres]hive
  Hive history file=/tmp/cloudera/hive_job_log_cloudera_201205041018_1363096543.txt
  hive> show tables;
  OK
  Time taken: 2.913 seconds
  hive> create table test3(a string) partitioned by (b int);
  OK
  Time taken: 0.213 seconds
  hive> create table test(c array<int>, b int);
  OK
  Time taken: 0.176 seconds
  hive> create table test2(key string, value string) partitioned by (ds string) stored as textfile;
  OK
  Time taken: 0.088 seconds
  hive> load data local inpath '/home/cloudera/Code/hive/data/files/kv1.txt' into table test2 partition (ds='2012-02-09');
  Copying data from file:/home/cloudera/Code/hive/data/files/kv1.txt
  Copying file: file:/home/cloudera/Code/hive/data/files/kv1.txt
  Loading data to table default.test2 partition (ds=2012-02-09)
  OK
  Time taken: 0.658 seconds
  hive> create table src(key string);
  OK
  Time taken: 0.075 seconds
  hive> create index src_index_8 on table src(key) as 'compact' WITH DEFERRED REBUILD IDXPROPERTIES ("prop1"="val1", "prop2"="val2");
  OK
  Time taken: 0.23 seconds
  hive> alter index src_index_8 on src set IDXPROPERTIES ("prop1"="val1_new", "prop3"="val3");
  OK
  Time taken: 0.078 seconds
  hive> show tables;
  OK
  default__src_src_index_8__
  src
  test
  test2
  test3
  Time taken: 0.142 seconds

  In postgreSQL metastore:


  metastore=# \dt
                 List of relations
   Schema |        Name        | Type  |  Owner
  --------+--------------------+-------+----------
   public | BUCKETING_COLS     | table | cloudera
   public | COLUMNS            | table | cloudera
   public | DATABASE_PARAMS    | table | cloudera
   public | DBS                | table | cloudera
   public | DB_PRIVS           | table | cloudera
   public | GLOBAL_PRIVS       | table | cloudera
   public | IDXS               | table | cloudera
   public | INDEX_PARAMS       | table | cloudera
   public | NUCLEUS_TABLES     | table | cloudera
   public | PARTITIONS         | table | cloudera
   public | PARTITION_KEYS     | table | cloudera
   public | PARTITION_KEY_VALS | table | cloudera
   public | PARTITION_PARAMS   | table | cloudera
   public | PART_COL_PRIVS     | table | cloudera
   public | PART_PRIVS         | table | cloudera
   public | ROLES              | table | cloudera
   public | ROLE_MAP           | table | cloudera
   public | SDS                | table | cloudera
   public | SD_PARAMS          | table | cloudera
   public | SEQUENCE_TABLE     | table | cloudera
   public | SERDES             | table | cloudera
   public | SERDE_PARAMS       | table | cloudera
   public | SORT_COLS          | table | cloudera
   public | TABLE_PARAMS       | table | cloudera
   public | TBLS               | table | cloudera
   public | TBL_COL_PRIVS      | table | cloudera
   public | TBL_PRIVS          | table | cloudera
   public | TYPES              | table | cloudera
   public | TYPE_FIELDS        | table | cloudera
  (29 rows)

  metastore=# select * from "TBLS";
   TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME |  OWNER   | RETENTION | SD_ID |          TBL_NAME          |   TBL_TYPE    | VIEW_EXPANDED_TEXT | VIEW_ORIG
  INAL_TEXT
  --------+-------------+-------+------------------+----------+-----------+-------+----------------------------+---------------+--------------------+----------
  ----------
        1 |  1336151986 |     1 |                0 | cloudera |         0 |     1 | test3                      | MANAGED_TABLE |                    |
        2 |  1336152012 |     1 |                0 | cloudera |         0 |     2 | test                       | MANAGED_TABLE |                    |
        3 |  1336152016 |     1 |                0 | cloudera |         0 |     3 | test2                      | MANAGED_TABLE |                    |
        4 |  1336152032 |     1 |                0 | cloudera |         0 |     5 | src                        | MANAGED_TABLE |                    |
        5 |  1336152039 |     1 |                0 |          |         0 |     6 | default__src_src_index_8__ | INDEX_TABLE   |                    |
  (5 rows)

  metastore=# select * from "COLUMNS";
   SD_ID | COMMENT | COLUMN_NAME |   TYPE_NAME   | INTEGER_IDX
  -------+---------+-------------+---------------+-------------
       1 |         | a           | string        |           0
       2 |         | c           | array<int>    |           0
       2 |         | b           | int           |           1
       3 |         | key         | string        |           0
       3 |         | value       | string        |           1
       4 |         | key         | string        |           0
       4 |         | value       | string        |           1
       5 |         | key         | string        |           0
       6 |         | key         | string        |           0
       6 |         | _bucketname | string        |           1
       6 |         | _offsets    | array<bigint> |           2
       7 |         | key         | string        |           0
  (12 rows)

  metastore=# select * from "PARTITIONS";
   PART_ID | CREATE_TIME | LAST_ACCESS_TIME |   PART_NAME   | SD_ID | TBL_ID
  ---------+-------------+------------------+---------------+-------+--------
         1 |  1336152022 |                0 | ds=2012-02-09 |     4 |      3
  (1 row)


  metastore=# select * from "IDXS";
   INDEX_ID | CREATE_TIME | DEFERRED_REBUILD |                     INDEX_HANDLER_CLASS                     | INDEX_NAME  | INDEX_TBL_ID | LAST_ACCESS_TIME | OR
  IG_TBL_ID | SD_ID
  ----------+-------------+------------------+-------------------------------------------------------------+-------------+--------------+------------------+---
  ----------+-------
          1 |  1336152039 | t                | org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler | src_index_8 |            5 |       1336152039 |
          4 |     7
  (1 row)

  metastore=# select * from "SDS";
   SD_ID |               INPUT_FORMAT               | IS_COMPRESSED |                            LOCATION                             | NUM_BUCKETS |
               OUTPUT_FORMAT                        | SERDE_ID
  -------+------------------------------------------+---------------+-----------------------------------------------------------------+-------------+----------
  --------------------------------------------------+----------
       1 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test3                      |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        1
       2 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test                       |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        2
       3 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test2                      |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        3
       4 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test2/ds=2012-02-09        |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        4
       5 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/src                        |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        5
       6 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/default__src_src_index_8__ |          -1 | org.apach
  e.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat     |        6
       7 | org.apache.hadoop.mapred.TextInputFormat | f             |                                                                 |          -1 | org.apach
  e.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat     |        7
  (7 rows)

  metastore=# select * from "PARTITION_KEYS";
   TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX
  --------+--------------+-----------+-----------+-------------
        1 |              | b         | int       |           0
        3 |              | ds        | string    |           0
  (2 rows)

  metastore=# select * from "SERDES";
   SERDE_ID | NAME |                        SLIB
  ----------+------+----------------------------------------------------
          1 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          2 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          3 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          4 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          5 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          6 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          7 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  (7 rows)

  metastore=# select * from "BUCKETING_COLS";
   SD_ID | BUCKET_COL_NAME | INTEGER_IDX
  -------+-----------------+-------------
  (0 rows)

  Upgrade from hive0.7 to hive0.8:

  [~/Code/hive/metastore/scripts/upgrade/postgres]psql -f upgrade-0.7.0-to-0.8.0.postgres.sql metastore
                      ?column?
  ------------------------------------------------
   Upgrading MetaStore schema from 0.7.0 to 0.8.0
  (1 row)

  SET
                                      ?column?
  ---------------------------------------------------------------------------------
   < HIVE-2246: Dedupe tables column schemas from partitions in the metastore db >
  (1 row)

  CREATE TABLE
  CREATE TABLE
  ALTER TABLE
      ?column?
  ----------------
   Tables Created
  (1 row)

  ALTER TABLE
  ALTER TABLE
       ?column?
  -------------------
   SDS Table Updated
  (1 row)

  INSERT 0 5
     ?column?
  ---------------
   CDS populated
  (1 row)

  UPDATE 5
  INSERT 0 9
      ?column?
  -----------------
   Tables Migrated
  (1 row)

  CREATE TABLE
  INSERT 0 1
  UPDATE 1
        ?column?
  ---------------------
   Partitions Migrated
  (1 row)

  INSERT 0 1
  UPDATE 1
  INSERT 0 1
       ?column?
  ------------------
   Indexes Migrated
  (1 row)

  ALTER TABLE
         ?column?
  -----------------------
   Columns Table Renamed
  (1 row)

                                  ?column?
  -------------------------------------------------------------------------
   < HIVE-2215 Add api for marking querying set of partitions for events >
  (1 row)

  CREATE TABLE
  CREATE INDEX
                          ?column?
  ---------------------------------------------------------
   Finished upgrading MetaStore schema from 0.7.0 to 0.8.0
  (1 row)

  The postgreSQL metastore after upgrading from 0.7 to 0.8:

  metastore=# \dt
                 List of relations
   Schema |        Name        | Type  |  Owner
  --------+--------------------+-------+----------
   public | BUCKETING_COLS     | table | cloudera
   public | CDS                | table | cloudera
   public | COLUMNS_OLD        | table | cloudera
   public | COLUMNS_V2         | table | cloudera
   public | DATABASE_PARAMS    | table | cloudera
   public | DBS                | table | cloudera
   public | DB_PRIVS           | table | cloudera
   public | GLOBAL_PRIVS       | table | cloudera
   public | IDXS               | table | cloudera
   public | INDEX_PARAMS       | table | cloudera
   public | NUCLEUS_TABLES     | table | cloudera
   public | PARTITIONS         | table | cloudera
   public | PARTITION_EVENTS   | table | cloudera
   public | PARTITION_KEYS     | table | cloudera
   public | PARTITION_KEY_VALS | table | cloudera
   public | PARTITION_PARAMS   | table | cloudera
   public | PART_COL_PRIVS     | table | cloudera
   public | PART_PRIVS         | table | cloudera
   public | ROLES              | table | cloudera
   public | ROLE_MAP           | table | cloudera
   public | SDS                | table | cloudera
   public | SD_PARAMS          | table | cloudera
   public | SEQUENCE_TABLE     | table | cloudera
   public | SERDES             | table | cloudera
   public | SERDE_PARAMS       | table | cloudera
   public | SORT_COLS          | table | cloudera
   public | TABLE_PARAMS       | table | cloudera
   public | TBLS               | table | cloudera
   public | TBL_COL_PRIVS      | table | cloudera
   public | TBL_PRIVS          | table | cloudera
   public | TYPES              | table | cloudera
   public | TYPE_FIELDS        | table | cloudera
  (32 rows)

  metastore=# select * from "TBLS";
   TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME |  OWNER   | RETENTION | SD_ID |          TBL_NAME          |   TBL_TYPE    | VIEW_EXPANDED_TEXT | VIEW_ORIG
  INAL_TEXT
  --------+-------------+-------+------------------+----------+-----------+-------+----------------------------+---------------+--------------------+----------
  ----------
        1 |  1336151986 |     1 |                0 | cloudera |         0 |     1 | test3                      | MANAGED_TABLE |                    |
        2 |  1336152012 |     1 |                0 | cloudera |         0 |     2 | test                       | MANAGED_TABLE |                    |
        3 |  1336152016 |     1 |                0 | cloudera |         0 |     3 | test2                      | MANAGED_TABLE |                    |
        4 |  1336152032 |     1 |                0 | cloudera |         0 |     5 | src                        | MANAGED_TABLE |                    |
        5 |  1336152039 |     1 |                0 |          |         0 |     6 | default__src_src_index_8__ | INDEX_TABLE   |                    |
  (5 rows)


  metastore=# select * from "COLUMNS_V2";
   CD_ID | COMMENT | COLUMN_NAME |   TYPE_NAME   | INTEGER_IDX
  -------+---------+-------------+---------------+-------------
       1 |         | a           | string        |           0
       2 |         | c           | array<int>    |           0
       2 |         | b           | int           |           1
       3 |         | key         | string        |           0
       3 |         | value       | string        |           1
       5 |         | key         | string        |           0
       6 |         | key         | string        |           0
       6 |         | _bucketname | string        |           1
       6 |         | _offsets    | array<bigint> |           2
       7 |         | key         | string        |           0
  (10 rows)

  metastore=# select * from "COLUMNS_OLD";
   SD_ID | COMMENT | COLUMN_NAME |   TYPE_NAME   | INTEGER_IDX
  -------+---------+-------------+---------------+-------------
       1 |         | a           | string        |           0
       2 |         | c           | array<int>    |           0
       2 |         | b           | int           |           1
       3 |         | key         | string        |           0
       3 |         | value       | string        |           1
       4 |         | key         | string        |           0
       4 |         | value       | string        |           1
       5 |         | key         | string        |           0
       6 |         | key         | string        |           0
       6 |         | _bucketname | string        |           1
       6 |         | _offsets    | array<bigint> |           2
       7 |         | key         | string        |           0
  (12 rows)

  metastore=# select * from "PARTITIONS";
   PART_ID | CREATE_TIME | LAST_ACCESS_TIME |   PART_NAME   | SD_ID | TBL_ID
  ---------+-------------+------------------+---------------+-------+--------
         1 |  1336152022 |                0 | ds=2012-02-09 |     4 |      3
  (1 row)


  metastore=# select * from "IDXS";
   INDEX_ID | CREATE_TIME | DEFERRED_REBUILD |                     INDEX_HANDLER_CLASS                     | INDEX_NAME  | INDEX_TBL_ID | LAST_ACCESS_TIME | OR
  IG_TBL_ID | SD_ID
  ----------+-------------+------------------+-------------------------------------------------------------+-------------+--------------+------------------+---
  ----------+-------
          1 |  1336152039 | t                | org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler | src_index_8 |            5 |       1336152039 |
          4 |     7
  (1 row)

  metastore=# select * from "SDS";
   SD_ID |               INPUT_FORMAT               | IS_COMPRESSED |                            LOCATION                             | NUM_BUCKETS |
               OUTPUT_FORMAT                        | SERDE_ID | CD_ID
  -------+------------------------------------------+---------------+-----------------------------------------------------------------+-------------+----------
  --------------------------------------------------+----------+-------
       1 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test3                      |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        1 |     1
       2 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test                       |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        2 |     2
       3 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test2                      |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        3 |     3
       5 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/src                        |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        5 |     5
       6 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/default__src_src_index_8__ |          -1 | org.apach
  e.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat     |        6 |     6
       4 | org.apache.hadoop.mapred.TextInputFormat | f             | hdfs://localhost/user/hive/warehouse/test2/ds=2012-02-09        |          -1 | org.apach
  e.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |        4 |     3
       7 | org.apache.hadoop.mapred.TextInputFormat | f             |                                                                 |          -1 | org.apach
  e.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat     |        7 |     7
  (7 rows)

  metastore=# select * from "CDS";
   CD_ID
  -------
       1
       2
       3
       5
       6
       7
  (6 rows)

  metastore=# select * from "PARTITION_KEYS";
   TBL_ID | PKEY_COMMENT | PKEY_NAME | PKEY_TYPE | INTEGER_IDX
  --------+--------------+-----------+-----------+-------------
        1 |              | b         | int       |           0
        3 |              | ds        | string    |           0
  (2 rows)

  metastore=# select * from "SERDES";
   SERDE_ID | NAME |                        SLIB
  ----------+------+----------------------------------------------------
          1 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          2 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          3 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          4 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          5 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          6 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          7 |      | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  (7 rows)

  Upgrading to 0.9 is similar, and there is no schema object change:

  [~/Code/hive/metastore/scripts/upgrade/postgres]psql -f upgrade-0.8.0-to-0.9.0.postgres.sql metastore
                      ?column?
  ------------------------------------------------
   Upgrading MetaStore schema from 0.8.0 to 0.9.0
  (1 row)

                          ?column?
  ---------------------------------------------------------
   Finished upgrading MetaStore schema from 0.8.0 to 0.9.0
  (1 row)


REVISION DETAIL
  https://reviews.facebook.net/D3027

                
> metastore 0.8 upgrade script for PostgreSQL 
> --------------------------------------------
>
>                 Key: HIVE-2529
>                 URL: https://issues.apache.org/jira/browse/HIVE-2529
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.8.0
>            Reporter: John Sichi
>            Assignee: Zhenxiao Luo
>            Priority: Blocker
>         Attachments: HIVE-2529.1.patch.txt, HIVE-2529.D3027.1.patch
>
>
> I think you mentioned that this was in the works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira