Posted to dev@hive.apache.org by Chaoyu Tang <ct...@gmail.com> on 2015/03/12 11:53:28 UTC

Review Request 31978: HIE-9720: Metastore does not properly migrate column stats when renaming a table across databases

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31978/
-----------------------------------------------------------

Review request for hive, Brock Noland, Chao Sun, Szehon Ho, and Xuefu Zhang.


Repository: hive-git


Description
-------

Alter table .. rename/change column did not update or invalidate the column stats data in TAB_COL_STATS or PART_COL_STATS, which led to inconsistent data in these tables and caused the issues reported in HIVE-9720 and HIVE-9866. For example, if we do alter table .. rename and move the table to a different database, all related metadata is changed except the rows in TAB_COL_STATS/PART_COL_STATS. When we drop the moved table, Hive needs to delete its column stats data first if any was computed, but it cannot, because the DB_NAME stored in TAB_COL_STATS does not match the actual DB_NAME; this causes the referential integrity violation seen in HIVE-9720. As another example, if we change a table column type, say from int to string using alter table ... change ..., and the column stats are computed both before and after the change, the column ends up with stats data for both int and string, which is not correct.
This patch fixes these issues by removing the invalid column stats data from TAB_COL_STATS/PART_COL_STATS after a change to the db, table, partition, or column type of a column.
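
The rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in HiveAlterHandler: the class ColumnStatsInvalidationSketch, the TableDef stand-in, and the method columnsWithStaleStats are all hypothetical names. The sketch treats every column's stats as stale when the database or table name changes, and otherwise only the stats of columns whose type changed or that no longer exist under the old name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the invalidation rule; not the patch's actual code. */
final class ColumnStatsInvalidationSketch {

  /** Minimal stand-in for a table definition: database, name, column name to type. */
  static final class TableDef {
    final String dbName;
    final String tableName;
    final Map<String, String> columnTypes;

    TableDef(String dbName, String tableName, Map<String, String> columnTypes) {
      this.dbName = dbName;
      this.tableName = tableName;
      // Defensive copy so later changes to the caller's map do not leak in.
      this.columnTypes = new LinkedHashMap<>(columnTypes);
    }
  }

  /** Returns the columns of oldTable whose stored stats should be removed. */
  static List<String> columnsWithStaleStats(TableDef oldTable, TableDef newTable) {
    // A rename or a move to another database invalidates every column's stats,
    // because the stats rows are keyed by the old database and table names.
    boolean moved = !oldTable.dbName.equalsIgnoreCase(newTable.dbName)
        || !oldTable.tableName.equalsIgnoreCase(newTable.tableName);

    List<String> stale = new ArrayList<>();
    for (Map.Entry<String, String> col : oldTable.columnTypes.entrySet()) {
      String newType = newTable.columnTypes.get(col.getKey());
      // A missing column (renamed or dropped) or a changed type also invalidates stats.
      boolean typeChangedOrGone =
          newType == null || !newType.equalsIgnoreCase(col.getValue());
      if (moved || typeChangedOrGone) {
        stale.add(col.getKey());
      }
    }
    return stale;
  }
}
```

With a table (id int, name string), changing id to string would mark only id as stale, while moving the table to another database would mark both columns.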


Diffs
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java d99cfdf74d57071183e9385b5a3f2c5335e4ce60 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 612f927d5515fbd5c3257a04b85f9bcc4c6891f3 
  ql/src/test/queries/clientpositive/alter_table_invalidate_column_stats.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_invalidate_column_stats.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/31978/diff/


Testing
-------

1. Manual tests:
Went through the cases in alter_table_invalidate_column_stats.q and checked TAB_COL_STATS/PART_COL_STATS to make sure that the invalid column stats have been cleaned up after alter table ..., alter table ... cascade, and alter table partition ..., with both direct SQL and ORM.
2. A new qtest, alter_table_invalidate_column_stats.q, was added, and the patch has been submitted to kick off the precommit build.


Thanks,

Chaoyu Tang


Re: Review Request 31978: HIVE-9720: Metastore does not properly migrate column stats when renaming a table across databases

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31978/#review76238
-----------------------------------------------------------

Ship it!



- Xuefu Zhang


Re: Review Request 31978: HIVE-9720: Metastore does not properly migrate column stats when renaming a table across databases

Posted by Chaoyu Tang <ct...@gmail.com>.

> On March 12, 2015, 1:32 p.m., Xuefu Zhang wrote:
> >

Thanks Xuefu! It looks like InvalidObjectException only provides the three constructors below; there is no InvalidObjectException(String message, Throwable cause). Its string message is also the only Thrift field that can be passed between client and server.

  public InvalidObjectException() {
  }

  public InvalidObjectException(String message) {
    this();
    this.message = message;
  }

  public InvalidObjectException(InvalidObjectException other) {
    if (other.isSetMessage()) {
      this.message = other.message;
    }
  }
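
One possible way to keep the original failure visible with a message-only exception is sketched below. It is an assumption-laden illustration, not what the patch does: MessageOnlyException is a hypothetical stand-in for a Thrift-style exception, and CausePreservationSketch.wrap is an invented helper. The cause is folded into the message string, since the message is the only part that reaches the client, and Throwable.initCause keeps the stack available on the server side.

```java
/** Hypothetical stand-in for a Thrift-style exception whose only field is a message. */
final class MessageOnlyException extends Exception {
  private final String message;

  MessageOnlyException(String message) {
    // The implicit super() call leaves the cause uninitialized,
    // so initCause() may still be called once later.
    this.message = message;
  }

  @Override
  public String getMessage() {
    return message;
  }
}

final class CausePreservationSketch {

  /** Wraps a failure without needing a (String, Throwable) constructor. */
  static MessageOnlyException wrap(String context, Throwable cause) {
    // The cause's toString() goes into the message, so it survives serialization.
    MessageOnlyException wrapped = new MessageOnlyException(context + ": " + cause);
    // The chained cause is only useful locally, e.g. for server-side logging.
    wrapped.initCause(cause);
    return wrapped;
  }
}
```

The chained cause would not cross the client/server boundary in this sketch; only the combined message string would.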


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31978/#review76227
-----------------------------------------------------------


Re: Review Request 31978: HIVE-9720: Metastore does not properly migrate column stats when renaming a table across databases

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31978/#review76227
-----------------------------------------------------------



metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
<https://reviews.apache.org/r/31978/#comment123741>

    Can we replace '+' with ',' so that we can propagate the exception stack trace? Same below.
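
    The difference between the two forms can be illustrated with a small sketch. It uses IllegalStateException from the JDK rather than the exception type in the patch, and the class name, method names, and message text are made up for the example.

```java
/** Illustrative sketch: string concatenation versus passing the cause. */
final class ExceptionChainingSketch {

  /** '+' form: the original exception is flattened into text and its stack is lost. */
  static IllegalStateException concatenated(Exception e) {
    return new IllegalStateException("Unable to update column stats: " + e);
  }

  /** ',' form: the original exception is kept as the cause, stack trace included. */
  static IllegalStateException chained(Exception e) {
    return new IllegalStateException("Unable to update column stats", e);
  }
}
```

    Only the second form lets a caller or a logger reach the original stack trace through getCause().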


- Xuefu Zhang


Re: Review Request 31978: HIVE-9720: Metastore does not properly migrate column stats when renaming a table across databases

Posted by Chaoyu Tang <ct...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31978/
-----------------------------------------------------------

(Updated March 12, 2015, 4:08 p.m.)


Review request for hive, Brock Noland, Chao Sun, Szehon Ho, and Xuefu Zhang.


Changes
-------

Uploaded a new version after review.


Repository: hive-git


Description
-------

Alter table .. rename/change column did not update or invalidate the column stats data in TAB_COL_STATS or PART_COL_STATS, which led to inconsistent data in these tables and caused the issues reported in HIVE-9720 and HIVE-9866. For example, if we do alter table .. rename and move the table to a different database, all related metadata is changed except the rows in TAB_COL_STATS/PART_COL_STATS. When we drop the moved table, Hive needs to delete its column stats data first if any was computed, but it cannot, because the DB_NAME stored in TAB_COL_STATS does not match the actual DB_NAME; this causes the referential integrity violation seen in HIVE-9720. As another example, if we change a table column type, say from int to string using alter table ... change ..., and the column stats are computed both before and after the change, the column ends up with stats data for both int and string, which is not correct.
This patch fixes these issues by removing the invalid column stats data from TAB_COL_STATS/PART_COL_STATS after a change to the db, table, partition, or column type of a column.


Diffs (updated)
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java d99cfdf74d57071183e9385b5a3f2c5335e4ce60 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 612f927d5515fbd5c3257a04b85f9bcc4c6891f3 
  ql/src/test/queries/clientpositive/alter_table_invalidate_column_stats.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_invalidate_column_stats.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/31978/diff/


Testing
-------

1. Manual tests:
Went through the cases in alter_table_invalidate_column_stats.q and checked TAB_COL_STATS/PART_COL_STATS to make sure that the invalid column stats have been cleaned up after alter table ..., alter table ... cascade, and alter table partition ..., with both direct SQL and ORM.
2. A new qtest, alter_table_invalidate_column_stats.q, was added, and the patch has been submitted to kick off the precommit build.


Thanks,

Chaoyu Tang


Re: Review Request 31978: HIVE-9720: Metastore does not properly migrate column stats when renaming a table across databases

Posted by Chaoyu Tang <ct...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31978/
-----------------------------------------------------------

(Updated March 12, 2015, 10:55 a.m.)


Review request for hive, Brock Noland, Chao Sun, Szehon Ho, and Xuefu Zhang.


Changes
-------

Fixed a typo in the Summary.


Summary (updated)
-----------------

HIVE-9720: Metastore does not properly migrate column stats when renaming a table across databases


Repository: hive-git


Description
-------

Alter table .. rename/change column did not update or invalidate the column stats data in TAB_COL_STATS or PART_COL_STATS, which led to inconsistent data in these tables and caused the issues reported in HIVE-9720 and HIVE-9866. For example, if we do alter table .. rename and move the table to a different database, all related metadata is changed except the rows in TAB_COL_STATS/PART_COL_STATS. When we drop the moved table, Hive needs to delete its column stats data first if any was computed, but it cannot, because the DB_NAME stored in TAB_COL_STATS does not match the actual DB_NAME; this causes the referential integrity violation seen in HIVE-9720. As another example, if we change a table column type, say from int to string using alter table ... change ..., and the column stats are computed both before and after the change, the column ends up with stats data for both int and string, which is not correct.
This patch fixes these issues by removing the invalid column stats data from TAB_COL_STATS/PART_COL_STATS after a change to the db, table, partition, or column type of a column.


Diffs
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java d99cfdf74d57071183e9385b5a3f2c5335e4ce60 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 612f927d5515fbd5c3257a04b85f9bcc4c6891f3 
  ql/src/test/queries/clientpositive/alter_table_invalidate_column_stats.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_invalidate_column_stats.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/31978/diff/


Testing
-------

1. Manual tests:
Went through the cases in alter_table_invalidate_column_stats.q and checked TAB_COL_STATS/PART_COL_STATS to make sure that the invalid column stats have been cleaned up after alter table ..., alter table ... cascade, and alter table partition ..., with both direct SQL and ORM.
2. A new qtest, alter_table_invalidate_column_stats.q, was added, and the patch has been submitted to kick off the precommit build.


Thanks,

Chaoyu Tang