You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Chaoyu Tang <ct...@gmail.com> on 2017/01/19 22:29:05 UTC

Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/
-----------------------------------------------------------

Review request for hive and pengcheng xiong.


Bugs: HIVE-15653
    https://issues.apache.org/jira/browse/HIVE-15653


Repository: hive-git


Description
-------

For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
  ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/55731/diff/


Testing
-------

1. Manual tests
2. new unit tests


Thanks,

Chaoyu Tang


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162903
-----------------------------------------------------------


Ship it!




Ship It!

- pengcheng xiong


On Jan. 25, 2017, 2:56 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 2:56 a.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by Chaoyu Tang <ct...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/
-----------------------------------------------------------

(Updated Jan. 25, 2017, 2:56 a.m.)


Review request for hive and pengcheng xiong.


Changes
-------

PengCheng, please take a look at the revised patch. Thanks.


Bugs: HIVE-15653
    https://issues.apache.org/jira/browse/HIVE-15653


Repository: hive-git


Description
-------

For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.


Diffs (updated)
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
  ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/55731/diff/


Testing
-------

1. Manual tests
2. new unit tests


Thanks,

Chaoyu Tang


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by Chaoyu Tang <ct...@gmail.com>.

> On Jan. 24, 2017, 5:58 a.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3650
> > <https://reviews.apache.org/r/55731/diff/3/?file=1613301#file1613301line3650>
> >
> >     Thanks Chaoyu for the patch and comments. I still think we do not need the function hasStatsInParameters. We only care about row_count and raw_data_size, especially row_count. We use row_count to answer some query directly from metastore. The other parameters, totalSize etc do not matter. The other part of the patch looks good to me. Thanks.

Thanks, PengCheng. In current code, command "alter table .. set tblproperty" can also set the stats including row_count, raw_data_size, without setting the flag STATS_GENERATED to USER  like it does for "alter table .. update statistics" in DDLSemanticAnalyzer. So in this case, we need a way in DDLTask alterTableOrSinglePartition to determine whether this property change is related to the stats. I used hasStatsInParameters, though it also include the change of numFiles & totalSize into consideration.
I quite do not understand, if we only care about row_count/raw_data_size, why do we recalcualte the fastStats (numFiles/totalSize) in HMS in a lot of cases?
Also I want to confirm that currently COLUMN_STATS_ACCURATE and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size? If so, is there any use of numFiles/totalSize?


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162771
-----------------------------------------------------------


On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 5:01 a.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by pengcheng xiong <px...@hortonworks.com>.

> On Jan. 24, 2017, 5:58 a.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3650
> > <https://reviews.apache.org/r/55731/diff/3/?file=1613301#file1613301line3650>
> >
> >     Thanks Chaoyu for the patch and comments. I still think we do not need the function hasStatsInParameters. We only care about row_count and raw_data_size, especially row_count. We use row_count to answer some query directly from metastore. The other parameters, totalSize etc do not matter. The other part of the patch looks good to me. Thanks.
> 
> Chaoyu Tang wrote:
>     Thanks, PengCheng. In current code, command "alter table .. set tblproperty" can also set the stats including row_count, raw_data_size, without setting the flag STATS_GENERATED to USER  like it does for "alter table .. update statistics" in DDLSemanticAnalyzer. So in this case, we need a way in DDLTask alterTableOrSinglePartition to determine whether this property change is related to the stats. I used hasStatsInParameters, though it also include the change of numFiles & totalSize into consideration.
>     I quite do not understand, if we only care about row_count/raw_data_size, why do we recalcualte the fastStats (numFiles/totalSize) in HMS in a lot of cases?
>     Also I want to confirm that currently COLUMN_STATS_ACCURATE and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size? If so, is there any use of numFiles/totalSize?

Thanks Chaoyu. You raised very good questions. (1) Although "alter table .. set tblproperty" can also set the stats including row_count, in current code, it will remove basic stats = true anyway. Thus it is safe. Now, "DO_NOT_UPDATE_STATS" is introduced in the patch. I think we can move (not remove) the function that you introduced to rewrite L1364 to L1392 in DDLSemanticAnalyzer. That is, if a user is going to change row_count, raw_data_size, no matter where it comes from, set tblproperty or update statistics, we will set it to be coming from the USER. Later, we can check it in alterTableOrSinglePartition. What do you think about this idea? (2) totalSize etc are only useful when row_count/raw_data_size are not available. For more details, you can refer to L156-L173. And, we only use row_count in StatsOptimizer. That is why "COLUMN_STATS_ACCURATE and BASIC_STATS only reflect the stats accuracy of row_count and raw_data_size" (actually, it should be only row_count because we only use 
 row_count in StatsOptimizer). Thanks!


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162771
-----------------------------------------------------------


On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 5:01 a.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162771
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 3650)
<https://reviews.apache.org/r/55731/#comment234117>

    Thanks Chaoyu for the patch and comments. I still think we do not need the function hasStatsInParameters. We only care about row_count and raw_data_size, especially row_count. We use row_count to answer some query directly from metastore. The other parameters, totalSize etc do not matter. The other part of the patch looks good to me. Thanks.


- pengcheng xiong


On Jan. 24, 2017, 5:01 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 5:01 a.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by Chaoyu Tang <ct...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 5:01 a.m.)


Review request for hive and pengcheng xiong.


Bugs: HIVE-15653
    https://issues.apache.org/jira/browse/HIVE-15653


Repository: hive-git


Description
-------

For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.


Diffs (updated)
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
  ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/55731/diff/


Testing
-------

1. Manual tests
2. new unit tests


Thanks,

Chaoyu Tang


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by Chaoyu Tang <ct...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 4:59 a.m.)


Review request for hive and pengcheng xiong.


Changes
-------

revised patch based on the review.


Bugs: HIVE-15653
    https://issues.apache.org/jira/browse/HIVE-15653


Repository: hive-git


Description
-------

For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.


Diffs (updated)
-----

  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 4aea152 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0f472e7 
  ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/55731/diff/


Testing
-------

1. Manual tests
2. new unit tests


Thanks,

Chaoyu Tang


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by pengcheng xiong <px...@hortonworks.com>.

> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3645
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3645>
> >
> >     If it is for user update stats, you will have     StatsSetupConst.STATS_GENERATED = StatsSetupConst.USER automatically. Thus it is not necessary to have/call hasStatsInParameters function.
> 
> Chaoyu Tang wrote:
>     Thanks PengCheng for reviewing the patch. I looked through the code related to this STATS status and feel a little confusing, here are some questions I need help from you. To my understanding:
>     1. For all operations which set STATS_GENERATED to TASK, we should set BASIC_STATS as TRUE, right?
>     2. But for the stats row_count, raw_data_size updated by the command alter table .. update statistics, the STATS_GENERATED is USER, should we set BASIC_STATS as TRUE or FALSE?
>     3. These stats could sometimes be set as parameters using command alter table .. settblproperties, in these cases, the BASIC_STATS should be set FALSE?

1. yes. what i mean here is that, you can just check if STATS_GENERATED==StatsSetupConst.USER to determine whether it hasStatsInParameters. 
2. set it to FALSE. But note that when we call "StatsSetupConst.setBasicStatsState(params, StatsSetupConst.FALSE);" we just remove the entry for "COLUMN_STATS_ACCURATE". Thus, set it to false is equal to remove the entry.
3. yes. it should be set to FALSE. And as a result, there is no entry for "COLUMN_STATS_ACCURATE"


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3657
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3657>
> >
> >     And, could u just remove the property rather than set it to false? We treat the missing of the property the same as set it to false.
> 
> Chaoyu Tang wrote:
>     alterTableOrSinglePartition could be called in a loop in the method alterTable(Hive db, AlterTableDesc alterTbl) (see DDLTask line 3408). Setting it to false could avoid it to be reset to true in the beginning of the method, I wonder if it makes sense.

Yes, i understand and I agree with the logic that you want to do. My point is that, rather than put <DO_NOT_UPDATE_STATS, FALSE> entry, we can just remove the key <DO_NOT_UPDATE_STATS>. We will check whether the property contains that key later. See Line 219 in MetaStoreUtils.java.


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162357
-----------------------------------------------------------


On Jan. 19, 2017, 10:29 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 19, 2017, 10:29 p.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by Chaoyu Tang <ct...@gmail.com>.

> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> >

XiongPeng, I updated the 2nd patches and also replied you comments, please take a look to see if they make sense. Thanks


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3645
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3645>
> >
> >     If it is for user update stats, you will have     StatsSetupConst.STATS_GENERATED = StatsSetupConst.USER automatically. Thus it is not necessary to have/call hasStatsInParameters function.
> 
> Chaoyu Tang wrote:
>     Thanks PengCheng for reviewing the patch. I looked through the code related to this STATS status and feel a little confusing, here are some questions I need help from you. To my understanding:
>     1. For all operations which set STATS_GENERATED to TASK, we should set BASIC_STATS as TRUE, right?
>     2. But for the stats row_count, raw_data_size updated by the command alter table .. update statistics, the STATS_GENERATED is USER, should we set BASIC_STATS as TRUE or FALSE?
>     3. These stats could sometimes be set as parameters using command alter table .. settblproperties, in these cases, the BASIC_STATS should be set FALSE?
> 
> pengcheng xiong wrote:
>     1. yes. what i mean here is that, you can just check if STATS_GENERATED==StatsSetupConst.USER to determine whether it hasStatsInParameters. 
>     2. set it to FALSE. But note that when we call "StatsSetupConst.setBasicStatsState(params, StatsSetupConst.FALSE);" we just remove the entry for "COLUMN_STATS_ACCURATE". Thus, set it to false is equal to remove the entry.
>     3. yes. it should be set to FALSE. And as a result, there is no entry for "COLUMN_STATS_ACCURATE"

In item 3, these stats might include row_count, raw_data_size, numFiles, totalSize etc.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3657
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3657>
> >
> >     And, could u just remove the property rather than set it to false? We treat the missing of the property the same as set it to false.
> 
> Chaoyu Tang wrote:
>     alterTableOrSinglePartition could be called in a loop in the method alterTable(Hive db, AlterTableDesc alterTbl) (see DDLTask line 3408). Setting it to false could avoid it to be reset to true in the beginning of the method, I wonder if it makes sense.
> 
> pengcheng xiong wrote:
>     Yes, i understand and I agree with the logic that you want to do. My point is that, rather than put <DO_NOT_UPDATE_STATS, FALSE> entry, we can just remove the key <DO_NOT_UPDATE_STATS>. We will check whether the property contains that key later. See Line 219 in MetaStoreUtils.java.

removing the key <DO_NOT_UPDATE_STATS> instead of setting it to FALSE.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/test/queries/clientpositive/alter_table_stats_status.q, line 1
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609804#file1609804line1>
> >
> >     Could u add some more test cases for partition stats? Thanks.
> 
> Chaoyu Tang wrote:
>     Will provide soon in next patch after testing.

Add the tests for partition stats.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3655
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3655>
> >
> >     The same as above

See comments in code:
      // STATS_GENERATED==StatsSetupConst.USER is set only when the stats row_count, raw_data_size
      // are updated using alter table .. update statistics .. But the stats parameters including
      // totalSize etc could also be set using command like alter table .. set tblproperties, in this
      // case, STATS_GENERATED will not be set, so we check hasStatsInParameters(alterTbl.getProps())
      // here which also covered the case were STATS_GENERATED==StatsSetupConst.USER
To see if it makes sense to use hasStatsInParameters instead of only checking STATS_GENERATED==StatsSetupConst.USER


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162357
-----------------------------------------------------------


On Jan. 19, 2017, 10:29 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 19, 2017, 10:29 p.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by Chaoyu Tang <ct...@gmail.com>.

> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3657
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3657>
> >
> >     And, could u just remove the property rather than set it to false? We treat the missing of the property the same as set it to false.

alterTableOrSinglePartition could be called in a loop in the method alterTable(Hive db, AlterTableDesc alterTbl) (see DDLTask line 3408). Setting it to false could avoid it to be reset to true in the beginning of the method, I wonder if it makes sense.


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 3645
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609803#file1609803line3645>
> >
> >     If it is for user update stats, you will have     StatsSetupConst.STATS_GENERATED = StatsSetupConst.USER automatically. Thus it is not necessary to have/call hasStatsInParameters function.

Thanks PengCheng for reviewing the patch. I looked through the code related to this STATS status and feel a little confusing, here are some questions I need help from you. To my understanding:
1. For all operations which set STATS_GENERATED to TASK, we should set BASIC_STATS as TRUE, right?
2. But for the stats row_count, raw_data_size updated by the command alter table .. update statistics, the STATS_GENERATED is USER, should we set BASIC_STATS as TRUE or FALSE?
3. These stats could sometimes be set as parameters using command alter table .. settblproperties, in these cases, the BASIC_STATS should be set FALSE?


> On Jan. 19, 2017, 11:23 p.m., pengcheng xiong wrote:
> > ql/src/test/queries/clientpositive/alter_table_stats_status.q, line 1
> > <https://reviews.apache.org/r/55731/diff/1/?file=1609804#file1609804line1>
> >
> >     Could u add some more test cases for partition stats? Thanks.

Will provide soon in next patch after testing.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162357
-----------------------------------------------------------


On Jan. 19, 2017, 10:29 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 19, 2017, 10:29 p.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Re: Review Request 55731: HIVE-15653: Some ALTER TABLE commands drop table stats

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55731/#review162357
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 3645)
<https://reviews.apache.org/r/55731/#comment233713>

    If it is for user update stats, you will have     StatsSetupConst.STATS_GENERATED = StatsSetupConst.USER automatically. Thus it is not necessary to have/call hasStatsInParameters function.



ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 3655)
<https://reviews.apache.org/r/55731/#comment233714>

    The same as above



ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java (line 3657)
<https://reviews.apache.org/r/55731/#comment233715>

    And, could u just remove the property rather than set it to false? We treat the missing of the property the same as set it to false.



ql/src/test/queries/clientpositive/alter_table_stats_status.q (line 1)
<https://reviews.apache.org/r/55731/#comment233718>

    Could u add some more test cases for partition stats? Thanks.


- pengcheng xiong


On Jan. 19, 2017, 10:29 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55731/
> -----------------------------------------------------------
> 
> (Updated Jan. 19, 2017, 10:29 p.m.)
> 
> 
> Review request for hive and pengcheng xiong.
> 
> 
> Bugs: HIVE-15653
>     https://issues.apache.org/jira/browse/HIVE-15653
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For most of alter table operations like table rename, add columns, change column type etc (besides the set table properties), the table stats status should not change. But for some other operations like update statistics, change location, the basic stats status should change.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java a1fb874 
>   ql/src/test/queries/clientpositive/alter_table_stats_status.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_table_stats_status.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/55731/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests
> 2. new unit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>