You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Adam Szita via Review Board <no...@reviews.apache.org> on 2017/11/02 22:02:10 UTC

Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/
-----------------------------------------------------------

Review request for hive, Peter Vary and Barna Zsombor Klara.


Bugs: HIVE-17969
    https://issues.apache.org/jira/browse/HIVE-17969


Repository: hive-git


Description
-------

Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method


Diffs
-----

  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 


Diff: https://reviews.apache.org/r/63528/diff/1/


Testing
-------


Thanks,

Adam Szita


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Adam Szita via Review Board <no...@reviews.apache.org>.

> On Nov. 6, 2017, 6:51 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 263-264 (original), 275-278 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line275>
> >
> >     Shouldn't the partValues contain only the values from partBatch here?

Good catch, thanks!


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review190193
-----------------------------------------------------------


On Nov. 7, 2017, 10:22 a.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2017, 10:22 a.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7334a0c9fa87b1fe5b4f6c9d2073a477bc0f11ad 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Vihang Karajgaonkar via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review190193
-----------------------------------------------------------




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
Lines 263-264 (original), 275-278 (patched)
<https://reviews.apache.org/r/63528/#comment267477>

    Shouldn't the partValues contain only the values from partBatch here?


- Vihang Karajgaonkar


On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2017, 10:02 p.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Barna Zsombor Klara <kz...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review190166
-----------------------------------------------------------


Ship it!




Ship It!

- Barna Zsombor Klara


On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2017, 10:02 p.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Adam Szita via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/
-----------------------------------------------------------

(Updated Nov. 7, 2017, 10:22 a.m.)


Review request for hive, Peter Vary and Barna Zsombor Klara.


Changes
-------

Adding missing null-check. Correcting issue pointed out by Vihang


Bugs: HIVE-17969
    https://issues.apache.org/jira/browse/HIVE-17969


Repository: hive-git


Description
-------

Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method


Diffs (updated)
-----

  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7334a0c9fa87b1fe5b4f6c9d2073a477bc0f11ad 


Diff: https://reviews.apache.org/r/63528/diff/2/

Changes: https://reviews.apache.org/r/63528/diff/1-2/


Testing
-------


Thanks,

Adam Szita


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.

> On Nov. 2, 2017, 10:40 p.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line265>
> >
> >     Shouldn't we use 
> >     parts = msdb.getPartitions(dbname, name, partitionBatchSize);
> >     instead
> >     parts = msdb.getPartitions(dbname, name, -1);
> >     
> >     This way we will load only the required amount of partitions, instead of all.
> 
> Adam Szita wrote:
>     Although msdb.getPartitions(dbname, name, partitionBatchSize) only loads partitionBatchSize amount of partitions, it will always return the same set of partitions. This is used in dropPartitions and it's not a problem there as we're also removing the requested set, so it is guaranteed to get new ones in the next call. For our use case here it is sadly not feasable to use this call. Taking a look at RawStore class I can see no iterative way to query the partitions from the DB behind HMS.

Still think it would be useful, to have the possibility to load only partial partition list into memory to decrease memory load. Shall we open a follow-up jira for this?
Thanks,
Peter


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------


On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2017, 10:02 p.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Adam Szita via Review Board <no...@reviews.apache.org>.

> On Nov. 2, 2017, 10:40 p.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line265>
> >
> >     Shouldn't we use 
> >     parts = msdb.getPartitions(dbname, name, partitionBatchSize);
> >     instead
> >     parts = msdb.getPartitions(dbname, name, -1);
> >     
> >     This way we will load only the required amount of partitions, instead of all.

Although msdb.getPartitions(dbname, name, partitionBatchSize) only loads partitionBatchSize amount of partitions, it will always return the same set of partitions. This is used in dropPartitions and it's not a problem there as we're also removing the requested set, so it is guaranteed to get new ones in the next call. For our use case here it is sadly not feasable to use this call. Taking a look at RawStore class I can see no iterative way to query the partitions from the DB behind HMS.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------


On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2017, 10:02 p.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Adam Szita via Review Board <no...@reviews.apache.org>.

> On Nov. 2, 2017, 10:40 p.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line265>
> >
> >     Shouldn't we use 
> >     parts = msdb.getPartitions(dbname, name, partitionBatchSize);
> >     instead
> >     parts = msdb.getPartitions(dbname, name, -1);
> >     
> >     This way we will load only the required amount of partitions, instead of all.
> 
> Adam Szita wrote:
>     Although msdb.getPartitions(dbname, name, partitionBatchSize) only loads partitionBatchSize amount of partitions, it will always return the same set of partitions. This is used in dropPartitions and it's not a problem there as we're also removing the requested set, so it is guaranteed to get new ones in the next call. For our use case here it is sadly not feasable to use this call. Taking a look at RawStore class I can see no iterative way to query the partitions from the DB behind HMS.
> 
> Peter Vary wrote:
>     Still think it would be useful, to have the possibility to load only partial partition list into memory to decrease memory load. Shall we open a follow-up jira for this?
>     Thanks,
>     Peter

Looks like a fine idea, I've opened HIVE-17987 for this one.


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------


On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2017, 10:02 p.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 63528: HIVE-17969: Metastore to alter table in batches of partitions when renaming table

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------



Looks good. One question only.
Thanks, Peter


standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
Lines 265 (patched)
<https://reviews.apache.org/r/63528/#comment267235>

    Shouldn't we use 
    parts = msdb.getPartitions(dbname, name, partitionBatchSize);
    instead
    parts = msdb.getPartitions(dbname, name, -1);
    
    This way we will load only the required amount of partitions, instead of all.


- Peter Vary


On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2017, 10:02 p.m.)
> 
> 
> Review request for hive, Peter Vary and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-17969
>     https://issues.apache.org/jira/browse/HIVE-17969
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
> 
> 
> Diffs
> -----
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800 
> 
> 
> Diff: https://reviews.apache.org/r/63528/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Adam Szita
> 
>