You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Adam Szita via Review Board <no...@reviews.apache.org> on 2017/11/02 22:02:10 UTC
Review Request 63528: HIVE-17969: Metastore to alter table in batches
of partitions when renaming table
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/
-----------------------------------------------------------
Review request for hive, Peter Vary and Barna Zsombor Klara.
Bugs: HIVE-17969
https://issues.apache.org/jira/browse/HIVE-17969
Repository: hive-git
Description
-------
Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
Diffs
-----
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
Diff: https://reviews.apache.org/r/63528/diff/1/
Testing
-------
Thanks,
Adam Szita
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Adam Szita via Review Board <no...@reviews.apache.org>.
> On Nov. 6, 2017, 6:51 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 263-264 (original), 275-278 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line275>
> >
> > Shouldn't the partValues contain only the values from partBatch here?
Good catch, thanks!
- Adam
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review190193
-----------------------------------------------------------
On Nov. 7, 2017, 10:22 a.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 7, 2017, 10:22 a.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7334a0c9fa87b1fe5b4f6c9d2073a477bc0f11ad
>
>
> Diff: https://reviews.apache.org/r/63528/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Vihang Karajgaonkar via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review190193
-----------------------------------------------------------
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
Lines 263-264 (original), 275-278 (patched)
<https://reviews.apache.org/r/63528/#comment267477>
Shouldn't the partValues contain only the values from partBatch here?
- Vihang Karajgaonkar
On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 2, 2017, 10:02 p.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
>
>
> Diff: https://reviews.apache.org/r/63528/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Barna Zsombor Klara <kz...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review190166
-----------------------------------------------------------
Ship it!
Ship It!
- Barna Zsombor Klara
On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 2, 2017, 10:02 p.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
>
>
> Diff: https://reviews.apache.org/r/63528/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Adam Szita via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/
-----------------------------------------------------------
(Updated Nov. 7, 2017, 10:22 a.m.)
Review request for hive, Peter Vary and Barna Zsombor Klara.
Changes
-------
Adding missing null-check. Correcting issue pointed out by Vihang
Bugs: HIVE-17969
https://issues.apache.org/jira/browse/HIVE-17969
Repository: hive-git
Description
-------
Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
Diffs (updated)
-----
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 7334a0c9fa87b1fe5b4f6c9d2073a477bc0f11ad
Diff: https://reviews.apache.org/r/63528/diff/2/
Changes: https://reviews.apache.org/r/63528/diff/1-2/
Testing
-------
Thanks,
Adam Szita
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
> On Nov. 2, 2017, 10:40 p.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line265>
> >
> > Shouldn't we use
> > parts = msdb.getPartitions(dbname, name, partitionBatchSize);
> > instead
> > parts = msdb.getPartitions(dbname, name, -1);
> >
> > This way we will load only the required amount of partitions, instead of all.
>
> Adam Szita wrote:
> Although msdb.getPartitions(dbname, name, partitionBatchSize) only loads partitionBatchSize amount of partitions, it will always return the same set of partitions. This is used in dropPartitions and it's not a problem there as we're also removing the requested set, so it is guaranteed to get new ones in the next call. For our use case here it is sadly not feasable to use this call. Taking a look at RawStore class I can see no iterative way to query the partitions from the DB behind HMS.
Still think it would be useful, to have the possibility to load only partial partition list into memory to decrease memory load. Shall we open a follow-up jira for this?
Thanks,
Peter
- Peter
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------
On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 2, 2017, 10:02 p.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
>
>
> Diff: https://reviews.apache.org/r/63528/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Adam Szita via Review Board <no...@reviews.apache.org>.
> On Nov. 2, 2017, 10:40 p.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line265>
> >
> > Shouldn't we use
> > parts = msdb.getPartitions(dbname, name, partitionBatchSize);
> > instead
> > parts = msdb.getPartitions(dbname, name, -1);
> >
> > This way we will load only the required amount of partitions, instead of all.
Although msdb.getPartitions(dbname, name, partitionBatchSize) only loads partitionBatchSize amount of partitions, it will always return the same set of partitions. This is used in dropPartitions and it's not a problem there as we're also removing the requested set, so it is guaranteed to get new ones in the next call. For our use case here it is sadly not feasable to use this call. Taking a look at RawStore class I can see no iterative way to query the partitions from the DB behind HMS.
- Adam
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------
On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 2, 2017, 10:02 p.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
>
>
> Diff: https://reviews.apache.org/r/63528/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Adam Szita via Review Board <no...@reviews.apache.org>.
> On Nov. 2, 2017, 10:40 p.m., Peter Vary wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
> > Lines 265 (patched)
> > <https://reviews.apache.org/r/63528/diff/1/?file=1879637#file1879637line265>
> >
> > Shouldn't we use
> > parts = msdb.getPartitions(dbname, name, partitionBatchSize);
> > instead
> > parts = msdb.getPartitions(dbname, name, -1);
> >
> > This way we will load only the required amount of partitions, instead of all.
>
> Adam Szita wrote:
> Although msdb.getPartitions(dbname, name, partitionBatchSize) only loads partitionBatchSize amount of partitions, it will always return the same set of partitions. This is used in dropPartitions and it's not a problem there as we're also removing the requested set, so it is guaranteed to get new ones in the next call. For our use case here it is sadly not feasable to use this call. Taking a look at RawStore class I can see no iterative way to query the partitions from the DB behind HMS.
>
> Peter Vary wrote:
> Still think it would be useful, to have the possibility to load only partial partition list into memory to decrease memory load. Shall we open a follow-up jira for this?
> Thanks,
> Peter
Looks like a fine idea, I've opened HIVE-17987 for this one.
- Adam
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------
On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 2, 2017, 10:02 p.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
>
>
> Diff: https://reviews.apache.org/r/63528/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>
Re: Review Request 63528: HIVE-17969: Metastore to alter table in
batches of partitions when renaming table
Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63528/#review189979
-----------------------------------------------------------
Looks good. One question only.
Thanks, Peter
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
Lines 265 (patched)
<https://reviews.apache.org/r/63528/#comment267235>
Shouldn't we use
parts = msdb.getPartitions(dbname, name, partitionBatchSize);
instead
parts = msdb.getPartitions(dbname, name, -1);
This way we will load only the required amount of partitions, instead of all.
- Peter Vary
On Nov. 2, 2017, 10:02 p.m., Adam Szita wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63528/
> -----------------------------------------------------------
>
> (Updated Nov. 2, 2017, 10:02 p.m.)
>
>
> Review request for hive, Peter Vary and Barna Zsombor Klara.
>
>
> Bugs: HIVE-17969
> https://issues.apache.org/jira/browse/HIVE-17969
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Refactoring alter table code to use batching of partitions when calling the heavy removeUnusedColumnDescriptor method
>
>
> Diffs
> -----
>
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ccadac1ada6aaae884ab39f5d99e91b8c542404e
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 62801c53853dbafb7c425cff943ec819dcee4800
>
>
> Diff: https://reviews.apache.org/r/63528/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Adam Szita
>
>