You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Peter Vary via Review Board <no...@reviews.apache.org> on 2018/05/29 10:53:37 UTC
Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
-----------------------------------------------------------
Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
Bugs: HIVE-19718
https://issues.apache.org/jira/browse/HIVE-19718
Repository: hive-git
Description
-------
Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
Diffs
-----
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 3d6fda6
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c1d25db
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 13ccdb1
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java ce7d286
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java b223920
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java f6899be
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 98a85cc
Diff: https://reviews.apache.org/r/67351/diff/1/
Testing
-------
Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
Thanks,
Peter Vary
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2411-2416 (original)
> > <https://reviews.apache.org/r/67351/diff/1/?file=2031431#file2031431line2412>
> >
> > Why do we need to remove these lines?
Good point.
Added openTransaction back.
Thanks for noticing
> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2422 (original)
> > <https://reviews.apache.org/r/67351/diff/1/?file=2031431#file2031431line2423>
> >
> > Why do we need to remove this line?
Good point.
Added commitTransaction back.
Thanks for noticing
> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2426 (patched)
> > <https://reviews.apache.org/r/67351/diff/1/?file=2031431#file2031431line2433>
> >
> > Adding a javadoc would be great. esp. mentioning that the advantage of using this method and when its better to use it.
Added javadoc
- Peter
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204262
-----------------------------------------------------------
On June 5, 2018, 7:54 a.m., Peter Vary wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
>
> (Updated June 5, 2018, 7:54 a.m.)
>
>
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
>
>
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
>
>
> Diffs
> -----
>
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4
>
>
> Diff: https://reviews.apache.org/r/67351/diff/2/
>
>
> Testing
> -------
>
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
>
>
> Thanks,
>
> Peter Vary
>
>
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Vihang Karajgaonkar via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204262
-----------------------------------------------------------
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2411-2416 (original)
<https://reviews.apache.org/r/67351/#comment286661>
Why do we need to remove these lines?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2422 (original)
<https://reviews.apache.org/r/67351/#comment286662>
Why do we need to remove this line?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2426 (patched)
<https://reviews.apache.org/r/67351/#comment286663>
Adding a javadoc would be great. esp. mentioning that the advantage of using this method and when its better to use it.
- Vihang Karajgaonkar
On May 29, 2018, 10:53 a.m., Peter Vary wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
>
> (Updated May 29, 2018, 10:53 a.m.)
>
>
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
>
>
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
>
>
> Diffs
> -----
>
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 3d6fda6
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c1d25db
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 13ccdb1
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java ce7d286
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java b223920
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java f6899be
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 98a85cc
>
>
> Diff: https://reviews.apache.org/r/67351/diff/1/
>
>
> Testing
> -------
>
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
>
>
> Thanks,
>
> Peter Vary
>
>
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> >
Thanks for the review Sasha!
Dropped the ones I think are should not be part of this change, but feel free to reopen them if you think otherwise.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2417 (original), 2428 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2428>
> >
> > Do we need a version of this that can be called when we already have the MTable object?
Yes. There are several public methods, which call the original version and the table object is not available at hand in the caller method.
To keep the change manageable changed only where the addPartitions benefited from the it.
We can file a follow-up jira to push down the partition keys on other places too.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2428 (original), 2434 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2440>
> >
> > Note that this may throw an exception which you need to catch and rollback your transaction. SO immediately after openTransaction you need a try-finally block.
Is it good practice to handle RunTimeException-s as well? Looking at the getMTable definition, it should not throw any exception by design. It will return null, if the table is not found.
Anyway it feels more correct if we handle open/commit/rollback in one place, so I added it.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2430 (original)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2442>
> >
> > missing commitTransaction()
It is there after getting the partitions. The goal is here to make sure that getting the table info, and getting the partition info is done in one transaction
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2467 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2474>
> >
> > Strictly speaking we do not need openTransaction() inside try block
I am not sure what is the general usage pattern for transactions and exceptions in ObjectStore.
If the query fails, I think it is a good idea to roll back the whole transaction, including the enclosing ones. At least this is the general behaviour I have seen.
Feel free to reopen if you disagree.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2446 (original), 2477 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2484>
> >
> > The rest can be done outside of try block - we already committed the transaction.
Keeping this part of the code unchanged helps with backports a lot - As a general rule I try to avoid changes which does not have immediate impact.
Refactoring the code outside the try/catch makes the code would be more readeable, but if it does not have other serious impact, I would be compelled to keep it as-is.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2449 (original), 2480 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2487>
> >
> > Can we do this with setUnique(true) instead?
I think this part of the code has to do with the comment:
// We need to compare partition name with requested name since some DBs
// (like MySQL, Derby) considers 'a' = 'a ' whereas others like (Postgres,
// Oracle) doesn't exhibit this problem.
So if we want to do this change then it should be in a separate jira.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2456 (original), 2487 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2494>
> >
> > Can we just return mpart here?
Keeping this part of the code unchanged helps with backports a lot - As a general rule I try to avoid changes which does not have immediate impact.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2535 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2542>
> >
> > Can you document this behavior?
Sure! Thanks! :)
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2487 (original), 2538 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2545>
> >
> > Can you document this behavior?
Sure! Thanks! :)
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 9120 (original), 9171 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line9178>
> >
> > I'm wondering whether we can make things faster a bit by using the fact that we only need to know whether partition exists and don't care about fetching any actual values from partitions - we can have a potentially faster query that just checks for partition existence.
This is mostly used to make sure that the partition does not exists. So usually we return null here which means we do not really fetching any partition data.
> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
> > Line 331 (original), 331 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035171#file2035171line331>
> >
> > Please update Javadoc as well
Good catch!
Thanks for the review!!!
Peter
- Peter
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204338
-----------------------------------------------------------
On June 5, 2018, 7:54 a.m., Peter Vary wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
>
> (Updated June 5, 2018, 7:54 a.m.)
>
>
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
>
>
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
>
>
> Diffs
> -----
>
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4
>
>
> Diff: https://reviews.apache.org/r/67351/diff/2/
>
>
> Testing
> -------
>
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
>
>
> Thanks,
>
> Peter Vary
>
>
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Alexander Kolbasov <ak...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204338
-----------------------------------------------------------
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2417 (original), 2428 (patched)
<https://reviews.apache.org/r/67351/#comment286787>
Do we need a version of this that can be called when we already have the MTable object?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2428 (original), 2434 (patched)
<https://reviews.apache.org/r/67351/#comment286783>
Note that this may throw an exception which you need to catch and rollback your transaction. SO immediately after openTransaction you need a try-finally block.
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2430 (original)
<https://reviews.apache.org/r/67351/#comment286784>
missing commitTransaction()
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2467 (patched)
<https://reviews.apache.org/r/67351/#comment286788>
Strictly speaking we do not need openTransaction() inside try block
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2446 (original), 2477 (patched)
<https://reviews.apache.org/r/67351/#comment286791>
The rest can be done outside of try block - we already committed the transaction.
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2449 (original), 2480 (patched)
<https://reviews.apache.org/r/67351/#comment286794>
Can we do this with setUnique(true) instead?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2456 (original), 2487 (patched)
<https://reviews.apache.org/r/67351/#comment286796>
Can we just return mpart here?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2535 (patched)
<https://reviews.apache.org/r/67351/#comment286798>
Can you document this behavior?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2487 (original), 2538 (patched)
<https://reviews.apache.org/r/67351/#comment286799>
Can you document this behavior?
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 9120 (original), 9171 (patched)
<https://reviews.apache.org/r/67351/#comment286800>
I'm wondering whether we can make things faster a bit by using the fact that we only need to know whether partition exists and don't care about fetching any actual values from partitions - we can have a potentially faster query that just checks for partition existence.
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
Line 331 (original), 331 (patched)
<https://reviews.apache.org/r/67351/#comment286779>
Please update Javadoc as well
- Alexander Kolbasov
On June 5, 2018, 7:54 a.m., Peter Vary wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
>
> (Updated June 5, 2018, 7:54 a.m.)
>
>
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
>
>
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
>
>
> Diffs
> -----
>
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4
>
>
> Diff: https://reviews.apache.org/r/67351/diff/2/
>
>
> Testing
> -------
>
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
>
>
> Thanks,
>
> Peter Vary
>
>
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Vihang Karajgaonkar via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204697
-----------------------------------------------------------
Ship it!
Ship It!
- Vihang Karajgaonkar
On June 6, 2018, 11:05 a.m., Peter Vary wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
>
> (Updated June 6, 2018, 11:05 a.m.)
>
>
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
>
>
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
>
>
> Diffs
> -----
>
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 77ed2b4
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4
>
>
> Diff: https://reviews.apache.org/r/67351/diff/3/
>
>
> Testing
> -------
>
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
>
>
> Thanks,
>
> Peter Vary
>
>
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
-----------------------------------------------------------
(Updated June 6, 2018, 11:05 a.m.)
Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
Changes
-------
Updated the diff based on Sasha's comments
Bugs: HIVE-19718
https://issues.apache.org/jira/browse/HIVE-19718
Repository: hive-git
Description
-------
Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
Diffs (updated)
-----
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 77ed2b4
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4
Diff: https://reviews.apache.org/r/67351/diff/3/
Changes: https://reviews.apache.org/r/67351/diff/2-3/
Testing
-------
Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
Thanks,
Peter Vary
Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also
fetches table for each partition
Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
-----------------------------------------------------------
(Updated June 5, 2018, 7:54 a.m.)
Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
Changes
-------
Addressed Vihang's comment, and fixed test failures:
- Added transaction to the original getMPartition method
- Added javadoc
- Removed accidentally added extra line from HiveMetaStore.java
Bugs: HIVE-19718
https://issues.apache.org/jira/browse/HIVE-19718
Repository: hive-git
Description
-------
Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
Diffs (updated)
-----
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4
Diff: https://reviews.apache.org/r/67351/diff/2/
Changes: https://reviews.apache.org/r/67351/diff/1-2/
Testing
-------
Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
Thanks,
Peter Vary