You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Peter Vary via Review Board <no...@reviews.apache.org> on 2018/05/29 10:53:37 UTC

Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
-----------------------------------------------------------

Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.


Bugs: HIVE-19718
    https://issues.apache.org/jira/browse/HIVE-19718


Repository: hive-git


Description
-------

Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.


Diffs
-----

  itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 3d6fda6 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c1d25db 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 13ccdb1 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java ce7d286 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java b223920 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java f6899be 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 98a85cc 


Diff: https://reviews.apache.org/r/67351/diff/1/


Testing
-------

Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime


Thanks,

Peter Vary


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.

> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2411-2416 (original)
> > <https://reviews.apache.org/r/67351/diff/1/?file=2031431#file2031431line2412>
> >
> >     Why do we need to remove these lines?

Good point.
Added openTransaction back.
Thanks for noticing


> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2422 (original)
> > <https://reviews.apache.org/r/67351/diff/1/?file=2031431#file2031431line2423>
> >
> >     Why do we need to remove this line?

Good point.
Added commitTransaction back.
Thanks for noticing


> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2426 (patched)
> > <https://reviews.apache.org/r/67351/diff/1/?file=2031431#file2031431line2433>
> >
> >     Adding a javadoc would be great. esp. mentioning that the advantage of using this method and when its better to use it.

Added javadoc


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204262
-----------------------------------------------------------


On June 5, 2018, 7:54 a.m., Peter Vary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
> 
> (Updated June 5, 2018, 7:54 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
>     https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
> 
> 
> Diffs
> -----
> 
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/2/
> 
> 
> Testing
> -------
> 
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Vihang Karajgaonkar via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204262
-----------------------------------------------------------




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2411-2416 (original)
<https://reviews.apache.org/r/67351/#comment286661>

    Why do we need to remove these lines?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2422 (original)
<https://reviews.apache.org/r/67351/#comment286662>

    Why do we need to remove this line?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2426 (patched)
<https://reviews.apache.org/r/67351/#comment286663>

    Adding a javadoc would be great. esp. mentioning that the advantage of using this method and when its better to use it.


- Vihang Karajgaonkar


On May 29, 2018, 10:53 a.m., Peter Vary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
> 
> (Updated May 29, 2018, 10:53 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
>     https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
> 
> 
> Diffs
> -----
> 
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 3d6fda6 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java c1d25db 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 13ccdb1 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java ce7d286 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java b223920 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java f6899be 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 98a85cc 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/1/
> 
> 
> Testing
> -------
> 
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.

> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> >

Thanks for the review Sasha!

Dropped the ones I think are should not be part of this change, but feel free to reopen them if you think otherwise.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2417 (original), 2428 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2428>
> >
> >     Do we need a version of this that can be called when we already have the MTable object?

Yes. There are several public methods, which call the original version and the table object is not available at hand in the caller method.
To keep the change manageable changed only where the addPartitions benefited from the it.
We can file a follow-up jira to push down the partition keys on other places too.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2428 (original), 2434 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2440>
> >
> >     Note that this may throw an exception which you need to catch and rollback your transaction. SO immediately after openTransaction you need a try-finally block.

Is it good practice to handle RunTimeException-s as well? Looking at the getMTable definition, it should not throw any exception by design. It will return null, if the table is not found.
Anyway it feels more correct if we handle open/commit/rollback in one place, so I added it.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2430 (original)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2442>
> >
> >     missing commitTransaction()

It is there after getting the partitions. The goal is here to make sure that getting the table info, and getting the partition info is done in one transaction


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2467 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2474>
> >
> >     Strictly speaking we do not need openTransaction() inside try block

I am not sure what is the general usage pattern for transactions and exceptions in ObjectStore.
If the query fails, I think it is a good idea to roll back the whole transaction, including the enclosing ones. At least this is the general behaviour I have seen.
Feel free to reopen if you disagree.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2446 (original), 2477 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2484>
> >
> >     The rest can be done outside of try block - we already committed the transaction.

Keeping this part of the code unchanged helps with backports a lot - As a general rule I try to avoid changes which does not have immediate impact.
Refactoring the code outside the try/catch makes the code would be more readeable, but if it does not have other serious impact, I would be compelled to keep it as-is.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2449 (original), 2480 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2487>
> >
> >     Can we do this with setUnique(true) instead?

I think this part of the code has to do with the comment:
      // We need to compare partition name with requested name since some DBs
      // (like MySQL, Derby) considers 'a' = 'a ' whereas others like (Postgres,
      // Oracle) doesn't exhibit this problem.
      
So if we want to do this change then it should be in a separate jira.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2456 (original), 2487 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2494>
> >
> >     Can we just return mpart here?

Keeping this part of the code unchanged helps with backports a lot - As a general rule I try to avoid changes which does not have immediate impact.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2535 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2542>
> >
> >     Can you document this behavior?

Sure! Thanks! :)


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2487 (original), 2538 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line2545>
> >
> >     Can you document this behavior?

Sure! Thanks! :)


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 9120 (original), 9171 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035170#file2035170line9178>
> >
> >     I'm wondering whether we can make things faster a bit by using the fact that we only need to know whether partition exists and don't care about fetching any actual values from partitions - we can have a potentially faster query that just checks for partition existence.

This is mostly used to make sure that the partition does not exists. So usually we return null here which means we do not really fetching any partition data.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
> > Line 331 (original), 331 (patched)
> > <https://reviews.apache.org/r/67351/diff/2/?file=2035171#file2035171line331>
> >
> >     Please update Javadoc as well

Good catch!

Thanks for the review!!!
Peter


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204338
-----------------------------------------------------------


On June 5, 2018, 7:54 a.m., Peter Vary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
> 
> (Updated June 5, 2018, 7:54 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
>     https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
> 
> 
> Diffs
> -----
> 
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/2/
> 
> 
> Testing
> -------
> 
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Alexander Kolbasov <ak...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204338
-----------------------------------------------------------




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2417 (original), 2428 (patched)
<https://reviews.apache.org/r/67351/#comment286787>

    Do we need a version of this that can be called when we already have the MTable object?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2428 (original), 2434 (patched)
<https://reviews.apache.org/r/67351/#comment286783>

    Note that this may throw an exception which you need to catch and rollback your transaction. SO immediately after openTransaction you need a try-finally block.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2430 (original)
<https://reviews.apache.org/r/67351/#comment286784>

    missing commitTransaction()



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2467 (patched)
<https://reviews.apache.org/r/67351/#comment286788>

    Strictly speaking we do not need openTransaction() inside try block



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2446 (original), 2477 (patched)
<https://reviews.apache.org/r/67351/#comment286791>

    The rest can be done outside of try block - we already committed the transaction.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2449 (original), 2480 (patched)
<https://reviews.apache.org/r/67351/#comment286794>

    Can we do this with setUnique(true) instead?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2456 (original), 2487 (patched)
<https://reviews.apache.org/r/67351/#comment286796>

    Can we just return mpart here?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2535 (patched)
<https://reviews.apache.org/r/67351/#comment286798>

    Can you document this behavior?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2487 (original), 2538 (patched)
<https://reviews.apache.org/r/67351/#comment286799>

    Can you document this behavior?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 9120 (original), 9171 (patched)
<https://reviews.apache.org/r/67351/#comment286800>

    I'm wondering whether we can make things faster a bit by using the fact that we only need to know whether partition exists and don't care about fetching any actual values from partitions - we can have a potentially faster query that just checks for partition existence.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
Line 331 (original), 331 (patched)
<https://reviews.apache.org/r/67351/#comment286779>

    Please update Javadoc as well


- Alexander Kolbasov


On June 5, 2018, 7:54 a.m., Peter Vary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
> 
> (Updated June 5, 2018, 7:54 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
>     https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
> 
> 
> Diffs
> -----
> 
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/2/
> 
> 
> Testing
> -------
> 
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Vihang Karajgaonkar via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204697
-----------------------------------------------------------


Ship it!




Ship It!

- Vihang Karajgaonkar


On June 6, 2018, 11:05 a.m., Peter Vary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> -----------------------------------------------------------
> 
> (Updated June 6, 2018, 11:05 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
>     https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.
> 
> 
> Diffs
> -----
> 
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 77ed2b4 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/3/
> 
> 
> Testing
> -------
> 
> Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
-----------------------------------------------------------

(Updated June 6, 2018, 11:05 a.m.)


Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.


Changes
-------

Updated the diff based on Sasha's comments


Bugs: HIVE-19718
    https://issues.apache.org/jira/browse/HIVE-19718


Repository: hive-git


Description
-------

Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.


Diffs (updated)
-----

  itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 77ed2b4 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 


Diff: https://reviews.apache.org/r/67351/diff/3/

Changes: https://reviews.apache.org/r/67351/diff/2-3/


Testing
-------

Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime


Thanks,

Peter Vary


Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

Posted by Peter Vary via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
-----------------------------------------------------------

(Updated June 5, 2018, 7:54 a.m.)


Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.


Changes
-------

Addressed Vihang's comment, and fixed test failures:
- Added transaction to the original getMPartition method
- Added javadoc
- Removed accidentally added extra line from HiveMetaStore.java


Bugs: HIVE-19718
    https://issues.apache.org/jira/browse/HIVE-19718


Repository: hive-git


Description
-------

Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> getMPartition, so it does not have to query the table object for every time if we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new version of getMPartition, which can use the provided partitionKeys instead of querying it again.


Diffs (updated)
-----

  itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java 0cc0ae5 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java d8b8414 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b15d89d 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java 283798c 
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 9da8d72 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 0461c4e 
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java b71eda4 


Diff: https://reviews.apache.org/r/67351/diff/2/

Changes: https://reviews.apache.org/r/67351/diff/1-2/


Testing
-------

Run several performance tests with Sasha's performance tool. These optimisations shave of ~10% of the runtime


Thanks,

Peter Vary