You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Vaibhav Gumashta <vg...@hortonworks.com> on 2015/04/20 20:44:37 UTC

Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-10382
    https://issues.apache.org/jira/browse/HIVE-10382


Repository: hive-git


Description
-------

Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
  common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
  common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
  metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 

Diff: https://reviews.apache.org/r/33367/diff/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Thejas Nair <th...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81119
-----------------------------------------------------------



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131370>

    this is not being set to false, which means the cleaner would run only once.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131371>

    For tracking how the cache is performing, would be useful to have an INFO level message about how many entries were there and how many were removed due to expiry and if the eviction based on LRU is going to be triggered.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131372>

    evicting one LRU node at a time is expensive.
    I think we should just reduce the TTL to 0.9*TTL , 0.8*TTL etc and call this function again. Can be done in a follow up jira.
    
    Ideally, in the long term, we should think of using both the frequency of use and cost of re-computing the stats while deciding which ones to evict.


- Thejas Nair


On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 6:44 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Thejas Nair <th...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81714
-----------------------------------------------------------

Ship it!


Ship It!

- Thejas Nair


On April 27, 2015, 6:34 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 27, 2015, 6:34 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
>   ql/src/test/queries/clientpositive/annotate_stats_part.q fcfe566 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/
-----------------------------------------------------------

(Updated April 27, 2015, 6:34 p.m.)


Review request for hive.


Bugs: HIVE-10382
    https://issues.apache.org/jira/browse/HIVE-10382


Repository: hive-git


Description
-------

Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
  common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
  common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
  metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
  ql/src/test/queries/clientpositive/annotate_stats_part.q fcfe566 

Diff: https://reviews.apache.org/r/33367/diff/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/
-----------------------------------------------------------

(Updated April 26, 2015, 8:44 p.m.)


Review request for hive.


Bugs: HIVE-10382
    https://issues.apache.org/jira/browse/HIVE-10382


Repository: hive-git


Description
-------

Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
  common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
  common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
  common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
  metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
  ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 

Diff: https://reviews.apache.org/r/33367/diff/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Mostafa Mokhtar <mm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81170
-----------------------------------------------------------



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131427>

    Can you keeep track of hit ratio?



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131425>

    Can you add logging on how many nodes where cleaned in how long.
    Also print number of nodes before/after


- Mostafa Mokhtar


On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 6:44 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Mostafa Mokhtar <mm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81127
-----------------------------------------------------------



common/src/java/org/apache/hive/common/util/BloomFilter.java
<https://reviews.apache.org/r/33367/#comment131382>

    There already is a BloomFilter implementaion in org.apache.hadoop.hive.ql.io.filters.
    Can this be used instead of the new one?


- Mostafa Mokhtar


On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 6:44 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Thejas Nair <th...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81122
-----------------------------------------------------------



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131375>

    i would expect this map entry to be non null for vast majority of this call done from here.
    Creating a new (arraylist) object is usually considered expensive, so i feel it would better to error on the side of having to make an additional put call.



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
<https://reviews.apache.org/r/33367/#comment131379>

    looking up in the db separately for each column will be expensive. We should make a single call for columns not in cache and then create the aggregated results.
    
    Again a perf improvement that we can track in a followup jira.


- Thejas Nair


On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 6:44 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Thejas Nair <th...@hortonworks.com>.

> On April 22, 2015, 3:47 a.m., Thejas Nair wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java, line 489
> > <https://reviews.apache.org/r/33367/diff/1/?file=935572#file935572line489>
> >
> >     this is not being used anywhere, can be removed.

never mind, i see its used in evictOne


- Thejas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81115
-----------------------------------------------------------


On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 6:44 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 33367: Aggregate stats cache for RDBMS based metastore codepath

Posted by Thejas Nair <th...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33367/#review81115
-----------------------------------------------------------



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131359>

    I don't think this comment is applicable.
    
    From http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html -
    
    The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in section 17.4 of The Java™ Language Specification.
    
        get has the memory effects of reading a volatile variable.
        set has the memory effects of writing (assigning) a volatile variable.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131364>

    I don't think we really need the locks at the level of candidate list, it can be made a more finer lock by using ConcurrentLinkedQueue or something similar.
    The only mutable part of AggrColStatsCached is already stored in a volatile member.
    
    Can you please open a follow up jira to explore that ?



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131360>

    match has to be null if this exception is thrown. (unnecessary also is unintuitive.)



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131361>

    can we just skip these instead of adding them as potential candidates ?



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131362>

    as discussed offline, there is potential for improving the performance here by avoiding two loops. That can be done in a follow up jira.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131358>

    spawnCleaner() or startCleaner() might be a better name.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131363>

    we should give this thread a name (for ease of debugging).



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131367>

    this is not being used anywhere, can be removed.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131368>

    the name of this class is too similar to the outer class. I feel it would better to name it just AggrColStats or so



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131366>

    the time is already being updated from findBestMatch, so this isn't necessary.



metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java
<https://reviews.apache.org/r/33367/#comment131365>

    lets removed these unused classes.


- Thejas Nair


On April 20, 2015, 6:44 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33367/
> -----------------------------------------------------------
> 
> (Updated April 20, 2015, 6:44 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10382
>     https://issues.apache.org/jira/browse/HIVE-10382
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Similar to the work done on the HBase branch (HIVE-9693), the stats cache can potentially have performance gains.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65ec1b9 
>   common/src/java/org/apache/hive/common/util/BloomFilter.java PRE-CREATION 
>   common/src/java/org/apache/hive/common/util/Murmur3.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestBloomFilter.java PRE-CREATION 
>   common/src/test/org/apache/hive/common/util/TestMurmur3.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/AggregateStatsCache.java PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java bf169c9 
>   metastore/src/test/org/apache/hadoop/hive/metastore/TestAggregateStatsCache.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilter.java 6ab0270 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/BloomFilterIO.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/filters/Murmur3.java e733892 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 7bfd781 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 49a8e80 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java bde9fc2 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java a319204 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestBloomFilter.java 32b95ab 
>   ql/src/test/org/apache/hadoop/hive/ql/io/filters/TestMurmur3.java d92a3ce 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java d0f3a5e 
> 
> Diff: https://reviews.apache.org/r/33367/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>