Posted to issues@hive.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2018/03/15 14:46:00 UTC

[jira] [Comment Edited] (HIVE-16879) Improve Cache Key

    [ https://issues.apache.org/jira/browse/HIVE-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400493#comment-16400493 ] 

BELUGA BEHR edited comment on HIVE-16879 at 3/15/18 2:45 PM:
-------------------------------------------------------------

[~misha@cloudera.com] I will see if I can't pull up some information...

_Edit:_ I submitted my comment accidentally.  Re-writing now.

The idea here is that, when we're storing keys, we're storing entries like these:
 
||database||table||column||
|default|mytable|column1|
|default|mytable|column2|
|default|mytable|column3|
|default|mytable|column4|
|default|mytable|column5|
|default|user|first_name|
|default|user|last_name|
|default|user|age|
|default|user|creation_date|

So, what we can see here is that most of the variability of a key is in the column name: the nine keys above share one database name and two table names, but have nine distinct column names. When we're checking for key equality, we should therefore start by comparing column names. If we start by comparing database names, we spend a relatively large amount of time comparing the same string again and again. The same reasoning applies to caching: it is not worthwhile to cache the column name, because there are unlikely to be many duplicates. However, the database and table names are likely to be duplicated many times and therefore could be cached.
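A minimal Java sketch of the key described above, assuming a hypothetical {{CacheKeySketch}} class rather than the actual key class in {{AggregateStatsCache}}. Database and table names are interned, so repeated names share one instance and can be compared by reference, while {{equals}} checks the column name first.

```java
import java.util.Objects;

/**
 * Hypothetical cache key (not Hive's actual class). Database and table names
 * repeat across many keys, so they are interned; column names are mostly
 * unique, so they are kept as-is.
 */
final class CacheKeySketch {
  private final String dbName;
  private final String tblName;
  private final String colName;

  CacheKeySketch(String dbName, String tblName, String colName) {
    // intern() returns the canonical pooled instance, so every key for
    // "default"."mytable" shares the same two String objects.
    this.dbName = Objects.requireNonNull(dbName).intern();
    this.tblName = Objects.requireNonNull(tblName).intern();
    // Interning mostly-unique column names would only grow the pool.
    this.colName = Objects.requireNonNull(colName);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CacheKeySketch)) {
      return false;
    }
    CacheKeySketch other = (CacheKeySketch) o;
    // Column name first: it is the component most likely to differ,
    // so most non-matching keys are rejected by this single comparison.
    return colName.equals(other.colName)
        // Both sides are interned, so equal names are the same reference.
        && dbName == other.dbName
        && tblName == other.tblName;
  }

  @Override
  public int hashCode() {
    // String caches its own hash code, and interning means each distinct
    // db/table name computes it only once.
    return 31 * (31 * dbName.hashCode() + tblName.hashCode()) + colName.hashCode();
  }
}
```

The reference comparisons are safe only because both sides went through {{intern()}} in the constructor; interning the column names as well would mostly add unique entries to the string pool, which is the trade-off described above.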

 


was (Author: belugabehr):
[~misha@cloudera.com] I will see if I can't pull up some information...

 

The idea here is that, when we're storing keys, we're storing...

 
||database||table||column||
|default|mytable|column1|
|default|mytable|column1|
|default|mytable|column1|

 

 

> Improve Cache Key
> -----------------
>
>                 Key: HIVE-16879
>                 URL: https://issues.apache.org/jira/browse/HIVE-16879
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Trivial
>         Attachments: HIVE-16879.1.patch, HIVE-16879.2.patch
>
>
> Improve cache key for cache implemented in {{org.apache.hadoop.hive.metastore.AggregateStatsCache}}.
> # Cache some of the key components themselves (db name, table name) using the {{String}} {{intern}} method. This conserves memory for repeated keys, improves the {{equals}} method because references can be used for equality, and means hash codes are computed once per distinct name, since the {{String}} class's {{hashCode}} method caches its result.
> # Change _debug_ logging so that log message text is not generated unless debug logging is enabled
> # Change the _equals_ method to compare first the component most likely to differ: the column name
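The logging item (not generating debug text unless required) can be illustrated with a small sketch. It assumes {{java.util.logging}} so that it runs without extra dependencies; the class and method names are hypothetical and the patch itself is not reproduced here. With SLF4J, the usual equivalents are parameterized messages or an {{isDebugEnabled()}} guard.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example of building a debug message only when it will be logged. */
final class LazyDebugLogging {
  static final Logger LOG = Logger.getLogger(LazyDebugLogging.class.getName());

  // Counts how often the message text was actually built (for illustration only).
  static int messagesBuilt = 0;

  static String describeHit(String dbName, String tblName, int hits) {
    messagesBuilt++;
    return "Cache hit for " + dbName + "." + tblName + " (hits=" + hits + ")";
  }

  static void onCacheHit(String dbName, String tblName, int hits) {
    // Eager form, which concatenates the string even when FINE is disabled:
    //   LOG.fine(describeHit(dbName, tblName, hits));
    // Lazy form: the Supplier runs only if FINE is loggable for LOG.
    LOG.fine(() -> describeHit(dbName, tblName, hits));
  }

  // Explicit-guard alternative with the same effect.
  static void onCacheHitGuarded(String dbName, String tblName, int hits) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describeHit(dbName, tblName, hits));
    }
  }
}
```

Either form skips the string concatenation entirely while the logger is above FINE, which is what matters on a hot path such as a cache lookup.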



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)