Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/08/21 16:40:00 UTC

[jira] [Commented] (IMPALA-7437) Simple granular caching of partition metadata in impalad

    [ https://issues.apache.org/jira/browse/IMPALA-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587707#comment-16587707 ] 

ASF subversion and git services commented on IMPALA-7437:
---------------------------------------------------------

Commit 3fa05604aca2d8f65b3ded4950df8f38fffe43d5 in impala's branch refs/heads/master from [~tlipcon]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=3fa0560 ]

IMPALA-7437. LRU caching of partitions in impalad

This changes the CatalogdMetaProvider to use a Guava-based LRU cache.
The eviction strategy is currently time-based (1 hour), and it only
performs caching of some basic items like partition information, the
null-partition-key-value, and table column statistics. It does not
cache the table entries themselves, which means that we don't need to do
any invalidation propagation via the statestore quite yet. Instead,
every query will do an initial fetch of the table metadata in order to
know the current version number. That version number is then used as
part of the cache key for all further metadata, so when the version
number changes, all of the prior cache entries become "unreachable" and
effectively evicted.
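
To make the version-in-the-key idea concrete, here is a minimal sketch. It is
not the code from the patch: it swaps the Guava cache for a plain HashMap so
that it runs with only the JDK, and the class, field, and method names
(VersionedCacheSketch, Key, get, loads) are hypothetical. Because there is no
expiry here, stale entries are never physically removed; in the patch, the
time-based eviction described above is what would eventually reclaim them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/** Sketch only: a cache whose keys embed the table's catalog version. */
class VersionedCacheSketch {
  /** Hypothetical key: table name + catalog version + name of the cached item. */
  static final class Key {
    final String table;
    final long version;
    final String item;

    Key(String table, long version, String item) {
      this.table = table;
      this.version = version;
      this.item = item;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) return false;
      Key k = (Key) o;
      return version == k.version && table.equals(k.table) && item.equals(k.item);
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, version, item);
    }
  }

  // Stand-in for the Guava cache; a plain map has no time-based expiry.
  private final Map<Key, String> cache = new HashMap<>();

  // Counts how often the loader actually ran (i.e. cache misses).
  int loads = 0;

  /** Returns the cached value for (table, version, item), loading it on a miss. */
  String get(String table, long version, String item, Supplier<String> loader) {
    return cache.computeIfAbsent(new Key(table, version, item), k -> {
      loads++;
      return loader.get();
    });
  }

  public static void main(String[] args) {
    VersionedCacheSketch cache = new VersionedCacheSketch();
    cache.get("t1", 1L, "partitions", () -> "parts@v1");   // miss: loads
    cache.get("t1", 1L, "partitions", () -> "not loaded"); // hit: same version
    cache.get("t1", 2L, "partitions", () -> "parts@v2");   // miss: version changed
    System.out.println(cache.loads); // prints 2
  }
}
```

Once the table version moves from 1 to 2, no lookup ever constructs a key with
version 1 again, so those entries can no longer be reached even though nothing
invalidated them explicitly.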

Initially, I attempted to implement this by adding a new MetaProvider
implementation that would transparently wrap another MetaProvider
implementation (either catalogd-based or direct-from-source). However, I
found that I wanted to use catalogd-based implementation details like
the version number in the cache key, and trying to abstract this behind
an interface wasn't very clear. So, I elected to just embed the caching
logic into the CatalogdMetaProvider itself.

Note that this patch upgrades the Guava reference in the pom from 11.0.2
to 14.0.1. Guava 14.0.1 was in fact already leaking onto the classpath
via hive-exec.jar, so the build ended up picking one version or the
other somewhat unpredictably. The CacheBuilder class had a small API
change between v11 and v14, so I needed to pin a specific version so
that Eclipse and Maven agreed on which one to build against.
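
One generic way to see which jar actually supplied a class in a situation like
this is to ask the JVM for the class's code source. The sketch below is an
illustration, not part of the patch; the class and method names
(ClassOriginSketch, locationOf) are hypothetical. It uses JDK classes so that
it runs without Guava; to investigate a Guava conflict one would pass
com.google.common.cache.CacheBuilder.class instead.

```java
import java.security.CodeSource;

/** Sketch only: report where a class was loaded from. */
class ClassOriginSketch {
  /** Returns the jar or directory URL for cls, or a fallback for JDK classes. */
  static String locationOf(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
    if (src == null || src.getLocation() == null) {
      return "(bootstrap or unknown)";
    }
    return src.getLocation().toString();
  }

  public static void main(String[] args) {
    System.out.println(locationOf(String.class));
    // For an application class this is typically the jar or classes directory.
    System.out.println(locationOf(ClassOriginSketch.class));
  }
}
```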

This includes some basic unit testing and I also verified that some
query tests like TPCH pass.

Change-Id: I9a57521ad851da605604a1e7c48d3d6627da5df5
Reviewed-on: http://gerrit.cloudera.org:8080/11208
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>


> Simple granular caching of partition metadata in impalad
> --------------------------------------------------------
>
>                 Key: IMPALA-7437
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7437
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Major
>
> This JIRA tracks adding a simple cache to the catalog implementation in the impalad to cache table partitions and their file metadata. The initial cut will not cache other objects like functions, databases, table names, etc., so that we can avoid having to do more complex invalidation at first. Additionally, a simple time-based expiration will be used, to be replaced later with size-based eviction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
