Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2020/06/19 00:38:00 UTC

[jira] [Resolved] (IMPALA-7533) Optimize fetch-from-catalog by caching partitions across table versions

     [ https://issues.apache.org/jira/browse/IMPALA-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-7533.
------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Optimize fetch-from-catalog by caching partitions across table versions
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-7533
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7533
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Assignee: Quanlong Huang
>            Priority: Major
>              Labels: catalog-v2
>             Fix For: Impala 4.0
>
>
> Currently, the cached partition-level information in CatalogdMetaProvider is tied to a particular version number of its containing table. This means that if the table is modified in any way (e.g. even if only a comment changes), all of its partitions are effectively invalidated and need to be reloaded from catalogd.
> We could avoid this invalidation-and-refetch in a couple of ways:
> 1) Make partitions immutable given an ID. Instead of modifying partitions in place, we could drop the partition and add a new one with a new ID. This is already done in several code paths, but not all. If we did this, then we'd only need to invalidate the partition _list_ for a table, and when we fetched the new list, we'd see which partitions changed and need to be reloaded.
> 2) Add a partition-level version/sequence number which is modified whenever the partition is mutated in place. If we fetched that as part of the partition list and used it as part of the cache key, we could avoid invalidating partitions when nothing changed. This would cost 4 or 8 bytes per partition (perhaps manageable considering the hundreds of bytes saved by recent patches).
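
The caching idea behind both options can be sketched in a few lines. This is only an illustration, not Impala's implementation: Impala's catalog code is Java, Python is used here for brevity, and the names PartitionRef, PartitionCache and fake_loader are hypothetical. The sketch assumes the partition list returned for a table carries an ID and a version per partition, and it keys the cache on those instead of on the table version.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PartitionRef:
    """One entry of a table's partition list (hypothetical shape)."""
    partition_id: int
    version: int  # assumed to be bumped on every in-place mutation


# Cache key: (table name, partition id, partition version).
# The table version is deliberately not part of the key.
CacheKey = Tuple[str, int, int]


class PartitionCache:
    """Caches partition metadata across table versions (illustrative only)."""

    def __init__(self, loader: Callable[[str, PartitionRef], str]):
        self._loader = loader            # stands in for a fetch from catalogd
        self._entries: Dict[CacheKey, str] = {}
        self.loads = 0                   # number of (expensive) fetches so far

    def get_partitions(self, table: str, refs: List[PartitionRef]) -> Dict[int, str]:
        result: Dict[int, str] = {}
        live_keys = set()
        for ref in refs:
            key = (table, ref.partition_id, ref.version)
            live_keys.add(key)
            if key not in self._entries:
                # Only partitions that are new or changed are fetched.
                self._entries[key] = self._loader(table, ref)
                self.loads += 1
            result[ref.partition_id] = self._entries[key]
        # Evict entries of this table that the new partition list no longer names.
        stale = [k for k in self._entries if k[0] == table and k not in live_keys]
        for key in stale:
            del self._entries[key]
        return result


def fake_loader(table: str, ref: PartitionRef) -> str:
    """Hypothetical stand-in for loading partition metadata."""
    return f"{table}/p{ref.partition_id}@v{ref.version}"


cache = PartitionCache(fake_loader)
refs_v1 = [PartitionRef(1, 0), PartitionRef(2, 0), PartitionRef(3, 0)]
cache.get_partitions("db.t", refs_v1)          # 3 loads: cache is cold

# The table comment changes: the list is refetched, but no partition changed.
cache.get_partitions("db.t", refs_v1)          # 0 additional loads

# Partition 2 is mutated in place, so its version is bumped.
refs_v2 = [PartitionRef(1, 0), PartitionRef(2, 1), PartitionRef(3, 0)]
parts = cache.get_partitions("db.t", refs_v2)  # 1 additional load
```

Under option 1 the same structure would apply with the version field held constant: a modified partition would show up in the list under a new ID, miss the cache, and be loaded, while the entry for the old ID would be evicted.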



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org