Posted to issues-all@impala.apache.org by "Ye Zihao (Jira)" <ji...@apache.org> on 2023/01/31 09:47:00 UTC

[jira] [Created] (IMPALA-11884) Additional data cache partition for caching footers

Ye Zihao created IMPALA-11884:
---------------------------------

             Summary: Additional data cache partition for caching footers
                 Key: IMPALA-11884
                 URL: https://issues.apache.org/jira/browse/IMPALA-11884
             Project: IMPALA
          Issue Type: New Feature
          Components: Backend
            Reporter: Ye Zihao
            Assignee: Ye Zihao


[IMPALA-4568|https://issues.apache.org/jira/browse/IMPALA-4568] proposed caching the Parquet footer to improve scan performance, and [IMPALA-8341|https://issues.apache.org/jira/browse/IMPALA-8341] implemented the data cache for caching remote reads, which partly addresses that proposal.
However, the data cache does not currently distinguish between footer data and other data, so during a large scan the more valuable footer data is likely to be evicted from the cache.
It seems like a good idea to have a separate cache partition for caching footer data, so that footer data doesn't compete with other data for cache space. For servers with limited disk space, it is even possible to prepare only the footer cache partition, which can also help with scan performance.
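
To make the idea concrete, here is a minimal sketch in Python of a cache with two independently budgeted LRU partitions. It is only an illustration, not Impala's data cache code: the names LruPartition and PartitionedCache, the is_footer argument, and the byte budgets are all hypothetical, and the real data cache is a C++ component with its own eviction policy.

```python
from collections import OrderedDict


class LruPartition:
    """A byte-budgeted LRU store; a stand-in for one cache partition."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, data):
        if len(data) > self.capacity_bytes:
            return False  # can never fit in this partition; do not cache
        if key in self.entries:
            self.used_bytes -= len(self.entries.pop(key))
        # Evict least recently used entries until the new entry fits.
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self.entries[key] = data
        self.used_bytes += len(data)
        return True


class PartitionedCache:
    """Routes footer reads and other reads to separate partitions."""

    def __init__(self, footer_bytes, data_bytes):
        self.footer = LruPartition(footer_bytes)
        self.data = LruPartition(data_bytes)

    def _partition(self, is_footer):
        return self.footer if is_footer else self.data

    def put(self, key, data, is_footer=False):
        return self._partition(is_footer).put(key, data)

    def get(self, key, is_footer=False):
        return self._partition(is_footer).get(key)
```

In this sketch, a large scan that keeps inserting non-footer entries can only evict entries from the data partition, so a previously cached footer stays available. Setting data_bytes to 0 models the footer-only setup for servers with limited disk space: non-footer inserts are rejected while footers are still cached.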


