Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2024/03/14 17:53:00 UTC

[jira] [Created] (IMPALA-12905) Implement disk-based tuple caching

Joe McDonnell created IMPALA-12905:
--------------------------------------

             Summary: Implement disk-based tuple caching
                 Key: IMPALA-12905
                 URL: https://issues.apache.org/jira/browse/IMPALA-12905
             Project: IMPALA
          Issue Type: Task
          Components: Backend
    Affects Versions: Impala 4.4.0
            Reporter: Joe McDonnell


The TupleCacheNode caches tuples so that later, equivalent queries can reuse them. This issue tracks implementing a version that serializes tuples and stores them as files on local disk.

The work has three parts:
 # A TupleCacheMgr tracks which entries exist in the cache and evicts entries as needed to make room for new ones. It is configured through startup flags that specify the cache directory, the cache size, and the eviction policy.
 # The TupleCacheNode asks the TupleCacheMgr whether an entry is available. If it is, the node reads the associated tuple cache file and returns its RowBatches. If it is not, the node reads RowBatches from its child and also writes them to a new file in the cache.
 # The TupleWriter / TupleReader implement serialization / deserialization of RowBatches to / from a local file, reusing the existing RowBatch serialization already used for KRPC.
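The eviction bookkeeping in the first part can be pictured as a size-bounded index over cache entries. The following is a minimal sketch that assumes an LRU eviction policy (the issue only says the policy is configurable via startup flags and does not name one). The class LruTupleCacheIndex and its methods are hypothetical illustrations, not Impala's TupleCacheMgr API.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical size-bounded LRU index for a disk-based tuple cache.
// It tracks only cache keys and file sizes; creating and deleting the
// actual files is left to the caller. Assumes an LRU eviction policy.
class LruTupleCacheIndex {
 public:
  explicit LruTupleCacheIndex(int64_t capacity_bytes)
      : capacity_(capacity_bytes) {}

  // Returns true on a hit and marks the entry as most recently used.
  bool Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    lru_.splice(lru_.begin(), lru_, it->second);  // move entry to the front
    return true;
  }

  // Admits an entry of `bytes` bytes, evicting least recently used entries
  // until it fits. Evicted keys are appended to `evicted` so the caller can
  // delete their files. Returns false, changing nothing, if the entry is
  // larger than the whole cache.
  bool Insert(const std::string& key, int64_t bytes,
              std::vector<std::string>* evicted) {
    if (bytes > capacity_) return false;
    Erase(key);  // replacing an existing entry frees its old size first
    while (used_ + bytes > capacity_) {
      const Entry& victim = lru_.back();
      used_ -= victim.bytes;
      evicted->push_back(victim.key);
      index_.erase(victim.key);
      lru_.pop_back();
    }
    lru_.push_front(Entry{key, bytes});
    index_[key] = lru_.begin();
    used_ += bytes;
    return true;
  }

  // Drops an entry from the index if it is present.
  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    used_ -= it->second->bytes;
    lru_.erase(it->second);
    index_.erase(it);
  }

  int64_t used_bytes() const { return used_; }

 private:
  struct Entry {
    std::string key;
    int64_t bytes;
  };
  int64_t capacity_;
  int64_t used_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Returning the evicted keys instead of deleting files inside the index keeps the bookkeeping easy to test in isolation. The issue does not say whether the real TupleCacheMgr separates these concerns.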
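The second and third parts can be sketched together as a simple on-disk format for serialized batches plus a lookup-or-populate routine. This is a minimal sketch under several stated assumptions: SerializedBatch is a plain byte string standing in for a RowBatch serialized with the existing KRPC format; SimpleTupleFileWriter, ReadTupleFile, and GetBatches are hypothetical names, not Impala's TupleWriter / TupleReader; the length-prefixed layout is an illustrative choice, not the format Impala uses; the TupleCacheMgr bookkeeping is omitted; and batches are collected into a vector rather than streamed to the parent.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RowBatch already serialized with the KRPC format.
using SerializedBatch = std::string;

// Hypothetical writer: each batch is stored as an 8-byte length prefix
// (host byte order, acceptable for a node-local cache) followed by its bytes.
class SimpleTupleFileWriter {
 public:
  explicit SimpleTupleFileWriter(const std::string& path)
      : out_(path, std::ios::binary | std::ios::trunc) {}

  bool Write(const SerializedBatch& batch) {
    const uint64_t len = batch.size();
    out_.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out_.write(batch.data(), static_cast<std::streamsize>(len));
    return static_cast<bool>(out_);
  }

  bool Close() {
    out_.close();
    return !out_.fail();
  }

 private:
  std::ofstream out_;
};

// Hypothetical reader: returns every batch in the file, or std::nullopt if
// the file is missing or truncated.
std::optional<std::vector<SerializedBatch>> ReadTupleFile(
    const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return std::nullopt;
  std::vector<SerializedBatch> batches;
  uint64_t len = 0;
  while (in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
    SerializedBatch batch(len, '\0');
    if (!in.read(batch.data(), static_cast<std::streamsize>(len))) {
      return std::nullopt;  // payload shorter than its length prefix
    }
    batches.push_back(std::move(batch));
  }
  if (in.gcount() != 0) return std::nullopt;  // partial length prefix
  return batches;
}

// Hypothetical lookup-or-populate flow. On a hit, batches are replayed from
// the cache file. On a miss, batches are pulled from the child until it
// returns std::nullopt and written to a temporary file, which is renamed
// into place only once it is complete.
std::vector<SerializedBatch> GetBatches(
    const std::string& path,
    const std::function<std::optional<SerializedBatch>()>& next_child_batch) {
  if (auto cached = ReadTupleFile(path)) return std::move(*cached);

  std::vector<SerializedBatch> result;
  const std::string tmp_path = path + ".tmp";
  SimpleTupleFileWriter writer(tmp_path);
  bool write_ok = true;
  while (auto batch = next_child_batch()) {
    write_ok = writer.Write(*batch) && write_ok;
    result.push_back(std::move(*batch));
  }
  std::error_code ec;
  if (writer.Close() && write_ok) {
    std::filesystem::rename(tmp_path, path, ec);
  } else {
    std::filesystem::remove(tmp_path, ec);
  }
  return result;
}
```

Writing to a temporary file and renaming it only after the child is exhausted is one way to keep a concurrent reader from ever seeing a partially written entry. The issue does not specify how Impala handles this case.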



--
This message was sent by Atlassian Jira
(v8.20.10#820010)