Posted to issues@impala.apache.org by "Ye Zihao (Jira)" <ji...@apache.org> on 2023/02/01 02:33:00 UTC

[jira] [Created] (IMPALA-11886) Data cache should support asynchronous writes

Ye Zihao created IMPALA-11886:
---------------------------------

             Summary: Data cache should support asynchronous writes
                 Key: IMPALA-11886
                 URL: https://issues.apache.org/jira/browse/IMPALA-11886
             Project: IMPALA
          Issue Type: Improvement
    Affects Versions: Impala 4.3.0
            Reporter: Ye Zihao
            Assignee: Ye Zihao


Currently, writes to the data cache are synchronous with HDFS file reads, and both are handled by the remote HDFS IO threads. In other words, when a cache miss occurs, the IO thread that performed the read must also perform the cache write, which can degrade query performance in some cases.
Therefore, the data cache should be able to defer writes to another thread (or thread pool) that performs them asynchronously, so that the IO thread only copies the data into a temporary buffer and can immediately return the data to the scanner. The extra memory consumed by these temporary buffers also needs to be bounded.
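
The proposal above could be sketched roughly as follows. This is only an illustration in Python, not Impala code: the class AsyncCacheWriter, its submit() and close() methods, and the max_pending_bytes parameter are all hypothetical names, and a plain dict stands in for the real data cache. It shows the IO-thread side copying the data and returning right away, a background thread doing the actual write, and a byte budget that causes the cache write to be skipped when too many buffers are pending.

```python
import queue
import threading


class AsyncCacheWriter:
    """Hypothetical sketch: defers cache writes to background threads."""

    def __init__(self, backing_store, max_pending_bytes, num_threads=1):
        self._store = backing_store  # stand-in for the real cache (a dict here)
        self._max_pending_bytes = max_pending_bytes
        self._pending_bytes = 0      # bytes held in not-yet-written buffers
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self._threads:
            t.start()

    def submit(self, key, data):
        """Called by the IO thread after a cache miss.

        Returns False, without caching, if buffering the data would
        exceed the memory bound. Never waits for the write itself.
        """
        size = len(data)
        with self._lock:
            if self._pending_bytes + size > self._max_pending_bytes:
                return False
            self._pending_bytes += size
        # Temporary buffer: bytes() copies when data is a mutable bytearray.
        self._queue.put((key, bytes(data)))
        return True

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                return
            key, buf = item
            try:
                self._store[key] = buf  # the deferred cache write
            finally:
                with self._lock:
                    self._pending_bytes -= len(buf)

    def close(self):
        """Drains pending writes, then stops the worker threads."""
        for _ in self._threads:
            self._queue.put(None)
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    cache = {}
    writer = AsyncCacheWriter(cache, max_pending_bytes=1024)
    accepted = writer.submit("file1:0", bytearray(b"block data"))
    writer.close()
    print(accepted, cache)
```

In this sketch a write that would exceed the budget is simply dropped rather than blocking the IO thread, on the assumption that a skipped cache write only costs a later cache miss. Whether to drop, block, or fall back to a synchronous write is a design choice the issue leaves open.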



--
This message was sent by Atlassian Jira
(v8.20.10#820010)