Posted to issues@impala.apache.org by "Juan Yu (JIRA)" <ji...@apache.org> on 2018/01/02 22:18:00 UTC

[jira] [Created] (IMPALA-6361) File handle cache should be shared across multiple IO threads

Juan Yu created IMPALA-6361:
-------------------------------

             Summary: File handle cache should be shared across multiple IO threads
                 Key: IMPALA-6361
                 URL: https://issues.apache.org/jira/browse/IMPALA-6361
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.10.0
            Reporter: Juan Yu


A file handle can only be used by one thread at a time; it cannot be shared across multiple IO threads because of a statistics tracking issue. As a result, multiple file handles for the same file are created and added to the cache. This still adds NameNode (NN) load and reduces the number of files that can be cached.
We should investigate a way to share a file handle across threads while maintaining appropriate statistics.
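
To illustrate the behavior described above, here is a minimal sketch of a cache with exclusive checkout. It is not Impala's actual implementation: FileHandle, ExclusiveHandleCache, and HandlesOpenedForConcurrentReaders are hypothetical names, and opening a file is only simulated with a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an HDFS file handle; not Impala's real type.
struct FileHandle {
  explicit FileHandle(std::string p) : path(std::move(p)) {}
  std::string path;
  bool in_use = false;
};

// Toy cache with exclusive checkout: each handle serves one thread at a time.
class ExclusiveHandleCache {
 public:
  // Returns an idle cached handle for 'path'. If every cached handle for the
  // file is checked out, a new handle is opened (simulated) and cached.
  FileHandle* Acquire(const std::string& path) {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<std::unique_ptr<FileHandle>>& handles = cache_[path];
    for (std::unique_ptr<FileHandle>& h : handles) {
      if (!h->in_use) {
        h->in_use = true;
        return h.get();
      }
    }
    handles.push_back(std::make_unique<FileHandle>(path));
    handles.back()->in_use = true;
    ++num_opens_;  // In a real system this would be a round trip to the NN.
    return handles.back().get();
  }

  // Marks the handle idle so a later Acquire() can reuse it.
  void Release(FileHandle* handle) {
    std::lock_guard<std::mutex> l(lock_);
    handle->in_use = false;
  }

  // Number of cache entries currently held for 'path'.
  std::size_t NumCached(const std::string& path) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = cache_.find(path);
    return it == cache_.end() ? 0 : it->second.size();
  }

  // Total number of simulated file opens.
  std::size_t num_opens() {
    std::lock_guard<std::mutex> l(lock_);
    return num_opens_;
  }

 private:
  std::mutex lock_;
  std::map<std::string, std::vector<std::unique_ptr<FileHandle>>> cache_;
  std::size_t num_opens_ = 0;
};

// Simulates 'num_readers' IO threads that all hold a handle to the same file
// at once, and returns how many cache entries that single file ends up using.
std::size_t HandlesOpenedForConcurrentReaders(int num_readers) {
  ExclusiveHandleCache cache;
  const std::string path = "/hypothetical/data/file.parq";
  std::vector<FileHandle*> held;
  for (int i = 0; i < num_readers; ++i) held.push_back(cache.Acquire(path));
  for (FileHandle* h : held) cache.Release(h);
  return cache.NumCached(path);
}
```

In this sketch, N overlapping readers of one file leave N handles in the cache, whereas a handle that could be shared safely would need only one entry and one open.
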
Another improvement would be to make the file handle cache itself more efficient. For example, reducing the size of the HDFS file handle would shrink the memory footprint and allow the cache to hold more entries in the same amount of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)