Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/01/25 17:56:00 UTC

[jira] [Commented] (IMPALA-10147) Avoid getting a file handle for data cache hits

    [ https://issues.apache.org/jira/browse/IMPALA-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271578#comment-17271578 ] 

ASF subversion and git services commented on IMPALA-10147:
----------------------------------------------------------

Commit 2644203d1cbdd124a75a3da80fc176a447f3164c in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2644203 ]

IMPALA-10147: Avoid getting a file handle for data cache hits

When reading from the data cache, the disk IO thread first gets a file
handle, then checks the data cache for a hit. The file handle is only
used if there is a data cache miss. On a data cache hit it goes unused
and is pure overhead. This patch moves the file handle retrieval
later, to the point where a data cache miss happens.

Testing:
- Add custom cluster test test_no_fd_caching_on_cached_data.
- Pass core tests.

Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91
Reviewed-on: http://gerrit.cloudera.org:8080/16963
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Avoid getting a file handle for data cache hits
> -----------------------------------------------
>
>                 Key: IMPALA-10147
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10147
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Riza Suminto
>            Priority: Critical
>
> When reading from the data cache, the DiskIo thread first gets a file handle, then it checks the data cache for a hit. If there is a cache hit, then the file handle is not actually used. It is only used if there is a cache miss. There is no real reason to have the file handle open for cache hits. It doesn't really serve any additional purpose, and it adds overhead to cache hits.
> For platforms that do not have the file handle cache, this can be a significant overhead.
> We should only open the file handle after we have checked the data cache and know that we need to read from regular storage.
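
The reordering described in the issue can be illustrated with a small, self-contained sketch. This is not Impala's code (the backend is C++): FakeStorage, Reader, open_handle and handles_opened are hypothetical names invented for this illustration, and the data cache is modelled as a plain dictionary. It only shows the idea of checking the cache first and acquiring a file handle on a miss.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical cache key: (filename, offset, length).
CacheKey = Tuple[str, int, int]


@dataclass
class FakeStorage:
    """Hypothetical stand-in for regular storage; counts file handle opens."""
    files: Dict[str, bytes]
    handles_opened: int = 0

    def open_handle(self, filename: str) -> bytes:
        # Opening a handle is the cost the issue wants to avoid on cache hits.
        self.handles_opened += 1
        return self.files[filename]


@dataclass
class Reader:
    """Checks the data cache first; opens a file handle only on a miss."""
    storage: FakeStorage
    data_cache: Dict[CacheKey, bytes] = field(default_factory=dict)

    def read(self, filename: str, offset: int, length: int) -> bytes:
        key = (filename, offset, length)
        cached = self.data_cache.get(key)
        if cached is not None:
            # Cache hit: no file handle is requested at all.
            return cached
        # Cache miss: only now is the file handle acquired.
        handle = self.storage.open_handle(filename)
        data = handle[offset:offset + length]
        self.data_cache[key] = data
        return data


if __name__ == "__main__":
    storage = FakeStorage({"part-0.parq": b"hello world"})
    reader = Reader(storage)
    reader.read("part-0.parq", 0, 5)   # miss: opens one handle
    reader.read("part-0.parq", 0, 5)   # hit: no additional handle
    print(storage.handles_opened)
```

In this sketch, repeating the same read leaves the handle count unchanged, which is the behaviour the issue asks for on cache hits.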



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
