Posted to issues-all@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2019/01/09 19:11:00 UTC

[jira] [Commented] (IMPALA-7928) Investigate consistent placement of remote scan ranges

    [ https://issues.apache.org/jira/browse/IMPALA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738560#comment-16738560 ] 

Joe McDonnell commented on IMPALA-7928:
---------------------------------------

After some experimentation, it is clear that consistent placement is important for the effectiveness of remote file handle caching. Upgrading the priority.

> Investigate consistent placement of remote scan ranges
> ------------------------------------------------------
>
>                 Key: IMPALA-7928
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7928
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> With the file handle cache, it is useful for repeated scans of the same file to go to the same node, as that node will already have a file handle cached.
> When scheduling remote ranges, the scheduler introduces randomness that can spread reads across all of the nodes. As a result, repeated executions of queries on the same set of files do not schedule the remote reads on the same nodes. This duplicates file handles across the caches on different nodes, which significantly reduces the efficiency of each cache.
> It may be useful for the scheduler to introduce some determinism in scheduling remote reads to take advantage of the file handle cache. This is a variation on the well-known tradeoff between skew and locality.
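
One way to picture "some determinism in scheduling remote reads" is rendezvous (highest-random-weight) hashing: each scan range is sent to the executor with the highest stable hash score for that range. The sketch below is illustrative only and is not Impala's scheduler code. The function names, the host strings, and the choice of (filename, offset) as the key are all assumptions made for the example.

```python
import hashlib


def _score(executor: str, filename: str, offset: int) -> int:
    """Stable score for an (executor, scan range) pair.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would not be repeatable across runs.
    """
    key = f"{executor}|{filename}|{offset}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_executor(executors, filename, offset):
    """Hypothetical helper: choose an executor for a remote scan range.

    The same (filename, offset) maps to the same executor on every call,
    as long as that executor is still in the candidate list.
    """
    if not executors:
        raise ValueError("no executors available")
    # Tie-break on the executor name so the result never depends on list order.
    return max(executors, key=lambda e: (_score(e, filename, offset), e))


if __name__ == "__main__":
    hosts = ["host-a:22000", "host-b:22000", "host-c:22000"]  # made-up hosts
    for off in (0, 128 * 1024 * 1024):
        print(off, pick_executor(hosts, "/warehouse/t/part-0.parq", off))
```

Because the choice depends only on the range and the candidate set, repeated queries over the same files would land on the same nodes, and removing an executor that was not chosen leaves the assignment unchanged. The cost is the skew side of the tradeoff: a purely deterministic mapping ignores current load, so a real scheduler would likely need to combine it with some load balancing.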
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
