Posted to issues-all@impala.apache.org by "Lars Volker (JIRA)" <ji...@apache.org> on 2018/08/14 17:48:00 UTC

[jira] [Commented] (IMPALA-6403) Enable file handle reuse for multiple scan ranges within the same file for an HDFS Scan node

    [ https://issues.apache.org/jira/browse/IMPALA-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580154#comment-16580154 ] 

Lars Volker commented on IMPALA-6403:
-------------------------------------

Our current implementation calls {{hdfsUnbufferFile()}} when returning a handle to the cache. For non-short-circuit reads, this empties the file buffer. However, {{read()}} still buffers the whole block on each call, which adds overhead on the DataNode (DN). This is not a big issue currently because local reads use short-circuit reads (SCR) and remote reads don't use the file handle cache (FHC). When adding FHC support for remote reads, we should either not unbuffer those handles or switch to using {{pread()}} (IMPALA-5212).
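
To make the trade-off concrete, here is a toy model of the three options mentioned above. It is only a sketch and not Impala or libhdfs code: {{ToyHandle}}, {{BLOCK_SIZE}}, {{simulate()}} and the strategy names are hypothetical, and the cost model (a streaming read refills a whole block after an unbuffer, a positional read fetches only the requested range) is an assumption taken from the description in this comment.

```python
from dataclasses import dataclass

# Hypothetical amount of data the DataNode sends to fill the stream buffer.
BLOCK_SIZE = 128


@dataclass
class ToyHandle:
    """Stand-in for a cached HDFS file handle (assumption, not the libhdfs API)."""
    buffered: bool = False
    datanode_units: int = 0  # total data the DataNode had to send

    def read(self, nbytes: int) -> int:
        # Streaming read: an empty buffer is refilled with a whole block.
        if not self.buffered:
            self.datanode_units += BLOCK_SIZE
            self.buffered = True
        return nbytes

    def pread(self, offset: int, nbytes: int) -> int:
        # Positional read: modeled as fetching only the requested range.
        self.datanode_units += nbytes
        return nbytes

    def unbuffer(self) -> None:
        # Models hdfsUnbufferFile(): drops whatever the stream had buffered.
        self.buffered = False


def simulate(strategy: str, num_ranges: int = 4, range_len: int = 8) -> int:
    """Read num_ranges small ranges through one cached handle.

    strategy is one of "read_unbuffer", "read_keep_buffer", "pread".
    Returns the total data requested from the DataNode in the toy model.
    """
    handle = ToyHandle()
    for i in range(num_ranges):
        if strategy == "pread":
            handle.pread(i * range_len, range_len)
        else:
            handle.read(range_len)
        if strategy != "read_keep_buffer":
            handle.unbuffer()  # handle is returned to the cache
    return handle.datanode_units


if __name__ == "__main__":
    for name in ("read_unbuffer", "read_keep_buffer", "pread"):
        print(name, simulate(name))
```

In this model, unbuffering after every streaming read costs a full block per scan range, while either skipping the unbuffer or using positional reads avoids the repeated refill. The model ignores seeks and the resources held by a handle that is not unbuffered, so it only illustrates the direction of the effect, not its size.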

> Enable file handle reuse for multiple scan ranges within the same file for an HDFS Scan node
> --------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6403
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6403
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Mostafa Mokhtar
>            Assignee: Joe McDonnell
>            Priority: Major
>
> Impala creates a file handle per scan range. For queries that read multiple columns per scan range, this adds an unnecessarily large load to the HDFS NameNode, which limits scalability on large clusters.
> For a given set of scan ranges against a file within a Scan Node, a single file handle should be created and reused to avoid excessive RPCs.
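
The requested behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not Impala's implementation: {{ToyNameNode}}, {{scan()}} and the file paths are hypothetical, and each file open is assumed to cost one NameNode RPC.

```python
class ToyNameNode:
    """Hypothetical stand-in that only counts open requests."""

    def __init__(self):
        self.open_rpcs = 0

    def open(self, path):
        # Assumption: every file open costs one RPC to the NameNode.
        self.open_rpcs += 1
        return {"path": path}  # placeholder for a real file handle


def scan(scan_ranges, namenode, reuse_handles):
    """Process (path, offset, length) scan ranges and return the open count.

    With reuse_handles=True, one handle per file is shared by all scan
    ranges of that file within this (toy) scan node.
    """
    handles = {}
    for path, offset, length in scan_ranges:
        if reuse_handles:
            if path not in handles:
                handles[path] = namenode.open(path)
            handle = handles[path]
        else:
            handle = namenode.open(path)
        # A real scanner would now read [offset, offset + length) via handle.
        assert handle["path"] == path
    return namenode.open_rpcs


# Two files, three column chunks (scan ranges) each.
ranges = [
    ("/warehouse/t/part-0", 0, 100),
    ("/warehouse/t/part-0", 100, 100),
    ("/warehouse/t/part-0", 200, 100),
    ("/warehouse/t/part-1", 0, 100),
    ("/warehouse/t/part-1", 100, 100),
    ("/warehouse/t/part-1", 200, 100),
]

if __name__ == "__main__":
    print("per-range handles:", scan(ranges, ToyNameNode(), reuse_handles=False))
    print("reused handles:   ", scan(ranges, ToyNameNode(), reuse_handles=True))
```

With per-range handles the open count grows with the number of scan ranges; with reuse it grows only with the number of distinct files.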


