Posted to issues-all@impala.apache.org by "Alex Rodoni (Jira)" <ji...@apache.org> on 2019/08/28 21:59:02 UTC

[jira] [Updated] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

     [ https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rodoni updated IMPALA-8490:
--------------------------------
    Labels: in_33  (was: future_release_doc in_33)

> Impala Doc: the file handle cache now supports S3
> -------------------------------------------------
>
>                 Key: IMPALA-8490
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8490
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Docs
>            Reporter: Sahil Takiar
>            Assignee: Alex Rodoni
>            Priority: Major
>              Labels: in_33
>             Fix For: Impala 3.3.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to non-HDFS tables, such as Kudu or HBase tables, or tables that store their data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 files.
> We should add a section to the docs similar to what we added when support for remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS file handles. This is controlled by the cache_remote_file_handles flag for an impalad. It is recommended that you use the default value of true as this caching prevents your NameNode from overloading when your cluster has many remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode. Here the benefit is that caching eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode from overloading when your cluster has many remote HDFS reads" should be changed to something like "avoids an unnecessary call to S3AFileSystem#getFileStatus(), which reduces the number of API calls made to S3."
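The following toy Python model illustrates why caching S3 file handles reduces API calls. It is a minimal sketch under one stated assumption: every uncached open of an S3 file costs one getFileStatus()-style metadata lookup. FakeS3, FileHandleCache, count_status_calls and the s3a:// paths are hypothetical names chosen for illustration. This is not Impala's actual file handle cache, and the `enabled` switch only loosely mirrors what cache_s3_file_handles controls.

```python
from collections import OrderedDict


class FakeS3:
    """Hypothetical stand-in for S3 that counts simulated metadata calls."""

    def __init__(self):
        self.status_calls = 0

    def open(self, path):
        # Assumption: each uncached open triggers one metadata lookup,
        # analogous to the getFileStatus() call described in the issue.
        self.status_calls += 1
        return {"path": path}


class FileHandleCache:
    """Minimal LRU cache of open file handles (illustrative only)."""

    def __init__(self, fs, capacity=4, enabled=True):
        self.fs = fs
        self.capacity = capacity
        self.enabled = enabled  # loosely mirrors the cache_s3_file_handles toggle
        self._handles = OrderedDict()

    def get(self, path):
        if not self.enabled:
            # Caching disabled: every read reopens the file.
            return self.fs.open(path)
        if path in self._handles:
            # Cache hit: reuse the handle, no metadata call.
            self._handles.move_to_end(path)
            return self._handles[path]
        handle = self.fs.open(path)
        self._handles[path] = handle
        if len(self._handles) > self.capacity:
            # Evict the least recently used handle.
            self._handles.popitem(last=False)
        return handle


def count_status_calls(enabled, reads):
    """Replay a sequence of reads and return the simulated metadata call count."""
    fs = FakeS3()
    cache = FileHandleCache(fs, enabled=enabled)
    for path in reads:
        cache.get(path)
    return fs.status_calls


if __name__ == "__main__":
    reads = ["s3a://example-bucket/t/part-0.parq",
             "s3a://example-bucket/t/part-1.parq"] * 5
    print(count_status_calls(enabled=False, reads=reads))  # 10
    print(count_status_calls(enabled=True, reads=reads))   # 2
```

With the cache enabled, repeated reads of the same two files make one metadata call per file instead of one per read. The benefit shrinks when the working set exceeds the cache capacity, because evicted handles must be reopened.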



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
