Posted to user@impala.apache.org by "wesleydeng (邓威)" <we...@tencent.com> on 2020/06/02 03:22:26 UTC

Impala fail to cache FileHandle When using hdfs centralized cache

I enabled HDFS caching by setting one partition as cached.

> alter table xxx partition (ds=20200518) set cached in 'my_pool_xxx' with replication = 3;


After running a count query several times, the table stats show that the cache is enabled.

[image: table stats screenshot]


However, the query now takes much longer with the cache than it did without it.

No cache: about 15 seconds

With cache: about 3 minutes


Reading the profile, we found that "CachedFileHandlesMissCount" is very high and that Impala spends too much time opening files.

The picture below shows the difference between with cache (left) and no cache (right).

[image: query profile comparison, with cache (left) and no cache (right)]


Does Impala fail to cache file handles when HDFS caching is enabled?

Re: Impala fail to cache FileHandle When using hdfs centralized cache

Posted by Tim Armstrong <ta...@cloudera.com>.
Hi Wesley,
  I would expect the HDFS cache and file handle cache to work together in
general.

For what it's worth, HDFS caching is mostly not that important for
performance - the OS buffer cache is generally effective in practice at
keeping hot data in memory. It's sometimes useful to control the
replication of a hot table or partition and to avoid hot-spots. We don't
enable HDFS caching when we do perf benchmarks.

One reason why the file handle cache hit rate might be lower with HDFS
caching is because the scheduling is more randomised - for each file, we'll
pick between all of the cached replicas instead of consistently scheduling
reads at the first replica (like we generally would with non-cached files).
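
The scheduling effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Impala's scheduler or its file handle cache: the node count, file count, scan count, and the unbounded per-node cache modelled as a set are all hypothetical. It compares always reading from the first replica with picking a random replica for each read.

```python
import random


def hit_rate(num_files, num_scans, replicas, choose):
    """Fraction of file opens served from a per-node handle cache (toy model)."""
    caches = {}  # hypothetical: node id -> set of file ids with an open handle
    hits = 0
    for _ in range(num_scans):
        for f in range(num_files):
            node = choose(replicas[f])  # node scheduled to read this file
            cache = caches.setdefault(node, set())
            if f in cache:
                hits += 1
            else:
                cache.add(f)  # miss: open the file and keep the handle
    return hits / (num_files * num_scans)


# All sizes below are assumptions chosen for illustration only.
NUM_NODES, NUM_FILES, NUM_SCANS, REPLICATION = 10, 1000, 3, 3
rng = random.Random(0)
replicas = [rng.sample(range(NUM_NODES), REPLICATION) for _ in range(NUM_FILES)]

# Consistent scheduling: always read from the first replica.
first_replica_rate = hit_rate(NUM_FILES, NUM_SCANS, replicas, lambda r: r[0])
# Randomised scheduling: pick any of the replicas for each read.
random_replica_rate = hit_rate(NUM_FILES, NUM_SCANS, replicas, rng.choice)

print(f"first replica:  {first_replica_rate:.2f}")
print(f"random replica: {random_replica_rate:.2f}")
```

In this model the consistent policy misses only on the first scan, while the random policy keeps missing until every replica node has opened every file. A real cache is bounded and evicts entries, which this sketch does not model.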
