Posted to user@hive.apache.org by Raajay <ra...@gmail.com> on 2015/12/01 22:17:11 UTC

Caching intermediate data in tez object registry

Hello,

My setup is Hive on Tez. I find that for most of my queries, the map stage
takes the longest. Is it possible to use the Tez Shared Object Registry to
cache the intermediate data to improve the performance of recurring queries?

If yes, how would I do it? Assume that the nodes I run on have
sufficient RAM to store all intermediate data.

Raajay

Re: Caching intermediate data in tez object registry

Posted by Bing Jiang <ji...@gmail.com>.
Hi Raajay,

https://issues.apache.org/jira/browse/HIVE-7313 provides a potential
solution for storing intermediate data in memory or on SSD. However, it
relies on the HDFS feature of multiple storage types (
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
).
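
As a rough sketch of what this could look like in a Hive session: to my
understanding, HIVE-7313 exposes the setting hive.exec.temporary.table.storage
(values memory, ssd, default), and it applies to session-level temporary
tables rather than to Tez shuffle data. Please verify the property against
your Hive version. The table and column names below are hypothetical.

```sql
-- Assumption: a Hive release that includes HIVE-7313, and HDFS DataNodes
-- configured with RAM_DISK (or SSD) storage so the policy can take effect.
SET hive.exec.temporary.table.storage=memory;

-- Hypothetical names: materialize the expensive map-side work once.
CREATE TEMPORARY TABLE recurring_stage AS
SELECT user_id, COUNT(*) AS cnt
FROM source_table
GROUP BY user_id;

-- Later queries in the same session read the materialized result.
SELECT * FROM recurring_stage WHERE cnt > 100;
```

Temporary tables are scoped to the session, so this would only help queries
that recur within one session, not across separate sessions.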




-- 
Bing Jiang