Posted to user@hive.apache.org by Anup Tiwari <an...@gmail.com> on 2018/04/16 14:33:10 UTC

Ways to reduce launching time of query in Hive 2.2.1

Hi All,

We have a use case where we need to return output in < 10 sec. We have
evaluated a different set of tools for execution and they work fine, but
they do not cover all cases and they are not reliable (since they are still
evolving). Hive, however, works well in this context.

Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching
takes ~12-15 sec, so the response time becomes 18-21 sec.

Is there any way we can reduce this launching time?

Please note that we have tried prewarming containers, but when we launch a
query from the Hive client, it does not pick up the already initialized
containers; instead it launches its own.

Please let me know how we can overcome this issue, since it is the only
problem stopping us from using Hive. Any links/descriptions are really
appreciated.


Regards,
Anup Tiwari

Re: Ways to reduce launching time of query in Hive 2.2.1

Posted by Sungwoo Park <gl...@gmail.com>.

Do you use the Tez session pool along with LLAP (as Thai suggests in the
previous reply)? If a new query finds an idle AM in the Tez session pool,
there is no launch cost for the AM. If no idle AM is found, or if you specify
a queue name, a new AM has to start in order to serve the query. This is
explained in detail in the following article (see 'Understanding #4'):

https://community.hortonworks.com/articles/56636/hive-understanding-concurrent-sessions-queue-alloc.html

Hence, if not enough AMs are available in the Tez session pool, new queries
will have to wait until old queries finish. If there are not many concurrent
queries, I guess using the Tez session pool will solve your issue.
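
As a rough illustration of the reasoning above, here is a toy Python model of
where the launch cost comes from. It is only a sketch under stated
assumptions, not Hive's actual scheduling code: SessionPool, launch_cost and
response_time are hypothetical names, and the timings are the figures quoted
in the original post. Adding the two quoted ranges gives 18-22 sec, close to
the 18-21 sec reported.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Figures quoted in this thread, in seconds; illustrative only.
AM_LAUNCH_COST = (12.0, 15.0)  # observed query launch time
QUERY_TIME = (6.0, 7.0)        # observed execution time on LLAP


@dataclass
class SessionPool:
    """Hypothetical stand-in for the Tez session pool in HiveServer2."""
    idle_ams: int

    def launch_cost(self, queue_name: Optional[str] = None) -> Tuple[float, float]:
        """Return the (min, max) AM launch cost for one incoming query."""
        if queue_name is None and self.idle_ams > 0:
            self.idle_ams -= 1       # reuse a pre-launched AM from the pool
            return (0.0, 0.0)
        # No idle AM, or an explicit queue name: a new AM has to start.
        return AM_LAUNCH_COST


def response_time(pool: SessionPool,
                  queue_name: Optional[str] = None) -> Tuple[float, float]:
    """Return the (min, max) response time: launch cost plus query time."""
    low, high = pool.launch_cost(queue_name)
    return (low + QUERY_TIME[0], high + QUERY_TIME[1])


if __name__ == "__main__":
    pool = SessionPool(idle_ams=1)
    print(response_time(pool))  # an idle AM is reused
    print(response_time(pool))  # the pool is exhausted, so a new AM starts
```

In this model the first query pays only the query time, while the second one
pays the full launch cost again because no idle AM is left.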

In a highly concurrent setting, Hive-MR3 practically eliminates this
limitation. In Hive-MR3, HiveServer2 in shared session mode launches a
single AppMaster to be shared by all incoming queries, so there is no
launch cost. Containers are also shared by all queries and thus run like
daemons.

https://mr3.postech.ac.kr/hivemr3/features/hiveserver2/

Hive-MR3 0.1 does not support LLAP IO yet, but Hive-MR3 0.2, which will be
released by the end of this month, will support it.

--- Sungwoo Park

Re: Ways to reduce launching time of query in Hive 2.2.1

Posted by Thai Bui <bl...@gmail.com>.

The best approach would be to use daemonized containers, such as Hive LLAP
+ Tez session pool, or Hive on Spark.

I’m not that familiar with Hive on Spark so I can’t comment on it, but Hive
on LLAP has worked really well for me when coupled with the Tez session
pool. You’ll have to specify how many Tez AMs are initialized per LLAP pool
when HiveServer2 starts, and those AMs will be used for all the queries in
that pool.
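
A minimal sketch of what such a setup might look like, written as a small
Python script that renders a hive-site.xml fragment. The property names are
standard Hive settings as far as I know, but the values (a queue named
"llap", 4 sessions) are assumptions for illustration only, and
render_hive_site is a hypothetical helper. Check the defaults and
availability of each property against your Hive version before using it.

```python
import xml.etree.ElementTree as ET

# Property names are Hive settings; the values are assumed for illustration.
SESSION_POOL_SETTINGS = {
    "hive.execution.engine": "tez",
    "hive.execution.mode": "llap",
    "hive.llap.io.enabled": "true",
    # Start the pooled Tez AMs when HiveServer2 starts.
    "hive.server2.tez.initialize.default.sessions": "true",
    # Queue(s) that own the pooled AMs (assumed queue name).
    "hive.server2.tez.default.queues": "llap",
    # Number of Tez AMs kept per queue (assumed value).
    "hive.server2.tez.sessions.per.default.queue": "4",
}


def render_hive_site(settings):
    """Render a dict of settings as a hive-site.xml <configuration> string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hive_site(SESSION_POOL_SETTINGS))
```

The session count bounds how many queries can reuse a pooled AM at the same
time, so it should roughly match the expected query concurrency.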

The actual Tez containers are “replaced” by LLAP daemons that are always
running, so there’s no startup cost there either. The underlying execution
engine is still Tez, but it is executed in a special LLAP mode, and this
could potentially give you sub-second response time.

In my experience, when Hive LLAP is used, IO cache is enabled and the file
format is ORC, I can get under 1s for small queries when the cache is hit
(equivalent to an in-memory database at that point). Parquet is slower since
LLAP mode doesn’t support efficient IO caching and vectorized execution for
it.

-- 
Thai