Posted to user@hive.apache.org by Varun Rao <va...@cloudera.com> on 2019/05/07 19:00:02 UTC

Regarding hive.exec.parallel

Hello,

We were wondering what the benefits are of setting hive.exec.parallel to
true. I know that this will execute any possible tasks in parallel. For
example, MapReduce stages, sampling stages, merge stages, limit stages, etc.
will be executed in parallel, allowing the overall job to be completed
more quickly. However, my coworker and I decided to run a 400-line
query in separate Tez sessions. I set hive.exec.parallel=true and he
set it to false. We saw almost no improvement in speed. I am
assuming that is because the stages of my query are dependent on one another
(join a with b with c with d). Is this the case? Can you give me examples
of queries where there would be an improvement in speed (perhaps in TPC-DS)
when one sets hive.exec.parallel to true?

Thanks

Yours Truly,
Varun Rao

Re: Regarding hive.exec.parallel

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hey Varun!

I'm not sure about your actual query, but hive.exec.parallel enables Hive to execute independent stages in parallel.
The full Tez DAG is usually "one stage" of the execution (but you should take a look at the explain output).
If you were using the mr engine there might have been some speedup; but in the case of Tez, the parallel execution of independent tasks happens inside Tez, independently
of this setting.
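
As a rough sketch of how one might check this, the snippet below switches to the mr engine, turns the setting on, and asks for the plan of a query with two independent branches. The table names table_a and table_b are hypothetical, and the exact stage layout depends on the Hive version and optimizer settings, so treat the comments as what one would likely see rather than guaranteed output.

```sql
-- Assumption: table_a and table_b are placeholder tables, not from the original thread.
SET hive.execution.engine=mr;            -- stage-level parallelism mainly matters on the mr engine
SET hive.exec.parallel=true;             -- allow independent stages to run concurrently
SET hive.exec.parallel.thread.number=8;  -- upper bound on concurrently running stages

-- The two aggregations do not depend on each other, so on mr they would
-- typically compile to separate MapReduce stages that can be launched together.
EXPLAIN
SELECT 'a' AS src, COUNT(*) AS cnt FROM table_a
UNION ALL
SELECT 'b' AS src, COUNT(*) AS cnt FROM table_b;
```

In the EXPLAIN output, the STAGE DEPENDENCIES section lists which stages are root stages and which depend on others. If several MapReduce stages appear as independent roots, hive.exec.parallel has something to overlap; if the plan shows a single Tez stage, or a chain in which each stage depends on the previous one (as in a join of a with b with c with d), the setting has little to work with.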

cheers,
Zoltan
