
Posted to user@hive.apache.org by Alexandre Fouche <al...@cleverscale.com> on 2012/12/06 14:24:21 UTC

Hive parallel execution deadlocks, need restart of yarn-nodemanager

Is there a known deadlock issue or bug when using Hive parallel execution with more parallel Hive threads than there are compute NodeManagers?

On my test cluster, I have set Hive parallel execution to 2 or 3 threads, and I have only 1 compute NodeManager with 5 CPU cores.

When I run a Hive query with many unions, which decomposes into many jobs to be executed in parallel, it always ends up deadlocked after a few jobs have completed, with every parallel job stuck at 0% of the map phase (according to the hive-server2 logs). If I restart hadoop-yarn-nodemanager on the NodeManager server, Hive gets out of its deadlock and continues, until it deadlocks again a bit later.

Alex

Re: Hive parallel execution deadlocks, need restart of yarn-nodemanager

Posted by Alexandre Fouche <al...@cleverscale.com>.

Ah, I see. I had missed the fact that each MR job has an ApplicationMaster that takes a container, so there were none free to run mappers (my jobs usually have only one mapper because the input data is small). I understood this thanks to your explanations and by using more nodes with greater concurrency: just like before, all containers were running an ApplicationMaster!

Thank you very much!


--
Alexandre Fouche
Lead operations engineer, cloud architect
http://www.cleverscale.com | @cleverscale


On Thursday 6 December 2012 at 21:08, Vinod Kumar Vavilapalli wrote:

> 
> You mentioned you only have one NodeManager.
> 
> So, is Hive generating 3 MapReduce jobs? And how many map and reduce tasks are there for each job?
> 
> What is your yarn.nodemanager.resource.memory-mb? That determines the maximum number of containers you can run.
> 
> You are running into an issue where all the jobs run in parallel and, because each job now has its own 'ApplicationMaster' that also occupies a container, the jobs get into a scheduling livelock. On a single node, you will not have enough capacity to run many jobs in parallel.
> Thanks,
> +Vinod

Re: Hive parallel execution deadlocks, need restart of yarn-nodemanager

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

You mentioned you only have one NodeManager.

So, is Hive generating 3 MapReduce jobs? And how many map and reduce tasks are there for each job?

What is your yarn.nodemanager.resource.memory-mb? That determines the maximum number of containers you can run.

You are running into an issue where all the jobs run in parallel and, because each job now has its own 'ApplicationMaster' that also occupies a container, the jobs get into a scheduling livelock. On a single node, you will not have enough capacity to run many jobs in parallel.

Thanks,
+Vinod
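
For readers who want to check the arithmetic behind this explanation, here is a rough sketch in Python. It is not YARN's scheduler: it assumes the ApplicationMasters are granted containers first (greedily, one per parallel job, for as long as they fit in the node's memory) and then asks whether a single map or reduce container still fits. The function names and the memory figures in the usage lines are hypothetical and are not taken from the thread; only node_memory_mb corresponds to the yarn.nodemanager.resource.memory-mb setting mentioned above.

```python
def free_memory_after_masters(node_memory_mb, parallel_jobs, am_mb):
    """Memory left on the node once as many ApplicationMasters as fit have started.

    Assumption: one ApplicationMaster per parallel job, granted before any task.
    """
    masters_running = min(parallel_jobs, node_memory_mb // am_mb)
    return node_memory_mb - masters_running * am_mb


def can_run_a_task(node_memory_mb, parallel_jobs, am_mb, task_mb):
    """True if at least one map/reduce container still fits next to the masters."""
    free_mb = free_memory_after_masters(node_memory_mb, parallel_jobs, am_mb)
    return free_mb >= task_mb


if __name__ == "__main__":
    # Hypothetical sizes: 3072 MB node, 1536 MB per master, 1024 MB per task.
    for jobs in (1, 2, 3):
        print(jobs, "parallel job(s):", can_run_a_task(3072, jobs, 1536, 1024))
```

Under these assumptions, the stuck state is the one where the function returns False: every running job holds a container for its ApplicationMaster and waits for a task container that cannot be granted. Lowering the number of parallel Hive threads or adding capacity both move the result back to True.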
