Posted to dev@hive.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/04/14 21:42:46 UTC

How Hive generates jobs

Hi All,
So, after Hive generates the DAG in which each node represents a MapReduce job, it sorts the DAG in topological order. All the MapReduce jobs, as a sequence, are packed into a jar file and submitted to Hadoop for execution. To resolve the dependencies among jobs, a wait() is inserted between two dependent jobs. The question is: if two jobs are peers with no dependency between them, and they are next to each other in the topological sequence, is there also a wait() between them?

For example, job1 is the parent and has two children, job2 and job3. The topological sequence is job1--job2--job3, even though job3 doesn't depend on job2. Can job3 start before job2 finishes?
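
To make the example concrete, here is a small Python sketch of the situation. It is not Hive code: the DEPENDS_ON map and the helpers topological_order and can_overlap are hypothetical names made up for illustration. It only shows that job2 and job3 are independent in the dependency graph even though the topological sequence puts one before the other; it says nothing about whether Hive actually runs them at the same time.

```python
from collections import deque

# Hypothetical plan from the example: job1 is the parent of job2 and job3.
# Each job maps to the list of jobs it directly depends on.
DEPENDS_ON = {
    "job1": [],
    "job2": ["job1"],
    "job3": ["job1"],
}


def topological_order(depends_on):
    """Return one valid topological order using Kahn's algorithm."""
    remaining = {job: set(parents) for job, parents in depends_on.items()}
    ready = deque(sorted(job for job, parents in remaining.items() if not parents))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        # Release every job whose last unmet dependency was `job`.
        for other in sorted(remaining):
            if job in remaining[other]:
                remaining[other].discard(job)
                if not remaining[other]:
                    ready.append(other)
    if len(order) != len(depends_on):
        raise ValueError("dependency cycle")
    return order


def can_overlap(a, b, depends_on):
    """True if neither job is a direct or transitive ancestor of the other."""
    def ancestors(job):
        seen = set()
        stack = list(depends_on[job])
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(depends_on[parent])
        return seen

    return a not in ancestors(b) and b not in ancestors(a)


if __name__ == "__main__":
    print(topological_order(DEPENDS_ON))
    print(can_overlap("job2", "job3", DEPENDS_ON))
```

In this sketch the sequence comes out as job1, job2, job3, yet can_overlap reports that job2 and job3 have no dependency on each other, which is exactly the case I am asking about.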

Thanks,
-Gang