Posted to common-user@hadoop.apache.org by Michael Segel <mi...@hotmail.com> on 2010/05/14 23:32:54 UTC

Chaining jobs that have dependencies on jars...


We have a series of Hadoop map/reduce jobs that need to be run.

In between each job we have to run some logic, and depending on the result, the next job gets called.

So in the chain of events, job A runs. At the end of the job, some value is evaluated. Depending on its result, we want to run either job B or job C.

So we can use the Tool interface to load each job class and run it.
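
Roughly, the kind of driver I have in mind (just a sketch; JobA, JobB and JobC are placeholder Tool implementations, and the config key for the evaluated value is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch only: JobA, JobB and JobC stand in for the real Tool
// implementations of the individual map/reduce jobs.
public class ChainDriver extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    Configuration conf = getConf();

    // Run job A through the Tool interface.
    int rc = ToolRunner.run(conf, new JobA(), args);
    if (rc != 0) {
      return rc;
    }

    // Evaluate some value produced by job A. Here it is assumed JobA
    // put a value into the shared Configuration; a counter or a side
    // file on HDFS would work just as well.
    long processed = conf.getLong("jobA.records.processed", 0L);

    // Depending on the result, run either job B or job C.
    if (processed > 0) {
      return ToolRunner.run(conf, new JobB(), args);
    }
    return ToolRunner.run(conf, new JobC(), args);
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new ChainDriver(), args));
  }
}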

The catch is that some of the jobs have dependencies. 
When a job is launched from the hadoop command line as a standalone jar, any dependency jars packaged in the jar's /lib directory get loaded.

When you launch a job through the Tool interface instead, the jars in the /lib directory are not loaded.

Outside of using the distributed cache, is there a way to launch a job so that it will pick up the jars in the /lib directory?
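
For reference, the distributed cache workaround I'm referring to looks roughly like this (sketch only; the HDFS path to the dependency jar is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class LibJarSetup {

  // Adds a dependency jar (already copied to HDFS) to the task classpath.
  // This must happen before the Job is created from the configuration.
  public static Job newJobWithDependency(Configuration conf) throws Exception {
    DistributedCache.addFileToClassPath(
        new Path("/apps/myapp/lib/dependency.jar"), conf);
    return new Job(conf, "job with lib dependency");
  }
}

It works, but it means pushing the dependency jars to HDFS and wiring them up explicitly for every job.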

Thx

-Mike
