Posted to user@oozie.apache.org by David Parks <da...@yahoo.com> on 2013/04/25 11:00:03 UTC

Multiple Inputs (2+ mappers) in oozie - how to?

I’m new to Oozie and am migrating a bunch of jobs from AWS, where we launched
all jobs programmatically. Most of our jobs have multiple inputs (2 or more
mappers), and often complex inputs (for example, scanning a directory for the
most recently created subdirectory that contains a _SUCCESS file and using the
files in there as the input to a job).

 

So firstly: I don’t see *any* reference to multiple inputs in the Oozie
docs, and Google isn’t much wiser. There is a way to do this, right?

 

And secondly, I would love a hint on how best to identify the “part-r-#####”
input shown below (in the latest directory by date that has a _SUCCESS file).
We take the previous results (1st mapper) and merge them with the new input
(2nd mapper), and this is a pretty common pattern in our jobs. It’s all easy to
do when you manage things programmatically, but it’s throwing me for a loop as
I try to migrate this into Oozie/CDH4.

 

JobX-2013-04-24/
    _SUCCESS
    output/
        …
    input/
        …
JobX-2013-04-25/
    _SUCCESS
    output/
        part-r-00000
        part-r-00001
        part-r-00002
    input/
        …
JobX-2013-04-26/
    input/
        …
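
To make the selection rule above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an Oozie feature or an existing script: it walks a local directory tree as a stand-in for HDFS, it reads the run date from the directory name (the `<job>-YYYY-MM-DD` pattern in the listing) rather than from the filesystem creation time, and the function name `latest_successful_parts` is hypothetical.

```python
import re
from datetime import date
from pathlib import Path

# Assumed naming pattern, taken from the listing above: <job>-YYYY-MM-DD
_RUN_DIR = re.compile(r"^(?P<job>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def latest_successful_parts(base_dir, job_name):
    """Return the part-r-* files of the newest completed run of job_name.

    A run counts as completed only if its directory contains a _SUCCESS file.
    Returns an empty list when no completed run exists.
    (Hypothetical helper; uses the local filesystem as a stand-in for HDFS.)
    """
    candidates = []
    for entry in Path(base_dir).iterdir():
        match = _RUN_DIR.match(entry.name)
        if not entry.is_dir() or match is None or match.group("job") != job_name:
            continue
        if not (entry / "_SUCCESS").is_file():
            continue  # no marker: the run is still in progress or it failed
        run_date = date(int(match.group("y")), int(match.group("m")), int(match.group("d")))
        candidates.append((run_date, entry))

    if not candidates:
        return []

    # Pick the run with the latest date encoded in its directory name.
    _, newest = max(candidates, key=lambda pair: pair[0])
    return sorted((newest / "output").glob("part-r-*"))
```

Applied to the listing above, this skips JobX-2013-04-26 (no _SUCCESS yet) and returns the three part-r files under JobX-2013-04-25/output, which would then be the first mapper's input.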