Posted to user@oozie.apache.org by David Parks <da...@yahoo.com> on 2013/04/25 11:00:03 UTC
Multiple Inputs (2+ mappers) in Oozie - how to?
I'm new to Oozie, and I'm migrating a bunch of jobs from AWS, where we launched
all jobs programmatically. Most of our jobs have multiple inputs (2 or more
mappers), and often complex inputs (for example, scan a directory for the most
recently created dir containing an _SUCCESS file and use the files in there as
the input to a job).
So firstly: I don't see *any* reference to multiple inputs in the Oozie
docs, and Google isn't much wiser. There is a way to do this, right?
And secondly, I would love a hint on how best to identify the part-r-#####
input as seen below (latest directory by date with an _SUCCESS file). We use
the previous results (1st mapper) and merge them with the input (2nd mapper),
and this is a pretty common paradigm in our jobs. All easy to do when you
manage things programmatically, but it's throwing me for a loop in trying to
migrate the stuff into Oozie/CDH4.
JobX-2013-04-24/
    _SUCCESS
    output/
    input/
JobX-2013-04-25/
    _SUCCESS
    output/
        part-r-00000
        part-r-00001
        part-r-00002
    input/
JobX-2013-04-26/
    input/
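
For illustration, the selection rule described above (latest directory by date that has an _SUCCESS file, then the part-r-##### files under its output directory) could be sketched roughly as follows. This is only a minimal sketch against a local filesystem using Python's pathlib. It does not talk to HDFS or Oozie, and the function names and the "JobX-" prefix parameter are hypothetical, chosen to match the listing above.

```python
from pathlib import Path
from typing import List, Optional


def latest_successful_run(base_dir: Path, prefix: str = "JobX-") -> Optional[Path]:
    """Return the newest run directory under base_dir that contains _SUCCESS.

    Assumes directory names end in an ISO date (YYYY-MM-DD), as in the
    listing above, so plain string comparison orders them chronologically.
    """
    candidates = [
        d for d in base_dir.iterdir()
        if d.is_dir()
        and d.name.startswith(prefix)
        and (d / "_SUCCESS").is_file()  # skip runs that never finished
    ]
    # No completed run yet: return None instead of raising.
    return max(candidates, key=lambda d: d.name, default=None)


def previous_output_parts(run_dir: Path) -> List[Path]:
    """List the part-r-##### files of a completed run, in sorted order."""
    return sorted((run_dir / "output").glob("part-r-*"))
```

With the layout shown above, latest_successful_run would pick JobX-2013-04-25 (JobX-2013-04-26 has no _SUCCESS file yet), and previous_output_parts would return its three part-r files, which would then serve as the input for the first mapper.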