Posted to user@oozie.apache.org by "Heller, Chris" <ch...@akamai.com> on 2014/03/13 19:01:34 UTC

Constructing a complex system in Oozie.

I have three workflows which I wish to coordinate.

* WF-A partitions a single input into multiple outputs
* WF-B aggregates the partitions of all WF-A workflows at the time it is run
* WF-C processes a single aggregate partition created by WF-B
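
A rough Python model of this data flow, purely to pin down the intended semantics. It is not Oozie code. The function names, the partition key and the row counting in wf_c are illustrative assumptions:

```python
from collections import defaultdict


def wf_a(records, key):
    # Hypothetical WF-A: split one input into partitions by an assumed key.
    partitions = defaultdict(list)
    for r in records:
        partitions[key(r)].append(r)
    return dict(partitions)


def wf_b(wf_a_outputs):
    # Hypothetical WF-B: merge the partitions of every WF-A output
    # that exists at the moment it runs.
    aggregate = defaultdict(list)
    for output in wf_a_outputs:
        for k, rows in output.items():
            aggregate[k].extend(rows)
    return dict(aggregate)


def wf_c(aggregate, k):
    # Hypothetical WF-C: process exactly one aggregate partition
    # (here it just counts the rows).
    return len(aggregate[k])
```

For example, two WF-A runs that both emit a "y" partition produce a single aggregate "y" partition containing the rows from both runs.
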
There are some more constraints on this system:

* WF-A is started by an external process. Its start time is random. Each
WF-A is independent of the others.
* WF-B cannot run concurrently with another WF-B.
* Each WF-C is independent of the others, except that no two WF-C may
process the same partition simultaneously, and once a WF-C succeeds,
no other WF-C should reprocess its partition.
* The entire system should be driven by the external process that launches
WF-A (i.e. there is no clock in this system).
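
The same constraints expressed as a small in-process Python model, only to make the required mutual exclusion explicit. It makes no claim about how Oozie would enforce them. Coordinator, on_wf_a_finished, try_run_wf_b, try_claim_wf_c and finish_wf_c are hypothetical names:

```python
import threading


class Coordinator:
    """Hypothetical model of the WF-A/WF-B/WF-C constraints (not Oozie code)."""

    def __init__(self):
        self._b_lock = threading.Lock()   # at most one WF-B runs at a time
        self._guard = threading.Lock()    # protects the shared state below
        self._pending_a = []              # WF-A outputs not yet aggregated
        self._running_c = set()           # partitions with a WF-C in flight
        self._done_c = set()              # partitions a WF-C finished successfully

    def on_wf_a_finished(self, output):
        # Event from the external process; there is no clock anywhere.
        with self._guard:
            self._pending_a.append(output)

    def try_run_wf_b(self, aggregate_fn):
        # Return None without running if another WF-B holds the lock;
        # pending WF-A outputs stay queued for the next trigger.
        if not self._b_lock.acquire(blocking=False):
            return None
        try:
            with self._guard:
                batch, self._pending_a = self._pending_a, []
            try:
                return aggregate_fn(batch)
            except Exception:
                # Put the batch back so a later WF-B can aggregate it.
                with self._guard:
                    self._pending_a = batch + self._pending_a
                raise
        finally:
            self._b_lock.release()

    def pending_count(self):
        with self._guard:
            return len(self._pending_a)

    def try_claim_wf_c(self, partition):
        # Start a WF-C only if no other WF-C holds this partition and no
        # earlier WF-C already processed it successfully.
        with self._guard:
            if partition in self._running_c or partition in self._done_c:
                return False
            self._running_c.add(partition)
            return True

    def finish_wf_c(self, partition, succeeded):
        # Release the claim; only a success marks the partition as done,
        # so a failed partition can be claimed again.
        with self._guard:
            self._running_c.discard(partition)
            if succeeded:
                self._done_c.add(partition)
```

Skipping a second WF-B, rather than blocking until the running one finishes, is just one choice here. Either behaviour satisfies the "no concurrent WF-B" rule, as long as the skipped WF-A outputs are picked up by a later WF-B.
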
I suspect this system can be expressed in Oozie using coordinators
(and perhaps bundles) plus some custom MapReduce actions. However, I would
appreciate some thoughts on how I might construct it, as it isn't
clear to me how to proceed.

Thanks,
Chris