Posted to dev@oozie.apache.org by "Purshotam Shah (JIRA)" <ji...@apache.org> on 2014/08/19 19:49:18 UTC

[jira] [Comment Edited] (OOZIE-1976) Specifying coordinator input datasets in more logical ways

    [ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102527#comment-14102527 ] 

Purshotam Shah edited comment on OOZIE-1976 at 8/19/14 5:47 PM:
----------------------------------------------------------------

Some more suggestions.
1. Can you add an option to force-start a coord action?
Sometimes users want to start a coord action even if a few dependencies are missing.

2. Can you add an explain option for waiting coord actions?
oozie job -explain <jobID_with cord>
oozie job -explain <jobID> -action <list>

Since we are adding more options for datasets, it may confuse users more.
The explain option should explain why a coord action is still waiting.
 a> total input range is 30. Min is 10. Found 8 and waiting for 2 missing input dependencies.
 b> total input range is 30. Wait time is 10 min. Found 8 and waited 8 min, will wait for 2 more min.
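
To make the two example messages concrete, here is a minimal sketch of how such an explanation could be derived. It is not Oozie code: the `WaitingAction` class, its field names, and the `explain` function are hypothetical, and it assumes an action carries either a "min" criterion or a "wait time" criterion as in examples a> and b>.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitingAction:
    """Hypothetical snapshot of a waiting coord action (not an Oozie class)."""
    total_inputs: int                    # size of the input range
    found: int                           # input instances already available
    min_required: Optional[int] = None   # "min" criterion, if configured
    wait_minutes: Optional[int] = None   # "wait time" criterion, if configured
    waited_minutes: int = 0              # time spent waiting so far


def explain(action: WaitingAction) -> str:
    """Return a one-line reason why the action is still waiting."""
    prefix = f"total input range is {action.total_inputs}."
    if action.min_required is not None:
        # Case a>: waiting until a minimum number of inputs is found.
        missing = max(action.min_required - action.found, 0)
        return (f"{prefix} Min is {action.min_required}. Found {action.found} "
                f"and waiting for {missing} missing input dependencies.")
    if action.wait_minutes is not None:
        # Case b>: waiting until a time budget is used up.
        remaining = max(action.wait_minutes - action.waited_minutes, 0)
        return (f"{prefix} Wait time is {action.wait_minutes} min. "
                f"Found {action.found} and waited {action.waited_minutes} min, "
                f"will wait for {remaining} more min.")
    # Fallback: plain AND logic, every input in the range is required.
    missing = action.total_inputs - action.found
    return (f"{prefix} Found {action.found} "
            f"and waiting for {missing} missing input dependencies.")


print(explain(WaitingAction(total_inputs=30, found=8, min_required=10)))
print(explain(WaitingAction(total_inputs=30, found=8,
                            wait_minutes=10, waited_minutes=8)))
```

The point is only that the message can be built from counters the coordinator presumably already tracks; the real data model may look quite different.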





> Specifying coordinator input datasets in more logical ways
> ----------------------------------------------------------
>
>                 Key: OOZIE-1976
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1976
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>             Fix For: trunk
>
>         Attachments: OOZIE-1976-rough-design.pdf
>
>
> All dataset instances specified as input to a coordinator currently work on AND logic, i.e. ALL of them must be available for the workflow to start. We should enhance this to include more logical ways of specifying availability criteria, e.g.
>  * OR between instances
>  * minimum N out of K instances
>  * delta datasets (process data incrementally)
> Use-cases for this:
>  * Different datasets are BCP, and the workflow can run with either, whichever arrives earlier.
>  * Data is not guaranteed, and while $coord:latest allows skipping to available ones, the workflow will never trigger unless the specified number of instances is found.
>  * The workflow is like a ‘refining’ algorithm which should run after the minimum required datasets are ready, and should only process the delta for efficiency.
> This JIRA is to discuss the design and then review the implementation for some or all of the above features.
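
The availability criteria listed in the description can be sketched as simple predicates over per-instance availability flags. This is an illustration only, not the proposed design: the function names and the string instance identifiers are hypothetical, and the attached rough-design document may define the semantics differently.

```python
from typing import Sequence, Set


def all_available(available: Sequence[bool]) -> bool:
    """Current behaviour: AND logic, every instance must be present."""
    return all(available)


def any_available(available: Sequence[bool]) -> bool:
    """OR between instances: one available instance is enough."""
    return any(available)


def min_available(available: Sequence[bool], n: int) -> bool:
    """Minimum N out of K instances must be present."""
    return sum(available) >= n


def delta(available: Set[str], processed: Set[str]) -> Set[str]:
    """Delta datasets: only instances not processed by an earlier run."""
    return available - processed


flags = [True, False, True]          # K = 3 instances, 2 available
print(all_available(flags))          # False
print(any_available(flags))          # True
print(min_available(flags, 2))       # True
print(sorted(delta({"d1", "d2", "d3"}, {"d1"})))  # ['d2', 'd3']
```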


