Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2013/09/20 02:01:51 UTC

[jira] [Commented] (OOZIE-1550) Create a safeguard to kill errant recursive workflows before they bring down oozie

    [ https://issues.apache.org/jira/browse/OOZIE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772492#comment-13772492 ] 

Robert Kanter commented on OOZIE-1550:
--------------------------------------

We were discussing this and had a couple of ideas on ways to prevent this from happening:
# Before running the subworkflow, Oozie could check whether the subworkflow's path matches the path of any of its ancestors and, if so, prevent it from running
#- Some users rely on this functionality to create a loop, so the check would have to be disabled by default (i.e. the current behavior), but an oozie-site config could turn it on
# Add a limit to the "depth" of subworkflows
#- e.g. if configured to 3, then if wfA calls wfB calls wfC, it won't allow wfC to call wfD, regardless of the workflows themselves
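
To make the two options concrete, here is a rough sketch of what each check might look like. It is illustrative Python rather than Oozie code: the function, its parameters, and the exception are all hypothetical names, and it assumes the launcher can see the application paths of the whole ancestor chain. Paths are compared as plain strings; a real check would probably need to normalize them first.

```python
from typing import Optional, Sequence


class SubworkflowRejected(Exception):
    """Raised when a sub-workflow launch is refused by a safeguard (hypothetical)."""


def check_subworkflow_launch(
    ancestor_paths: Sequence[str],
    child_path: str,
    reject_recursion: bool = False,   # option 1, off by default (current behavior)
    max_depth: Optional[int] = None,  # option 2, None means no limit
) -> None:
    """Decide whether a workflow may launch a sub-workflow.

    ancestor_paths lists the application paths from the root workflow down to
    the workflow that wants to launch child_path (that workflow included).
    """
    # Option 1: refuse a child whose path matches any ancestor's path.
    if reject_recursion and child_path in ancestor_paths:
        raise SubworkflowRejected(
            f"{child_path} is already an ancestor of this workflow"
        )

    # Option 2: refuse a child that would sit deeper than the configured limit.
    child_depth = len(ancestor_paths) + 1
    if max_depth is not None and child_depth > max_depth:
        raise SubworkflowRejected(
            f"depth {child_depth} exceeds the limit of {max_depth}"
        )
```

With max_depth=3, wfB (whose ancestors are wfA and itself) may still launch wfC, but wfC's attempt to launch wfD is rejected, which matches the example above. With reject_recursion=True, wfB trying to launch wfA again is rejected no matter what the depth limit is.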

#1 is technically the more correct approach because, as-is, a workflow that calls itself through a subworkflow essentially violates the rule that workflows should be DAGs.  So I think that's the way to go.  Thoughts?
                
> Create a safeguard to kill errant recursive workflows before they bring down oozie
> ----------------------------------------------------------------------------------
>
>                 Key: OOZIE-1550
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1550
>             Project: Oozie
>          Issue Type: Improvement
>          Components: workflow
>    Affects Versions: 3.3.2, 4.0.0
>            Reporter: Robert Justice
>              Labels: features
>
> If a user creates an errant workflow with a sub-workflow that calls the workflow again, without a proper decision node to exit the workflow, it will continue to create numerous jobs until the oozie server is saturated.  A user recently had 400,000 running jobs and oozie was non-responsive.  I would suggest we have some method of preventing a user from taking out oozie, such as a max running jobs parameter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira