Posted to dev@oozie.apache.org by "Peter Bacsko (Jira)" <ji...@apache.org> on 2019/11/22 07:00:00 UTC

[jira] [Comment Edited] (OOZIE-3561) Forkjoin validation is slow when there are many actions in chain

    [ https://issues.apache.org/jira/browse/OOZIE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979310#comment-16979310 ] 

Peter Bacsko edited comment on OOZIE-3561 at 11/22/19 6:59 AM:
---------------------------------------------------------------

I refactored the validator 3 years ago, so I had to check again how it works:

1. Basic validation makes sure that the workflow is not cyclic. That's definitely fast.
2. Fork-join validation: this was trickier. Multiple fork-joins did cause problems because paths were re-walked unnecessarily - this had exponential runtime with regard to the number of fork-join pairs. However, OOZIE-1978 made sure that no unnecessary walks take place by stopping the recursion when we encounter a join.
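
To illustrate the re-walking described in point 2, here is a minimal sketch. It is not the actual Oozie validator code: the class name, the method names and the map-based graph are all made up for this example, and the pruning is simplified to "stop at any node that was already reached" rather than the join-specific check from OOZIE-1978. It builds a chain of two-branch fork-join pairs and counts node visits with and without pruning.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; none of these names come from the Oozie code base.
class ForkJoinWalkDemo {

    // Builds start -> (fork_i -> {left_i, right_i} -> join_i), repeated 'pairs' times, -> end.
    static Map<String, List<String>> buildChain(int pairs) {
        Map<String, List<String>> graph = new HashMap<>();
        String previous = "start";
        for (int i = 0; i < pairs; i++) {
            String fork = "fork" + i;
            String left = "left" + i;
            String right = "right" + i;
            String join = "join" + i;
            graph.put(previous, Arrays.asList(fork));
            graph.put(fork, Arrays.asList(left, right));
            graph.put(left, Arrays.asList(join));
            graph.put(right, Arrays.asList(join));
            previous = join;
        }
        graph.put(previous, Arrays.asList("end"));
        graph.put("end", Collections.<String>emptyList());
        return graph;
    }

    // Naive walk: follows every path, so everything behind a join
    // is walked again for each branch that leads into that join.
    static long naiveVisits(Map<String, List<String>> graph, String node) {
        long visits = 1;
        for (String next : graph.get(node)) {
            visits += naiveVisits(graph, next);
        }
        return visits;
    }

    // Pruned walk: stops the recursion at a node that was already reached
    // (a simplification of "stop when we encounter a join").
    static long prunedVisits(Map<String, List<String>> graph, String node, Set<String> seen) {
        if (!seen.add(node)) {
            return 0; // already walked from here, nothing new to count
        }
        long visits = 1;
        for (String next : graph.get(node)) {
            visits += prunedVisits(graph, next, seen);
        }
        return visits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = buildChain(10);
        System.out.println("naive:  " + naiveVisits(graph, "start"));
        System.out.println("pruned: " + prunedVisits(graph, "start", new HashSet<String>()));
    }
}
```

In this toy model, 10 fork-join pairs give 6140 visits for the naive walk and 42 for the pruned one (one per node). Each additional pair roughly doubles the naive count, while the pruned count grows by 4.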

Right now I don't see what could go wrong.


was (Author: pbacsko):
I refactored the validator 3 years ago, so I had to check again how it works:

1. Basic validation makes sure that the workflow is acyclic. That's definitely fast.
2. Fork-join validation: this was trickier. Multiple fork-joins did cause problems because paths were re-walked unnecessarily - this had exponential runtime with regard to the number of fork-join pairs. However, OOZIE-1978 made sure that no unnecessary walks take place by stopping the recursion when we encounter a join.

Right now I don't see what could go wrong.

> Forkjoin validation is slow when there are many actions in chain
> ----------------------------------------------------------------
>
>                 Key: OOZIE-3561
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3561
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.1.0
>            Reporter: Denes Bodo
>            Assignee: Denes Bodo
>            Priority: Critical
>              Labels: performance
>
> If we have a workflow with, let's say, 80 actions chained one after another:
> {{a1 -> a2 -> ... a80}}
> then the validator code "never" finishes.
> Currently the validation (in my understanding) does depth-first checks from the start node and takes on the order of n! steps. This is confirmed by the fact that when we split this huge workflow into two 40-element workflows, validation takes 2 x ~40! steps instead of ~80! steps.
> Guys, could you please confirm or disprove my theory?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)