Posted to dev@oozie.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/01/13 18:40:43 UTC

[jira] [Commented] (OOZIE-636) Check fork and join in the workflow in the submission time

    [ https://issues.apache.org/jira/browse/OOZIE-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185691#comment-13185691 ] 

jiraposter@reviews.apache.org commented on OOZIE-636:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3486/
-----------------------------------------------------------

Review request for oozie, Mohammad Islam and Angelo K. Huang.


Summary
-------

Validate fork and join at wf submission time
https://issues.apache.org/jira/browse/OOZIE-636

Brief description of the algorithm:

A modified DFS algorithm is used. Two stacks are kept: one for the DFS traversal (dfsStack) and one for tracking fork/join status (forkJoinStack). When a fork is encountered during traversal, it is pushed onto the forkJoinStack together with the number of paths leaving it. When a node’s child is a join, the join is added to the forkJoinStack and the number of paths that have reached it is updated. When the path counts for the fork and the join are equal, the fork/join pair is removed from the forkJoinStack and the join is pushed onto the dfsStack.

Nodes other than fork and join are only pushed onto the dfsStack.
If an action node is seen, only the node's "ok-to" transition is considered.


while (!dfsStack.isEmpty()) {
    Node n = dfsStack.pop();
    n.traversed = true;
    if (n.type == fork) {
        forkJoinStack.push(new Element(n, n.paths));
    }
    List<Node> children = getUnvisitedChildNodes(n);
    for (Node child : children) {
        if (child.type == join) {
            // Hold the join back until every path of its fork has reached it.
            boolean cleared = isForkJoinCleared(forkJoinStack);
            if (!cleared) {
                continue;
            }
        }
        dfsStack.push(child);
        child.traversed = true;
    }
}
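
For readers who want something they can run, here is a minimal sketch of the pairing rule the description above is checking. It is not the code in the patch: it is written recursively instead of with the two explicit stacks. The class ForkJoinCheck, its Node and Type types, and the isValid method are hypothetical names made up for this illustration, not Oozie classes. It assumes an acyclic workflow, follows only one outgoing transition per non-fork node (the "ok" transition for actions), and ignores decision nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a workflow definition; not an Oozie class. */
class ForkJoinCheck {
    enum Type { START, ACTION, FORK, JOIN, END }

    /** A node and its outgoing transitions (for an action, only the "ok" transition). */
    static final class Node {
        final Type type;
        final List<String> next;

        Node(Type type, String... next) {
            this.type = type;
            this.next = Arrays.asList(next);
        }
    }

    /** Returns true if all paths of every fork meet at one join and no join or end is misplaced. */
    static boolean isValid(Map<String, Node> graph, String start) {
        try {
            walk(graph, start, 0);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Follows a single path. Returns the name of the join that closes the
    // enclosing fork, or null when the end node is reached outside any fork.
    // depth is the number of forks currently open on this path.
    private static String walk(Map<String, Node> graph, String name, int depth) {
        while (true) {
            Node node = graph.get(name);
            if (node == null) {
                throw new IllegalStateException("unknown node " + name);
            }
            switch (node.type) {
                case END:
                    if (depth > 0) {
                        throw new IllegalStateException("a fork path reaches end without a join");
                    }
                    return null;
                case JOIN:
                    if (depth == 0) {
                        throw new IllegalStateException("join " + name + " has no matching fork");
                    }
                    return name;
                case FORK: {
                    String join = null;
                    for (String path : node.next) {
                        String reached = walk(graph, path, depth + 1);
                        if (join != null && !join.equals(reached)) {
                            throw new IllegalStateException("paths of fork " + name + " reach different joins");
                        }
                        join = reached;
                    }
                    if (join == null) {
                        throw new IllegalStateException("fork " + name + " has no paths");
                    }
                    // The fork is closed; continue with the node after its join.
                    name = graph.get(join).next.get(0);
                    break;
                }
                default:
                    // START or ACTION: follow the single transition.
                    name = node.next.get(0);
            }
        }
    }
}
```

Tracing the workflow quoted in the issue below through this sketch, the path starting at fork12 closes fork12 at join12 and fork2 at join1, then reaches end while fork1 is still open, so isValid returns false. The stack-based version in the patch is meant to catch the same kind of mismatch without recursion.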


This addresses bug OOZIE-636.
    https://issues.apache.org/jira/browse/OOZIE-636


Diffs
-----

  trunk/core/src/main/java/org/apache/oozie/ErrorCode.java 1230856 
  trunk/core/src/main/java/org/apache/oozie/workflow/lite/ForkJoinElement.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/oozie/workflow/lite/LiteWorkflowApp.java 1230856 
  trunk/core/src/main/java/org/apache/oozie/workflow/lite/LiteWorkflowAppParser.java 1230856 
  trunk/core/src/test/java/org/apache/oozie/service/TestLiteWorkflowAppService.java 1230856 
  trunk/core/src/test/java/org/apache/oozie/workflow/lite/TestLiteWorkflowApp.java PRE-CREATION 
  trunk/core/src/test/resources/wf-schema-valid.xml 1230856 

Diff: https://reviews.apache.org/r/3486/diff


Testing
-------

Added a test case that validates fork/join pairing.


Thanks,

Virag


                
> Check fork and join in the workflow in the submission time 
> -----------------------------------------------------------
>
>                 Key: OOZIE-636
>                 URL: https://issues.apache.org/jira/browse/OOZIE-636
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Virag Kothari
>
> Enhancement: Oozie should check that fork and join nodes are correctly paired when the user submits the job. This should be a static check at submission time, not a check made while the workflow is running.
> Current logic bug:
> A workflow with a different number of forks and joins was run. The wf job should have been killed, but it succeeded. Also, strangely, one action was killed.
> Following are the different types of tests run and their results with varying delays.
> test1: wf job SUCCEEDED, action java12 KILLED.
> delay11=11
> delay12=12
> delay121=1
> delay122=2
> delay21=1
> delay22=1
> test2: wf job SUCCEEDED, action java12 KILLED. 
> delay11=1
> delay12=12
> delay121=1
> delay122=2
> delay21=1
> delay22=1
> test3: wf job SUCCEEDED, all actions OK. Question: why does the wf job always pass in this scenario, even when the forks and joins are not paired?
> delay11=10
> delay12=10
> delay121=15
> delay122=15
> delay21=20
> delay22=20
> workflow.xml
> ============
> <workflow-app xmlns='uri:oozie:workflow:0.1' name='fork-join-4735180-wf'>
>     <start to='fork1' />
>     <fork name="fork1">
>         <path start="java11" />
>         <path start="fork12" />
>     </fork>
>     <action name='java11'>
>         <java>
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <main-class>qa.test.tests.testsleep</main-class>
>             <arg>${delay11}</arg>
>         </java>
>         <ok to="java12" />
>         <error to="fail" />
>     </action>
>     <action name='java12'>
>         <java>
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <main-class>qa.test.tests.testsleep</main-class>
>             <arg>${delay12}</arg>
>         </java>
>         <ok to="join1" />
>         <error to="fail" />
>     </action>
>     <fork name="fork12">
>         <path start="java121" />
>         <path start="java122" />
>     </fork>
>     <action name='java121'>
>         <java>
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <main-class>qa.test.tests.testsleep</main-class>
>             <arg>${delay121}</arg>
>         </java>
>         <ok to="join12" />
>         <error to="fail" />
>     </action>
>     <action name='java122'>
>         <java>
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <main-class>qa.test.tests.testsleep</main-class>
>             <arg>${delay122}</arg>
>         </java>
>         <ok to="join12" />
>         <error to="fail" />
>     </action>
>     <join name="join12" to="fork2" />
>     <fork name="fork2">
>         <path start="java21" />
>         <path start="java22" />
>     </fork>
>     <action name='java21'>
>         <java>
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <main-class>qa.test.tests.testsleep</main-class>
>             <arg>${delay21}</arg>
>         </java>
>         <ok to="join1" />
>         <error to="fail" />
>     </action>
>     <action name='java22'>
>         <java>
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <configuration>
>                 <property>
>                     <name>mapred.job.queue.name</name>
>                     <value>${queueName}</value>
>                 </property>
>             </configuration>
>             <main-class>qa.test.tests.testsleep</main-class>
>             <arg>${delay22}</arg>
>         </java>
>         <ok to="join1" />
>         <error to="fail" />
>     </action>
>     <join name="join1" to="end" />
>     <kill name="fail">
>         <message>Streaming Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <end name='end' />
> </workflow-app>
