Posted to commits@oozie.apache.org by "Angelo K. Huang (JIRA)" <ji...@apache.org> on 2011/08/05 03:24:27 UTC

[jira] [Created] (OOZIE-10) workflow action allow user auto retry

workflow action allow user auto retry
-------------------------------------

                 Key: OOZIE-10
                 URL: https://issues.apache.org/jira/browse/OOZIE-10
             Project: Apache Oozie (Incubating)
          Issue Type: New Feature
            Reporter: Angelo K. Huang
            Assignee: Angelo K. Huang


Currently, a workflow action is retried only on transient errors. Users often want to control retries at the action level, for example by defining a custom retry count for each action. For a FAILED action, the possible causes include startData or endData not being set, or an EL exception. The case that may be worth retrying is when Oozie cannot get the running job for a Hadoop ID. For an ERROR action, most errors come from the job application, such as a failure to parse the action conf, a buffer overflow in the ssh executor, or a file that does not exist in the fs action executor.

The solution is to define a 0.3 workflow schema with new action-level attributes for user-defined retry, and to add default Oozie configuration for the system-level maximum user retry. Example:

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.3" name="test-wf">
    <action name="a" retry-max="2" retry-interval="1">
        ...
    </action>
    ...
</workflow-app>

oozie-default.xml

    <!-- Workflow Action Automatic Retry -->

    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.max</name>
        <value>3</value>
        <description>
            Automatic retry max count for workflow action is 3 by default.
        </description>
    </property>

    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.inteval</name>
        <value>10</value>
        <description>
            Automatic retry interval for workflow action is in minutes
            and the default value is 10 minutes.
        </description>
    </property>

    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code</name>
        <value>
            JA017
        </value>
        <description>
            Automatic retry for workflow action is handled for
            these specified error codes.
        </description>
    </property>

    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code.ext</name>
        <value> </value>
        <description>
            Automatic retry for workflow action is handled for
            these specified extra error codes.
        </description>
    </property>
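
One way to picture how these settings might combine is sketched below in Python. This is an illustration under stated assumptions, not the Oozie implementation: the names RetryPolicy and should_retry are hypothetical, and it assumes that the action-level retry-max is capped by the system-level maximum and that a retry happens only when the error code appears in the configured list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """System-level defaults mirroring the proposed oozie-default.xml values."""
    system_max: int = 3                # ...user.retry.max
    default_interval_min: int = 10     # ...user.retry.inteval (minutes)
    # Assumption: error.code and error.code.ext are merged into one set.
    error_codes: frozenset = frozenset({"JA017"})


def should_retry(policy, action_retry_max, attempts_so_far, error_code):
    """Decide whether another user retry is allowed.

    Assumption: the action's retry-max cannot exceed the system maximum.
    """
    effective_max = min(action_retry_max, policy.system_max)
    return error_code in policy.error_codes and attempts_so_far < effective_max


if __name__ == "__main__":
    policy = RetryPolicy()
    # Action "a" from the workflow.xml example declares retry-max="2".
    print(should_retry(policy, action_retry_max=2, attempts_so_far=0, error_code="JA017"))
    print(should_retry(policy, action_retry_max=2, attempts_so_far=2, error_code="JA017"))
```

Under these assumptions, an action that asks for more retries than the system-level maximum would still stop once that maximum is reached.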





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OOZIE-10) workflow action allow user auto retry

Posted by "Angelo K. Huang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080255#comment-13080255 ] 

Angelo K. Huang commented on OOZIE-10:
--------------------------------------

https://github.com/yahoo/oozie/pull/782

> workflow action allow user auto retry
> -------------------------------------
>
>                 Key: OOZIE-10
>                 URL: https://issues.apache.org/jira/browse/OOZIE-10
>             Project: Apache Oozie (Incubating)
>          Issue Type: New Feature
>            Reporter: Angelo K. Huang
>            Assignee: Angelo K. Huang
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>


        

[jira] [Resolved] (OOZIE-10) workflow action allow user auto retry

Posted by "Angelo K. Huang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angelo K. Huang resolved OOZIE-10.
----------------------------------

    Resolution: Fixed

