Posted to commits@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2011/09/08 06:27:09 UTC

[jira] [Created] (OOZIE-89) GH-49: Supporting bundle in oozie

GH-49: Supporting bundle in oozie
---------------------------------

                 Key: OOZIE-89
                 URL: https://issues.apache.org/jira/browse/OOZIE-89
             Project: Oozie
          Issue Type: Bug
            Reporter: Hadoop QA


Oozie currently has two levels of abstraction:
1. Workflow, which executes a DAG of actions.
2. Coordinator, which executes a workflow periodically when the specified set of data directories is available.

This issue proposes another abstraction called 'bundle' that will batch a set of coordinator applications. The user will be able to start/stop/suspend/resume/rerun at the bundle level.

******* The proposed high-level requirements for bundle support are enumerated below:

1. This feature will allow the user to specify a list of coordinator applications in an XML file.

2. The name of the bundle XML file is not hard-coded; the user can give the bundle file any name.

3. The user will submit a bundle by specifying the bundle application path in a config file. An example command is: oozie job -run -config <bundle.properties>

4. The bundle application path is defined in the config file as the property "oozie.application.bundle.path", whose value is the full HDFS path to the bundle XML.

5. The user can also submit a bundle job through the WS API.

7. The user will be able to define variables/parameters for each coordinator application.

8. All variables should be resolved during job submission. For any resolved variable, Oozie will throw an exception.

9. The user will be able to submit a bundle with a user-defined external id to avoid duplicate submissions in case the first submission times out.

10. Oozie will not support any explicit dependencies among the coordinator applications in the bundle definition.

11. Oozie will not support partial bundle submission.

12. When the user submits a bundle, they will get a bundle id to track it. Oozie will put the bundle job into PREP state.

13. The user will be able to start a bundle using its bundle id, which will put the bundle job into RUNNING state.

14. The user will be able to combine submit and start into a single run operation that starts the bundle immediately.

15. The user will optionally be able to specify a kick-off time that determines when the bundle starts. The bundle will not run until the kick-off time is reached.

16. The user will be able to query Oozie for the bundle's status through the CLI and WS API.

17. The user will be able to query Oozie, through the CLI and WS API, for all coordinator jobs that the bundle started.

18. The user will be able to kill a bundle by its id, which will kill all spawned coordinator jobs.

19. The user will be able to suspend a bundle by its id, which will suspend all spawned coordinator jobs.

20. The user will be able to pause a bundle by its id with a future time, which will pause all spawned coordinator jobs.

21. The user will be able to resume a bundle by its id, which will resume all spawned coordinator jobs.

22. Bundle rerun requirements TBD. 
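The lifecycle requirements above (9 and 12 to 21) can be sketched as a small in-memory state machine. This is only an illustration under assumptions: BundleService, BundleJob, the status strings other than PREP/RUNNING, and the "bundle-N" id format are hypothetical names, not Oozie's API, and pause-with-a-future-time (20) and rerun (22) are not modeled.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


# Hypothetical in-memory model of requirements 9 and 12-21; not Oozie's API.
@dataclass
class BundleJob:
    bundle_id: str
    coordinators: List[str]                       # coordinator app paths from the bundle XML
    kick_off_time: Optional[datetime] = None
    status: str = "PREP"                          # Req. 12: a submitted bundle starts in PREP
    coord_status: Dict[str, str] = field(default_factory=dict)


class BundleService:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.jobs: Dict[str, BundleJob] = {}
        self._by_external_id: Dict[str, str] = {}

    def submit(self, coordinators: List[str], external_id: Optional[str] = None,
               kick_off_time: Optional[datetime] = None) -> str:
        # Req. 9: a retried submission with the same external id returns the
        # existing bundle id instead of creating a duplicate job.
        if external_id is not None and external_id in self._by_external_id:
            return self._by_external_id[external_id]
        job = BundleJob(f"bundle-{next(self._ids)}", list(coordinators), kick_off_time)
        self.jobs[job.bundle_id] = job
        if external_id is not None:
            self._by_external_id[external_id] = job.bundle_id
        return job.bundle_id

    def start(self, bundle_id: str, now: datetime) -> bool:
        job = self.jobs[bundle_id]
        if job.status != "PREP":
            raise ValueError(f"{bundle_id} is {job.status}, not PREP")
        # Req. 15: leave the job in PREP until the kick-off time is reached.
        if job.kick_off_time is not None and now < job.kick_off_time:
            return False
        job.status = "RUNNING"                    # Req. 13
        job.coord_status = {c: "RUNNING" for c in job.coordinators}
        return True

    def run(self, coordinators: List[str], now: datetime, **kwargs) -> str:
        # Req. 14: submit and start in one call.
        bundle_id = self.submit(coordinators, **kwargs)
        if self.jobs[bundle_id].status == "PREP":
            self.start(bundle_id, now)
        return bundle_id

    def _transition(self, bundle_id: str, allowed: set, new_status: str) -> None:
        job = self.jobs[bundle_id]
        if job.status not in allowed:
            raise ValueError(f"cannot move {bundle_id} from {job.status} to {new_status}")
        job.status = new_status
        # Reqs. 18, 19, 21: the bundle-level action is applied to every spawned coordinator.
        for coord in job.coord_status:
            job.coord_status[coord] = new_status

    def suspend(self, bundle_id: str) -> None:
        self._transition(bundle_id, {"RUNNING"}, "SUSPENDED")

    def resume(self, bundle_id: str) -> None:
        self._transition(bundle_id, {"SUSPENDED"}, "RUNNING")

    def kill(self, bundle_id: str) -> None:
        self._transition(bundle_id, {"PREP", "RUNNING", "SUSPENDED"}, "KILLED")
```

Queries (16 and 17) map to reading `jobs[bundle_id].status` and `coord_status`. Keying deduplication on the external id keeps a client retry after a timeout idempotent without any server-side locking in this single-process sketch.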

This is a sample bundle XML file:
=========================

    <bundle-app name="MY_BUNDLE" xmlns="uri:oozie:bundle:0.1">

      <controls>
        <kick-off-time>2009-02-02T00:00Z</kick-off-time>
      </controls>

      <coordinator>
        <configuration>
          <property>
            <name>START_TIME</name>
            <value>2009-02-01T00:00Z</value>
          </property>
          <!-- ... -->
        </configuration>
        <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
      </coordinator>

      <coordinator>
        <configuration>
          <property>
            <name>END_TIME</name>
            <value>2010-02-01T00:00Z</value>
          </property>
          <!-- ... -->
        </configuration>
        <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
      </coordinator>
    </bundle-app>
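A minimal sketch of how a submission step might read a bundle like this and enforce requirement 8, assuming a well-formed version of the sample (every `<property>` and `<coordinator>` properly closed). The helpers `parse_bundle` and `resolve` are hypothetical, only plain `${NAME}` references are handled (EL functions are out of scope), and the `NAME_NODE` value used in the usage note is made up.

```python
import re
import xml.etree.ElementTree as ET

NS = {"b": "uri:oozie:bundle:0.1"}                    # namespace of the sample bundle XML
VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")  # plain ${NAME} references only


def resolve(text, props):
    """Substitute ${NAME} from props; any unresolved variable is an error (Req. 8)."""
    def substitute(match):
        name = match.group(1)
        if name not in props:
            raise ValueError(f"unresolved variable: {name}")
        return props[name]
    return VAR.sub(substitute, text)


def parse_bundle(xml_text, props):
    """Return the whole parsed bundle or raise; nothing partial is produced (Req. 11)."""
    root = ET.fromstring(xml_text)
    coordinators = []
    for coord in root.findall("b:coordinator", NS):
        configuration = {
            prop.findtext("b:name", namespaces=NS):
                resolve(prop.findtext("b:value", default="", namespaces=NS), props)
            for prop in coord.findall("b:configuration/b:property", NS)
        }
        coordinators.append({
            "app_path": resolve(coord.findtext("b:app-path", default="", namespaces=NS), props),
            "configuration": configuration,
        })
    return {
        "name": root.get("name"),
        "kick_off_time": root.findtext("b:controls/b:kick-off-time", namespaces=NS),
        "coordinators": coordinators,
    }


SAMPLE = """<bundle-app name="MY_BUNDLE" xmlns="uri:oozie:bundle:0.1">
  <controls>
    <kick-off-time>2009-02-02T00:00Z</kick-off-time>
  </controls>
  <coordinator>
    <configuration>
      <property>
        <name>START_TIME</name>
        <value>2009-02-01T00:00Z</value>
      </property>
    </configuration>
    <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
  </coordinator>
</bundle-app>"""
```

For example, `parse_bundle(SAMPLE, {"NAME_NODE": "//nn:8020"})` yields the app path `hdfs://nn:8020/tmp/bundle-apps/coordinator1.xml`, while `parse_bundle(SAMPLE, {})` raises `ValueError` naming `NAME_NODE`. Raising before any coordinator is returned is one simple way to get the "no partial submission" behavior of requirement 11.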

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101885#comment-13101885 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

mislam77 remarked:
closing


[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101883#comment-13101883 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

mislam77 remarked:
agree.
Is there any open issue with this requirement other than naming of kick-off/ start time?


[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101882#comment-13101882 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

tucu00 remarked:
I'd do 3 different issues, one for each type of job. They should all target the same release.

There are other tweaks I have in mind for the WF XML; I'll open an issue for them.


[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101879#comment-13101879 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

tucu00 remarked:
#7: typo. I think you meant 'For any UNresolved variable'.

#9: I don't understand this one.

On the XML elements:

* controls: 

I'm OK with this element as a placeholder for future commands.

* kick-off-time: 

Given that all coordinator jobs have their own start/end time, why do we need a bundle start/kick-off-time?

And if we need it, I'd rather call it 'start'.

* Formal parameters definition with inline default values

[This is something I'm thinking for the next revamp of the COORD/WF XML schemas as well]

By enforcing formal parameters, comprehensive error checking of EL expressions throughout the application XML definition can be done at submission time.

  <bundle-app>
  <parameters>
    <!-- with no default value -->
    <property>
      <name>input</name>
    </property>
    <!-- with default value -->
    <property>
      <name>output</name> 
      <value>${input}.out</value>
    </property>
    ...
  </parameters>
  ...
  </bundle-app>
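A minimal sketch of the submission-time checks this proposal would enable, assuming parameters are declared as (name, default) pairs in document order and that a default may only reference parameters declared before it (as `${input}.out` does). The functions `resolve_parameters` and `undeclared_references` are hypothetical, and only plain `${name}` references are checked, not EL functions.

```python
import re

VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")  # plain ${name} references only


def resolve_parameters(declared, supplied):
    """declared: (name, default-or-None) pairs in document order; supplied: job properties."""
    missing = [name for name, default in declared if name not in supplied and default is None]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    values = {}
    for name, default in declared:
        raw = supplied.get(name, default)  # a supplied value overrides the inline default

        def substitute(match, owner=name):
            ref = match.group(1)
            if ref not in values:
                raise ValueError(f"parameter {owner!r} references undefined {ref!r}")
            return values[ref]

        # Only parameters declared earlier are visible, e.g. output = ${input}.out.
        values[name] = VAR.sub(substitute, raw)
    return values


def undeclared_references(app_xml, values):
    """Names used as ${name} in the application XML but never declared as parameters."""
    return sorted({m.group(1) for m in VAR.finditer(app_xml)} - set(values))
```

With `declared = [("input", None), ("output", "${input}.out")]` and `{"input": "/data/in"}` supplied, `output` resolves to `/data/in.out`; omitting `input` fails immediately instead of at action run time. Restricting references to earlier declarations also rules out cycles without a separate graph check.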

* Application paths

Should we enforce absolute paths with no scheme://host:port?

And have a <global> section where the name-node is defined?

[Again, this is something I'm thinking for the next revamp of the COORD/WF XML schemas as well]

  <workflow-app>
  <parameters>
    ...
  </parameters>
  <global>
    <name-node>hdfs://foo:8020</name-node>
    ...
  </global>
  ...
  </workflow-app>
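If app paths were restricted this way, the check could look roughly like the sketch below, assuming the name-node URI comes from the proposed `<global>` section. `qualify_app_path` is a hypothetical helper; it does not address multiple name-nodes.

```python
from urllib.parse import urlsplit


def qualify_app_path(app_path: str, name_node: str) -> str:
    """Join an absolute, scheme-less app path onto the <global> name-node URI (hypothetical rule)."""
    parts = urlsplit(app_path)
    if parts.scheme or parts.netloc:
        raise ValueError(f"app path must not contain scheme://host:port: {app_path}")
    if not app_path.startswith("/"):
        raise ValueError(f"app path must be absolute: {app_path}")
    return name_node.rstrip("/") + app_path
```

For instance, `qualify_app_path("/tmp/bundle-apps/coordinator1.xml", "hdfs://foo:8020")` gives `hdfs://foo:8020/tmp/bundle-apps/coordinator1.xml`, and a path such as `hdfs://foo:8020/tmp/x.xml` is rejected, so the cluster address lives in one place.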

* Global properties

[Again, this is something I'm thinking for the next revamp of the WF XML schema as well]

Currently each action node must define its JT/NN and common configuration values. Adding a global section would avoid repeating these values over and over, reducing the length of workflow applications and making it easier to modify them in a single place.

  <workflow-app>
  <parameters>
    ...
  </parameters>
  <global>
    <job-tracker>bar:8010</job-tracker>
    <name-node>hdfs://foo:8020</name-node>
    <configuration>
      <property>
        <name>mapred.queue.name</name>
        <value>elt-queue</value>
      </property>
      ...
    </configuration>
  </global>
  ...
  </workflow-app>
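One possible reading of the global section, sketched below under the assumption that an action's own values override the global ones and that configuration properties are merged key by key. `read_global` and `effective_action_settings` are hypothetical helpers, and the XML layout is the one from this example (no namespace).

```python
import xml.etree.ElementTree as ET


def read_global(workflow_xml):
    """Collect job-tracker, name-node and configuration from <global>, if present."""
    root = ET.fromstring(workflow_xml)
    section = root.find("global")
    if section is None:
        return {}
    settings = {}
    for tag in ("job-tracker", "name-node"):
        value = section.findtext(tag)
        if value is not None:
            settings[tag] = value
    settings["configuration"] = {
        prop.findtext("name"): prop.findtext("value")
        for prop in section.findall("configuration/property")
    }
    return settings


def effective_action_settings(global_settings, action_settings):
    """Action-level values win; configuration properties are merged key by key (assumed rule)."""
    merged = {k: v for k, v in global_settings.items() if k != "configuration"}
    merged.update({k: v for k, v in action_settings.items() if k != "configuration"})
    configuration = dict(global_settings.get("configuration", {}))
    configuration.update(action_settings.get("configuration", {}))
    merged["configuration"] = configuration
    return merged
```

An action that only sets `mapred.reduce.tasks` would then inherit `bar:8010`, `hdfs://foo:8020` and `mapred.queue.name=elt-queue`, while an action that sets its own `job-tracker` keeps it.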


[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099787#comment-13099787 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

tucu00 remarked:
It sounds 100% reasonable.

However, before focusing on this new feature, I think we should fix the coordinator engine to make it stable. Plus, there are many usability and functional issues with coordinator applications. To name the most important ones:

* the web console and CLI don't provide meaningful information
* error messages are not clear
* the status of coordinator jobs and coordinator actions is not consistent or complete
* rerun of coordinator jobs is not fully implemented (or at least not documented)
* using time ranges instead of frequency ranges would be much more natural to users
* database purging logic for coordinator data is broken
* coordinator commands ALWAYS update all bean properties even if they don't change, creating huge database logs

I strongly vote for making current functionality solid before adding new functionality.


[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101880#comment-13101880 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

mislam77 remarked:
Yes, 7 is a typo.

In 9, I meant that there is no explicit dependency between the coordinator applications referred to in the bundle.xml. In other words, the user should not expect coordinator 2 to start after coordinator 1 finishes. This could be done implicitly through data dependencies. Since some users asked the question, I mentioned it explicitly.

It is not necessarily true that the bundle start/kick-off-time should be the same as the coord start-time. OPS might want a knob to control when their application is processed.
I will discuss renaming kick-off-time to start-time at the bundle level.

I will also discuss the formal parameters and the global section within our team. However, last time I heard the argument: why do we need a global section when there is a workaround to do the same?

App-path: when multiple namenodes are supported, how will it work?


        

[jira] [Closed] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Roman Shaposhnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Shaposhnik closed OOZIE-89.
---------------------------------

    Resolution: Fixed


        

[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101884#comment-13101884 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

tucu00 remarked:
It would be useful to get a complete picture of the lifecycle of bundles, coordinators and workflows, and of how job-management operations (submit/suspend/resume/rerun/kill/fail, etc.) affect them.


        

[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101881#comment-13101881 ] 

Hadoop QA commented on OOZIE-89:
--------------------------------

mislam77 remarked:
I had an internal discussion about global parameters and formal parameters.
We agreed conceptually that it would be a nice feature at each level: bundle, coordinator and workflow.
However, there are different opinions about when to implement it. The most notable one is to add it to all 3 levels at the same time, which we could do after our first pass of the bundle implementation. Otherwise, if we implement it only at the bundle level, users will expect the same at the WF/coord level too.
