Posted to commits@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2011/09/08 06:27:09 UTC
[jira] [Commented] (OOZIE-89) GH-49: Supporting bundle in oozie
[ https://issues.apache.org/jira/browse/OOZIE-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099787#comment-13099787 ]
Hadoop QA commented on OOZIE-89:
--------------------------------
tucu00 remarked:
It sounds 100% reasonable.
However, before focusing on this new feature, I think we should fix the coordinator engine to make it stable. Plus, there are many usability and functional issues with coordinator applications. To name the most important ones:
* webconsole and CLI don't provide meaningful information
* error messages are not clear
* the status of coordinator jobs and coordinator actions is neither consistent nor complete
* rerun of coordinator jobs is not fully implemented (or at least not documented)
* using time ranges instead of frequency ranges would be much more natural to users
* database purging logic for coordinator data is broken
* coordinator commands ALWAYS update all bean properties, even those that don't change, creating huge database logs
I strongly vote for making current functionality solid before adding new functionality.
> GH-49: Supporting bundle in oozie
> ---------------------------------
>
> Key: OOZIE-89
> URL: https://issues.apache.org/jira/browse/OOZIE-89
> Project: Oozie
> Issue Type: Bug
> Reporter: Hadoop QA
>
> Oozie currently has two levels of abstraction:
> 1. Workflow, which executes a DAG of actions.
> 2. Coordinator, which executes a workflow periodically when the specified set of data directories is available.
> This issue proposes another abstraction called 'bundle' that will batch a set of coordinator applications. The user will be able to start/stop/suspend/resume/rerun at the bundle level.
> ******* The proposed high-level requirements for supporting bundles are enumerated below:
> 1. This feature will allow the user to specify a list of coordinator applications in an XML file.
> 2. The name of the bundle XML file is not hard-coded. The user can give the bundle file any name.
> 3. The user will submit a bundle by specifying the bundle application path in a config file. An example command is: oozie job -run -config <bundle.properties>
> 4. The bundle application path is defined in the config file as the property "oozie.application.bundle.path", whose value is the full HDFS path to the bundle XML.
> 5. User can also submit a bundle job through WS API.
> 7. User will be able to define variables/parameters for each coordinator application.
> 8. All variables should be resolved during job submission. For any unresolved variable, Oozie will throw an exception.
> 9. User will be able to submit a bundle with a user-defined external ID to avoid duplicate submissions in case the first submission times out.
> 10. Oozie will not support any explicit dependencies among the coordinator XMLs in a bundle definition.
> 11. Oozie will not support any partial bundle submission.
> 12. When the user submits a bundle, they will get a bundle ID to track it. Oozie will put the bundle job into PREP state.
> 13. User will be able to start a bundle using its bundle ID. This will put the bundle job into RUNNING state.
> 14. User will be able to combine submit and start into a single 'run' that starts the bundle immediately.
> 15. User will be able to optionally specify a kick-off time that determines when to start a bundle. The bundle will not run until the kick-off time is reached.
> 16. User will be able to query Oozie for a bundle's status through the CLI and WS API.
> 17. User will be able to query Oozie, through the CLI and WS API, for all coordinator jobs that a bundle started.
> 18. User will be able to kill a bundle by ID, which will kill all spawned coordinator jobs.
> 19. User will be able to suspend a bundle by ID, which will suspend all spawned coordinator jobs.
> 20. User will be able to pause a bundle by ID with a future time, which will pause all spawned coordinator jobs.
> 21. User will be able to resume a bundle by ID, which will resume all spawned coordinator jobs.
> 22. Bundle rerun requirements TBD.
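The lifecycle in the requirements above (submission into PREP, start into RUNNING, an optional kick-off time, deduplication by external ID, and kill/suspend/resume fanned out to every spawned coordinator job) can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not Oozie's implementation: the classes `BundleEngine`, `BundleJob`, and `CoordJob`, the `bundle-NNNN` ID format, and the in-memory dictionaries are all hypothetical. Pause (item 20) and rerun (item 22) are left out.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class CoordJob:
    """Hypothetical stand-in for a coordinator job spawned by a bundle."""
    app_path: str
    status: str = "PREP"


@dataclass
class BundleJob:
    """Hypothetical bundle record: its coordinator app paths plus current state."""
    bundle_id: str
    coord_paths: List[str]
    kick_off_time: Optional[datetime] = None
    status: str = "PREP"
    coords: List[CoordJob] = field(default_factory=list)


class BundleEngine:
    """In-memory illustration of the proposed bundle lifecycle (not Oozie code)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._jobs: Dict[str, BundleJob] = {}
        self._by_external_id: Dict[str, str] = {}

    def submit(self, coord_paths: List[str], external_id: Optional[str] = None,
               kick_off_time: Optional[datetime] = None) -> str:
        # Item 9: a repeated external ID returns the existing bundle ID,
        # so a client whose first submission timed out can safely resubmit.
        if external_id is not None and external_id in self._by_external_id:
            return self._by_external_id[external_id]
        bundle_id = f"bundle-{next(self._ids):04d}"  # hypothetical ID format
        self._jobs[bundle_id] = BundleJob(bundle_id, list(coord_paths), kick_off_time)
        if external_id is not None:
            self._by_external_id[external_id] = bundle_id
        return bundle_id  # item 12: the new bundle is in PREP

    def start(self, bundle_id: str, now: datetime) -> None:
        job = self._jobs[bundle_id]
        if job.status != "PREP":
            raise ValueError(f"cannot start a bundle in state {job.status}")
        # Item 15: before the kick-off time the bundle stays in PREP; a real
        # engine would re-check later instead of relying on the caller.
        if job.kick_off_time is not None and now < job.kick_off_time:
            return
        # Item 13: spawn one coordinator job per app path and mark RUNNING.
        job.coords = [CoordJob(path, "RUNNING") for path in job.coord_paths]
        job.status = "RUNNING"

    def run(self, coord_paths: List[str], now: datetime,
            external_id: Optional[str] = None,
            kick_off_time: Optional[datetime] = None) -> str:
        # Item 14: submit and start in one call.
        bundle_id = self.submit(coord_paths, external_id, kick_off_time)
        if self._jobs[bundle_id].status == "PREP":
            self.start(bundle_id, now)
        return bundle_id

    def _transition(self, bundle_id: str, new_status: str, allowed_from: Set[str]) -> None:
        # Items 18, 19, 21: a bundle-level action fans out to every coordinator job.
        job = self._jobs[bundle_id]
        if job.status not in allowed_from:
            raise ValueError(f"cannot move a bundle from {job.status} to {new_status}")
        job.status = new_status
        for coord in job.coords:
            coord.status = new_status

    def kill(self, bundle_id: str) -> None:
        self._transition(bundle_id, "KILLED", {"PREP", "RUNNING", "SUSPENDED"})

    def suspend(self, bundle_id: str) -> None:
        self._transition(bundle_id, "SUSPENDED", {"RUNNING"})

    def resume(self, bundle_id: str) -> None:
        self._transition(bundle_id, "RUNNING", {"SUSPENDED"})

    def status(self, bundle_id: str) -> str:  # item 16
        return self._jobs[bundle_id].status

    def coordinators(self, bundle_id: str) -> List[CoordJob]:  # item 17
        return list(self._jobs[bundle_id].coords)
```

Keeping the bundle status and its coordinators' statuses in one transition method is one way to guarantee they never diverge; the guard sets are an assumption (for example, that a killed bundle cannot be resumed), since the proposal does not spell out the allowed transitions.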
> This is a sample bundle XML:
> =========================
> <bundle-app name="MY_BUNDLE" xmlns="uri:oozie:bundle:0.1">
>
>   <controls>
>     <kick-off-time>2009-02-02T00:00Z</kick-off-time>
>   </controls>
>   <coordinator>
>     <configuration>
>       <property>
>         <name>START_TIME</name>
>         <value>2009-02-01T00:00Z</value>
>       </property>
>       .................
>       ...............
>     </configuration>
>     <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
>   </coordinator>
>   <coordinator>
>     <configuration>
>       <property>
>         <name>END_TIME</name>
>         <value>2010-02-01T00:00Z</value>
>       </property>
>       .................
>       ...............
>     </configuration>
>     <app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
>   </coordinator>
> </bundle-app>
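Item 8 requires every variable to be resolved at submission time, with an exception for anything left unresolved, and item 11 rules out partial submissions. The sketch below illustrates both on a bundle definition shaped like the sample: it walks the `uri:oozie:bundle:0.1` XML with Python's standard `xml.etree.ElementTree`, substitutes `${NAME}` placeholders from a submission-time dictionary, and raises before returning anything if a name is missing. It is hypothetical, not Oozie's code: the function names `substitute` and `resolve_bundle`, the returned list of `(app_path, properties)` pairs, and the simple regex (which handles plain names only, not EL function calls) are all assumptions.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {"b": "uri:oozie:bundle:0.1"}
# Assumption: only plain ${NAME} placeholders are recognized.
VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def substitute(text: str, variables: Dict[str, str]) -> str:
    """Replace every ${NAME} in text; raise on the first unresolved name."""
    def repl(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise ValueError(f"unresolved variable: {name}")
        return variables[name]
    return VAR.sub(repl, text)


def resolve_bundle(xml_text: str,
                   variables: Dict[str, str]) -> List[Tuple[str, Dict[str, str]]]:
    """Resolve all coordinators of a bundle; nothing is returned unless all succeed."""
    root = ET.fromstring(xml_text)
    resolved = []
    for coord in root.findall("b:coordinator", NS):
        props = {}
        for prop in coord.findall("b:configuration/b:property", NS):
            name = prop.findtext("b:name", namespaces=NS)
            value = prop.findtext("b:value", default="", namespaces=NS)
            props[name] = substitute(value, variables)
        app_path = coord.findtext("b:app-path", namespaces=NS)
        if app_path is None:
            raise ValueError("coordinator is missing <app-path>")
        resolved.append((substitute(app_path, variables), props))
    return resolved
```

For example, with `{"NAME_NODE": "//namenode:8020"}` (a made-up value), the app path `hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml` resolves to `hdfs://namenode:8020/tmp/bundle-apps/coordinator1.xml`. Because the function only returns after every coordinator has been resolved, a single missing variable rejects the whole bundle rather than submitting part of it.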