Posted to dev@oozie.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2012/09/12 19:16:07 UTC

[jira] [Commented] (OOZIE-983) [Design] Automatic Oozie application deployment using WebHDFS

    [ https://issues.apache.org/jira/browse/OOZIE-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454136#comment-13454136 ] 

Alejandro Abdelnur commented on OOZIE-983:
------------------------------------------

In the original design proposal of Oozie we had the concept of self-contained Oozie applications, OAR files. These OAR files were to be deployed (uploaded) via Oozie.

This mechanism was discarded (during an architecture review) in favor of simply using expanded HDFS directories, without any Oozie involvement. The benefits of keeping Oozie outside of the deployment loop are:

* Oozie is not involved in copying/transferring bits from the client to HDFS (several MBs per app)
* Oozie is not involved in versioning applications
* Oozie does not have to worry about a running application being overridden (that is the app developer's responsibility)
* Each team defines how they manage different versions of their deployed applications based on their needs

Now, if the issue is that a user must log in to a machine that has access to the cluster in order to copy his/her Oozie app to HDFS, then the problem can be easily solved as follows:

* Install an HttpFS server (compatible with the WebHDFS HTTP API) on the firewall
* Configure the HttpFS server for Kerberos or custom authentication (HttpFS uses the same authentication mechanism as Oozie; if you have a custom AuthenticationHandler for Oozie, you can use it in HttpFS)
* Use *hadoop fs -fs webhdfs://<HTTPFS_HOST>* to copy/delete/move Oozie apps in HDFS.
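For illustration, here is roughly what such a copy over the HttpFS gateway amounts to at the REST level: one MKDIRS call per directory and one CREATE call per file. This is a minimal Python sketch and is not part of Oozie or Hadoop. The function names, host name and example paths are hypothetical. It also assumes simple (pseudo) authentication via the `user.name` query parameter; a Kerberos-secured HttpFS would use SPNEGO instead. Port 14000 is HttpFS's default. The sketch only builds the request plan and does not send anything.

```python
import os
from urllib.parse import quote, urlencode

HTTPFS_DEFAULT_PORT = 14000  # HttpFS listens on 14000 unless configured otherwise


def webhdfs_url(host, port, hdfs_path, op, user, **params):
    """Build http://<host>:<port>/webhdfs/v1<hdfs_path>?op=<op>&user.name=<user>&..."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute: %r" % hdfs_path)
    query = {"op": op, "user.name": user}  # user.name = pseudo (simple) auth, assumed here
    query.update(params)
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(hdfs_path), urlencode(query))


def deployment_plan(local_dir, hdfs_app_dir, host, user, port=HTTPFS_DEFAULT_PORT):
    """Return (method, url, local_file) steps that mirror local_dir into hdfs_app_dir."""
    plan = [("PUT", webhdfs_url(host, port, hdfs_app_dir, "MKDIRS", user), None)]
    for root, dirs, files in os.walk(local_dir):
        dirs.sort()  # sort in place so os.walk visits subdirectories deterministically
        rel = os.path.relpath(root, local_dir)
        if rel == ".":
            remote_dir = hdfs_app_dir
        else:
            remote_dir = hdfs_app_dir + "/" + rel.replace(os.sep, "/")
            plan.append(("PUT", webhdfs_url(host, port, remote_dir, "MKDIRS", user), None))
        for name in sorted(files):
            url = webhdfs_url(host, port, remote_dir + "/" + name, "CREATE", user,
                              overwrite="false")  # CREATE fails instead of clobbering a file
            plan.append(("PUT", url, os.path.join(root, name)))
    return plan
```

Setting overwrite=false leaves any decision to replace an existing app with the developer, which matches keeping Oozie out of that question.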

Note that if you are planning to use the cluster's WebHDFS service (instead of a standalone HttpFS), your clients MUST be able to reach ALL nodes of the cluster over HTTP.
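This requirement comes from how file creation works in the WebHDFS REST API. The first PUT with op=CREATE goes to the NameNode. The NameNode does not accept the file data; instead it answers with a 307 redirect whose Location header points at a DataNode. The client then sends a second PUT carrying the data to that DataNode, which replies 201 Created. A standalone HttpFS gateway transfers the data to the DataNodes itself, so clients only need to reach the HttpFS host.

The minimal Python sketch below illustrates the two-step exchange. The `send` callable is a hypothetical transport (any HTTP client configured not to follow redirects would do), and the host names used to exercise it are made up.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: sends one HTTP request WITHOUT following redirects and
# returns (status_code, headers); header names are assumed to be canonically cased.
Transport = Callable[[str, str, Optional[bytes]], Tuple[int, Dict[str, str]]]


def webhdfs_create(send: Transport, create_url: str, data: bytes) -> str:
    """Two-step WebHDFS CREATE; returns the DataNode URL the data was sent to."""
    # Step 1: PUT without a body; the NameNode answers 307 with a DataNode Location.
    status, headers = send("PUT", create_url, None)
    if status != 307 or "Location" not in headers:
        raise RuntimeError("expected 307 redirect from NameNode, got %d" % status)
    datanode_url = headers["Location"]
    # Step 2: PUT the bytes to the DataNode; this is why the client must reach it.
    status, _ = send("PUT", datanode_url, data)
    if status != 201:
        raise RuntimeError("expected 201 Created from DataNode, got %d" % status)
    return datanode_url
```

Injecting the transport keeps the protocol steps separate from the HTTP client, so the redirect handling can be exercised without a cluster.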

IMO, this mechanism preserves today's flexibility, works entirely over HTTP (no RPC access required), lets you use your corporate authentication mechanism, and keeps Oozie out of the business of application deployment and versioning (which is not trivial, and different teams do it differently).


> [Design] Automatic Oozie application deployment using WebHDFS
> -------------------------------------------------------------
>
>                 Key: OOZIE-983
>                 URL: https://issues.apache.org/jira/browse/OOZIE-983
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mona Chitnis
>
> Problem:
> 1. A user can't upload the Oozie application from his dev box. The user needs access to a specialized box (such as a gateway) to run those hadoop commands. This is inconvenient, as it requires following multiple steps and restrictions.
> 2. Automatic Oozie application versioning. If a user wants to deploy a new version of an Oozie application, he needs to run multiple commands. In addition, there is no standard for this.
> Proposal:
> 1. Oozie will provide a tool that will automatically deploy the application and maintain a rigid versioning mechanism.
> 2. It could be a new script (e.g. oozie-deploy) or it could extend the existing oozie command (e.g. "oozie -deploy ..."). TBD
> 3. The new script will get the necessary information to launch a WebHDFS command from the user and upload the necessary files. It includes: WebHDFS end point, security token (for secured version), local application directory and remote application base path.
> 4. Using the appropriate WebHDFS REST API, the tool will deploy the application.  User can choose whether to override an existing application path. 
> 5. User can ask to upload a new version of application. The new version could be user provided or auto created by the script. For auto version selection, oozie tools will check the existing application path with pattern "v?". Then select the new version number.
> 6. For uploading a new application version, the oozie tool will first upload the application and then kill the old job (How to get the old job id?). At last, submit the new application. 
> Open question:
> 1. How to pass the Kerberos token, especially from a dev box?
> 2. Who will determine the new version? user or automatic?
> Other key points:
> 1. Only supported for Hadoop 1.0.2+ 
> 2. Need to use/develop some wrapper tools which can hide most of the WebHDFS details. There are already two such tools: a) for Python: https://github.com/drelu/webhdfs-py b) for Ruby: https://github.com/zenja/webhdfs-ruby. At this point the options are:
>   * Write a new Java wrapper class.
>   * Write a new wrapper tool using pure shell commands.
>   * Reuse python or Ruby libraries.
> Overall, we need to do it correctly from the beginning. The comments from others are highly appreciated.
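Item 5 of the proposal leaves the versioning scheme open. The minimal Python sketch below shows one possible reading of the "v?" pattern. It assumes that versions live in sibling directories named v1, v2, ... directly under the remote application base path. The next number is picked from a listing of that path, which could come, for example, from a WebHDFS LISTSTATUS call (fetching the listing is not shown). The directory layout and the `next_version` name are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical layout: <remote app base path>/v1, v2, ... ("v" followed by a number)
VERSION_DIR = re.compile(r"^v(\d+)$")


def next_version(existing_names: Iterable[str]) -> str:
    """Return the next version directory name, ignoring entries that don't match."""
    numbers = []
    for name in existing_names:
        match = VERSION_DIR.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # Start at v1 when no version directory exists yet.
    return "v%d" % (max(numbers, default=0) + 1)
```

Versions are compared numerically, so v10 correctly follows v9, which a plain string sort would get wrong.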

--
This message is automatically generated by JIRA.