Posted to dev@whirr.apache.org by "Adrian Cole (JIRA)" <ji...@apache.org> on 2010/10/28 06:07:21 UTC

[jira] Commented: (WHIRR-119) Job Submission and dynamic provisioning framework for Hadoop Clouds

    [ https://issues.apache.org/jira/browse/WHIRR-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925681#action_12925681 ] 

Adrian Cole commented on WHIRR-119:
-----------------------------------

Looks very compelling, and would be really useful across clouds.

That said, I think we shouldn't skip first base on the way to the home run.  I would like to see this as a library before it becomes a service.  In other words, we can start by implementing this as POJOs for in-VM execution.  State of jobs, billing, etc. could be held in an arbitrary blobstore, dissolving the need to stand up REST services or even leave the JVM in the simple case.

A job management platform could also be built out of these core pieces (or similar ones) and be deployed as a virtual appliance, as you mention.
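To make the library-first idea concrete, here is a minimal sketch of what the in-VM POJO layer could look like. All names here (JobStore, InMemoryJobStore, JobContext, JobRunner) are illustrative assumptions, not existing Whirr APIs; a blobstore-backed JobStore implementation could later replace the in-memory one without the caller leaving the JVM.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

enum JobState { SUBMITTED, RUNNING, SUCCEEDED, FAILED }

// Minimal "job context": the id and state the comment suggests persisting.
final class JobContext {
    final String id;
    volatile JobState state;
    JobContext(String id, JobState state) { this.id = id; this.state = state; }
}

// Abstraction over where job state lives; a blobstore-backed implementation
// could be swapped in behind this interface.
interface JobStore {
    void put(JobContext ctx);
    JobContext get(String id);
}

final class InMemoryJobStore implements JobStore {
    private final Map<String, JobContext> jobs = new ConcurrentHashMap<>();
    public void put(JobContext ctx) { jobs.put(ctx.id, ctx); }
    public JobContext get(String id) { return jobs.get(id); }
}

final class JobRunner {
    private final JobStore store;
    JobRunner(JobStore store) { this.store = store; }

    // Submit a job body, record its context, run it in-VM, and track state.
    String submit(Runnable job) {
        String id = UUID.randomUUID().toString();
        JobContext ctx = new JobContext(id, JobState.SUBMITTED);
        store.put(ctx);
        try {
            ctx.state = JobState.RUNNING;
            job.run();
            ctx.state = JobState.SUCCEEDED;
        } catch (RuntimeException e) {
            ctx.state = JobState.FAILED;
        }
        store.put(ctx);
        return id;
    }
}
```

A REST facade could later wrap JobRunner without changing these core pieces, which is the point of starting with a library.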

> Job Submission and dynamic provisioning framework for Hadoop Clouds
> -------------------------------------------------------------------
>
>                 Key: WHIRR-119
>                 URL: https://issues.apache.org/jira/browse/WHIRR-119
>             Project: Whirr
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.2.0
>            Reporter: Krishna Sankar
>
> A thin framework that can submit a MR job, run it and report results. Some thoughts:
> # Most probably it will be a server-side daemon 
> # JSON over HTTP with REST semantics
> # Functions - top level preliminary
> ## Accept a job and its components at a well-known URL
> ## Parse & create MR workflow
> ## Create & store a job context - ID, security artifacts et al
> ## Return a status URL (which can be used to query status or kill the job); this is the REST model
> ## Run the job (might include dynamic elastic cloud provisioning, for example on OpenStack)
> ## As the job runs, collect status and store it in the job context
> ## If the client queries, return the current status
> ## Once the job is done, store the status and return the results (most probably pointers to files and so forth)
> ## Calculate & store performance metrics
> ## Calculate & store charge-back in generic units (e.g. CPU, memory, network, storage)
> ## As and when the client asks, return job results
> # Some thoughts on implementation
> ## Store context et al in HBase
> ## A Clojure implementation ?
> ## Packaging like OVF ? (with embedded pointers to VM, data and so forth)
> ## For 1st release assume a homogeneous Hadoop infrastructure in a cloud
> ## Custom reporter/context counters?
> ## Distributed cache for framework artifacts and run time monitoring ?
> ## Will most probably have to use TaskRunner?
> ## Extend classes with submission framework setup and teardown code ?
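The status-URL contract described in the quoted issue could be sketched as below. The JobService class, its route scheme ("/jobs/{id}"), and the status strings are all illustrative assumptions for the REST model (submit returns a pollable URL; the same resource can be queried or killed), not a proposed Whirr API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class JobService {
    // Job id -> last recorded status; in the issue's design this would live
    // in the persistent job context (e.g. HBase), not in memory.
    private final Map<String, String> states = new ConcurrentHashMap<>();

    // POST /jobs -> accept the job spec and return the status URL.
    String submit(String jobSpecJson) {
        String id = UUID.randomUUID().toString();
        states.put(id, "RUNNING");
        return "/jobs/" + id;  // client polls this URL (the REST model)
    }

    // GET /jobs/{id} -> current status of the job context.
    String status(String id) {
        return states.getOrDefault(id, "UNKNOWN");
    }

    // DELETE /jobs/{id} -> kill the job and record the terminal state.
    void kill(String id) {
        states.computeIfPresent(id, (k, v) -> "KILLED");
    }
}
```

Mapping submit/status/kill onto POST/GET/DELETE of a single job resource keeps the "query status or kill the job" behaviour behind one URL, as the issue suggests.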

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.