Posted to dev@whirr.apache.org by "Krishna Sankar (JIRA)" <ji...@apache.org> on 2010/10/28 04:35:20 UTC

[jira] Created: (WHIRR-119) Job Submission and dynamic provisioning framework for Hadoop Clouds

Job Submission and dynamic provisioning framework for Hadoop Clouds
-------------------------------------------------------------------

                 Key: WHIRR-119
                 URL: https://issues.apache.org/jira/browse/WHIRR-119
             Project: Whirr
          Issue Type: New Feature
          Components: core
    Affects Versions: 0.2.0
            Reporter: Krishna Sankar


A thin framework that can submit an MR job, run it, and report results. Some thoughts:
# Most probably it will be a server-side daemon
# JSON over HTTP with REST semantics
# Functions - preliminary, top-level
## Accept a job and its components at a well-known URL
## Parse the job & create an MR workflow
## Create & store a job context - ID, security artifacts et al
## Return a status URL (can be used to query status or kill the job); this is the REST model
## Run the job (might include dynamic, elastic cloud provisioning, e.g. via OpenStack)
## As the job runs, collect status and store it in the job context
## If the client queries, return the current status
## Once the job is done, store the status and return the results (most probably pointers to files and so forth)
## Calculate & store performance metrics
## Calculate & store charge-back in generic units (e.g. CPU, memory, network, storage)
## As and when the client asks, return the job results
# Some thoughts on implementation
## Store the context et al in HBase
## A Clojure implementation?
## Packaging like OVF? (with embedded pointers to VMs, data, and so forth)
## For the 1st release, assume a homogeneous Hadoop infrastructure in a cloud
## Custom reporter/context counters?
## A distributed cache for framework artifacts and runtime monitoring?
## Most probably we will have to use TaskRunner?
## Extend classes with submission-framework setup and teardown code?
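To make the lifecycle above concrete, here is a minimal in-memory sketch of the job-context flow (submit, poll the status URL, kill, complete). All class and method names are hypothetical illustrations of the proposed REST semantics, not existing Whirr code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the proposed job-context lifecycle:
// submit -> status URL -> RUNNING -> DONE (results as file pointers),
// with the same URL usable to kill the job, per the REST model above.
class JobSubmissionSketch {
    enum Status { SUBMITTED, RUNNING, DONE, KILLED }

    static class JobContext {
        final String id = UUID.randomUUID().toString();
        Status status = Status.SUBMITTED;
        String resultPointer;   // e.g. a pointer to output files on completion
        long chargeBackUnits;   // generic charge-back units (CPU, memory, ...)
    }

    private final Map<String, JobContext> contexts = new HashMap<>();

    // POST /jobs -> accept the job, create the context, return the status URL
    String submit() {
        JobContext ctx = new JobContext();
        contexts.put(ctx.id, ctx);
        ctx.status = Status.RUNNING;
        return "/jobs/" + ctx.id;
    }

    // GET /jobs/{id} -> current status from the stored context
    Status status(String id) { return contexts.get(id).status; }

    // DELETE /jobs/{id} -> kill the job via the same URL
    void kill(String id) { contexts.get(id).status = Status.KILLED; }

    // Called by the framework when the MR job finishes
    void complete(String id, String resultPointer) {
        JobContext ctx = contexts.get(id);
        ctx.status = Status.DONE;
        ctx.resultPointer = resultPointer;
    }
}
```

In a real daemon the map would be replaced by persistent storage (e.g. the HBase-backed context suggested below) and the methods exposed as JSON-over-HTTP endpoints.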

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (WHIRR-119) Job Submission and dynamic provisioning framework for Hadoop Clouds

Posted by "Adrian Cole (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/WHIRR-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925681#action_12925681 ] 

Adrian Cole commented on WHIRR-119:
-----------------------------------

Looks very compelling, and would be really useful across clouds.

That said, I think we shouldn't skip first base on the way to the home run. I would like to see this as a library before it becomes a service. In other words, we can start implementing this as POJOs for in-VM execution. The state of jobs, billing, etc. could be held in an arbitrary blobstore, dissolving the need to stand up REST services, or even to leave the JVM in the simple case.

A job management platform could also be built out of these core pieces (or similar ones) and be deployed as a virtual appliance as you mention.
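The library-first shape suggested here might look something like the following sketch: job state behind a narrow store interface, so an in-memory map serves in-VM execution and a blobstore-backed implementation can be swapped in later. The interface and class names are hypothetical:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical narrow interface for job/billing state. A blobstore-backed
// implementation could replace the in-memory one without touching callers.
interface JobStateStore {
    void put(String jobId, String stateJson);
    Optional<String> get(String jobId);
}

// Simplest possible implementation: no REST service, never leaves the JVM.
class InMemoryJobStateStore implements JobStateStore {
    private final Map<String, String> blobs = new ConcurrentHashMap<>();

    public void put(String jobId, String stateJson) {
        blobs.put(jobId, stateJson);
    }

    public Optional<String> get(String jobId) {
        return Optional.ofNullable(blobs.get(jobId));
    }
}
```

A REST daemon or virtual appliance would then be a thin layer over the same interface rather than a separate implementation.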



Re: [jira] Commented: (WHIRR-119) Job Submission and dynamic provisioning framework for Hadoop Clouds

Posted by Krishna Sankar <ks...@gmail.com>.
Agreed. A first cut could integrate Oozie and add the missing functionality. If that works well, great ... I am sure we can also contribute the required functionality back to Oozie ....
If there are no volunteers, let me start digging through Oozie and propose a framework spanning the orchestration, automation, provisioning, and configuration layers ...

Cheers
<k/>


On Fri, Oct 29, 2010, "Tom White (JIRA)" <ji...@apache.org> wrote:

> 
> Tom White commented on WHIRR-119:
> ---------------------------------
> 
> I think Oozie (http://yahoo.github.com/oozie/) provides a lot of what you
> describe. It would be great to have Whirr able to run an Oozie service that
> integrates with the Hadoop service.



[jira] Commented: (WHIRR-119) Job Submission and dynamic provisioning framework for Hadoop Clouds

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/WHIRR-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926400#action_12926400 ] 

Tom White commented on WHIRR-119:
---------------------------------

I think Oozie (http://yahoo.github.com/oozie/) provides a lot of what you describe. It would be great to have Whirr able to run an Oozie service that integrates with the Hadoop service.
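For concreteness, Oozie expresses the kind of MR workflow described in this issue as a declarative XML document. A minimal sketch of such a workflow, assuming the standard Oozie map-reduce action with parameterized cluster and path properties (all values here are placeholders):

```xml
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="mr-job"/>
  <action name="mr-job">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MR job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Oozie then handles submission, status tracking, and failure handling for the workflow, which covers much of the "parse & create MR workflow" and status-reporting functionality proposed above.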
