Posted to user@whirr.apache.org by David Alves <da...@gmail.com> on 2011/09/22 23:03:46 UTC

launching a one job cluster

Hi All
	
	I need to launch a cluster, run a job, and terminate the cluster as soon as the job is finished.
	Is there any "nice" way to do this, or do you have any suggestions?
	Off the top of my head I can imagine some quick and dirty solutions (like creating a file when the task completes and polling for its existence from the whirr handler), but I'd like to do it without polling if possible. Any ideas?

thanks
-david
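
For reference, the quick-and-dirty approach described above could look roughly like the sketch below. It is only an illustration: wait_for_marker, the marker path, and the interval are hypothetical names, and it checks a local path. Against a real cluster the existence check would have to run over SSH or something similar.

```python
import os
import time


def wait_for_marker(path, interval=1.0, timeout=None):
    """Poll until `path` exists. Returns True if seen, False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        # Sleep between checks; the job may finish up to `interval`
        # seconds before the poller notices.
        time.sleep(interval)
```

With this shape, the worst-case delay between the job finishing and the controller noticing is roughly one polling interval.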
	

Re: launching a one job cluster

Posted by Alex Heneveld <Al...@CloudsoftCorp.com>.
Hi David,

What type of job is it?  Can the shell scripts call you back when
they're done or is there some out-of-band event that would have to be 
subscribed to?

I'm also interested in the wrapping mechanism.  You could write bash
scripts to wrap whirr start/stop and watch for the trigger in between,
but scraping the IP addresses and configuring credentials will get
tedious.  +1 to something more elegant ... would a JVM language script
be interesting?  I'm thinking of a simple management layer with embedded
Whirr to start/stop the cluster, but also able to monitor your processes
in between, finding out about them programmatically via Whirr.

--A
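
To make the "call you back" idea concrete, here is a minimal sketch of a controller-side listener that blocks until the remote script connects and reports in. Everything here is an assumption for illustration: open_listener and wait_for_callback are hypothetical helpers, the message format is made up, and it presumes the cluster nodes can reach the controller over TCP.

```python
import socket


def open_listener(host="0.0.0.0", port=0):
    """Bind a listening socket; port=0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def wait_for_callback(server, timeout=None):
    """Block until one client connects; return what it sent as text."""
    server.settimeout(timeout)
    conn, _addr = server.accept()  # blocks in the kernel, no polling loop
    with conn:
        conn.settimeout(timeout)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # client closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace").strip()
```

The remote script would finish by opening a connection to the controller's address and port (for example with a tool such as nc) and sending a short status line. A timeout is still worth keeping, since a script that crashes before calling back would otherwise leave the controller waiting forever.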


On 22/09/2011 15:29, Andrei Savu wrote:
> Sorry for the confusion :) When I see job I think about Hadoop.
>
> For arbitrary scripts I think jclouds provides some ways of doing this
> as you already know. To make this process of checking if a script is
> running low latency I think you need some sort of server side daemon
> but I can't recommend one.
>
> -- Andrei
>
> On Thu, Sep 22, 2011 at 3:24 PM, David Alves<da...@gmail.com>  wrote:
>> As I said the thing is I'm NOT using hadoop :)
>> I'm just running generic scripts/ssh commands.
>>
>> -david
>>
>> On Sep 22, 2011, at 5:20 PM, Andrei Savu wrote:
>>
>>> I don't know that much about how to manage jobs in Hadoop using the
>>> API. Maybe Tom can provide a good answer to this. I completely
>>> understand the elegance part :)
>>>
>>> -- Andrei Savu
>>>
>>> On Thu, Sep 22, 2011 at 3:17 PM, David Alves<da...@gmail.com>  wrote:
>>>> First there is the question of accuracy, as I said I am collecting metrics that I'd like to be as accurate as possible.
>>>> Second there is the matter of elegance. I always like to avoid polls whenever possible.
>>>>
>>>> That being said, I don't wan't to embark in some odyssey just to avoid poll, so if it really is too much trouble I am ok with letting it go.
>>>> Anyhow even with poll is there something already implemented that enables it in generic cases?
>>>>
>>>> thanks
>>>> -david
>>>>
>>>> On Sep 22, 2011, at 5:09 PM, Andrei Savu wrote:
>>>>
>>>>> Why is so important to avoid having a poll? The cost is low and almost
>>>>> any job is running at least for a few minutes.
>>>>>
>>>>> -- Andrei
>>>>>
>>>>> On Thu, Sep 22, 2011 at 3:07 PM, David Alves<da...@gmail.com>  wrote:
>>>>>> Hi Andrei
>>>>>>
>>>>>>         I know…
>>>>>>         The thing is that code used the Hadoop JobClient class's runJob() method that actually polls for progress.
>>>>>>         I am not using hadoop (in hindsight using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or java programs).
>>>>>>         In particular as I'm collecting accurate metrics I'd like a non poll based technique.
>>>>>>         Even if there is none I can always try and code it, so all ideas are welcome.
>>>>>>
>>>>>> thanks
>>>>>> david
>>>>>>
>>>>>>
>>>>>> On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
>>>>>>
>>>>>>> This is exactly what the example code is doing (and the hadoop
>>>>>>> integration test). The job running code is blocking while the job is
>>>>>>> executing.
>>>>>>>
>>>>>>> -- Andrei Savu / andreisavu.ro
>>>>>>>
>>>>>>> On Thu, Sep 22, 2011 at 2:03 PM, David Alves<da...@gmail.com>  wrote:
>>>>>>>> Hi All
>>>>>>>>
>>>>>>>>         I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>>>>>>>         Is there any "nice" way to do this, or do you have any suggestions?
>>>>>>>>         On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> -david
>>>>>>>>
>>>>>>
>>>>
>>
> .
>


Re: launching a one job cluster

Posted by Andrei Savu <sa...@gmail.com>.
Sorry for the confusion :) When I see "job" I think of Hadoop.

For arbitrary scripts I think jclouds provides some ways of doing this,
as you already know. To check with low latency whether a script is still
running, I think you need some sort of server-side daemon, but I can't
recommend one.

-- Andrei

On Thu, Sep 22, 2011 at 3:24 PM, David Alves <da...@gmail.com> wrote:
> As I said the thing is I'm NOT using hadoop :)
> I'm just running generic scripts/ssh commands.
>
> -david
>
> On Sep 22, 2011, at 5:20 PM, Andrei Savu wrote:
>
>> I don't know that much about how to manage jobs in Hadoop using the
>> API. Maybe Tom can provide a good answer to this. I completely
>> understand the elegance part :)
>>
>> -- Andrei Savu
>>
>> On Thu, Sep 22, 2011 at 3:17 PM, David Alves <da...@gmail.com> wrote:
>>> First there is the question of accuracy, as I said I am collecting metrics that I'd like to be as accurate as possible.
>>> Second there is the matter of elegance. I always like to avoid polls whenever possible.
>>>
>>> That being said, I don't wan't to embark in some odyssey just to avoid poll, so if it really is too much trouble I am ok with letting it go.
>>> Anyhow even with poll is there something already implemented that enables it in generic cases?
>>>
>>> thanks
>>> -david
>>>
>>> On Sep 22, 2011, at 5:09 PM, Andrei Savu wrote:
>>>
>>>> Why is so important to avoid having a poll? The cost is low and almost
>>>> any job is running at least for a few minutes.
>>>>
>>>> -- Andrei
>>>>
>>>> On Thu, Sep 22, 2011 at 3:07 PM, David Alves <da...@gmail.com> wrote:
>>>>> Hi Andrei
>>>>>
>>>>>        I know…
>>>>>        The thing is that code used the Hadoop JobClient class's runJob() method that actually polls for progress.
>>>>>        I am not using hadoop (in hindsight using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or java programs).
>>>>>        In particular as I'm collecting accurate metrics I'd like a non poll based technique.
>>>>>        Even if there is none I can always try and code it, so all ideas are welcome.
>>>>>
>>>>> thanks
>>>>> david
>>>>>
>>>>>
>>>>> On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
>>>>>
>>>>>> This is exactly what the example code is doing (and the hadoop
>>>>>> integration test). The job running code is blocking while the job is
>>>>>> executing.
>>>>>>
>>>>>> -- Andrei Savu / andreisavu.ro
>>>>>>
>>>>>> On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
>>>>>>> Hi All
>>>>>>>
>>>>>>>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>>>>>>        Is there any "nice" way to do this, or do you have any suggestions?
>>>>>>>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>>>>>>>
>>>>>>> thanks
>>>>>>> -david
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>

Re: launching a one job cluster

Posted by David Alves <da...@gmail.com>.
As I said, the thing is I'm NOT using Hadoop :)
I'm just running generic scripts/ssh commands.

-david

On Sep 22, 2011, at 5:20 PM, Andrei Savu wrote:

> I don't know that much about how to manage jobs in Hadoop using the
> API. Maybe Tom can provide a good answer to this. I completely
> understand the elegance part :)
> 
> -- Andrei Savu
> 
> On Thu, Sep 22, 2011 at 3:17 PM, David Alves <da...@gmail.com> wrote:
>> First there is the question of accuracy, as I said I am collecting metrics that I'd like to be as accurate as possible.
>> Second there is the matter of elegance. I always like to avoid polls whenever possible.
>> 
>> That being said, I don't wan't to embark in some odyssey just to avoid poll, so if it really is too much trouble I am ok with letting it go.
>> Anyhow even with poll is there something already implemented that enables it in generic cases?
>> 
>> thanks
>> -david
>> 
>> On Sep 22, 2011, at 5:09 PM, Andrei Savu wrote:
>> 
>>> Why is so important to avoid having a poll? The cost is low and almost
>>> any job is running at least for a few minutes.
>>> 
>>> -- Andrei
>>> 
>>> On Thu, Sep 22, 2011 at 3:07 PM, David Alves <da...@gmail.com> wrote:
>>>> Hi Andrei
>>>> 
>>>>        I know…
>>>>        The thing is that code used the Hadoop JobClient class's runJob() method that actually polls for progress.
>>>>        I am not using hadoop (in hindsight using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or java programs).
>>>>        In particular as I'm collecting accurate metrics I'd like a non poll based technique.
>>>>        Even if there is none I can always try and code it, so all ideas are welcome.
>>>> 
>>>> thanks
>>>> david
>>>> 
>>>> 
>>>> On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
>>>> 
>>>>> This is exactly what the example code is doing (and the hadoop
>>>>> integration test). The job running code is blocking while the job is
>>>>> executing.
>>>>> 
>>>>> -- Andrei Savu / andreisavu.ro
>>>>> 
>>>>> On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
>>>>>> Hi All
>>>>>> 
>>>>>>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>>>>>        Is there any "nice" way to do this, or do you have any suggestions?
>>>>>>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>>>>>> 
>>>>>> thanks
>>>>>> -david
>>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: launching a one job cluster

Posted by Andrei Savu <sa...@gmail.com>.
I don't know that much about how to manage jobs in Hadoop using the
API. Maybe Tom can provide a good answer to this. I completely
understand the elegance part :)

-- Andrei Savu

On Thu, Sep 22, 2011 at 3:17 PM, David Alves <da...@gmail.com> wrote:
> First there is the question of accuracy, as I said I am collecting metrics that I'd like to be as accurate as possible.
> Second there is the matter of elegance. I always like to avoid polls whenever possible.
>
> That being said, I don't wan't to embark in some odyssey just to avoid poll, so if it really is too much trouble I am ok with letting it go.
> Anyhow even with poll is there something already implemented that enables it in generic cases?
>
> thanks
> -david
>
> On Sep 22, 2011, at 5:09 PM, Andrei Savu wrote:
>
>> Why is so important to avoid having a poll? The cost is low and almost
>> any job is running at least for a few minutes.
>>
>> -- Andrei
>>
>> On Thu, Sep 22, 2011 at 3:07 PM, David Alves <da...@gmail.com> wrote:
>>> Hi Andrei
>>>
>>>        I know…
>>>        The thing is that code used the Hadoop JobClient class's runJob() method that actually polls for progress.
>>>        I am not using hadoop (in hindsight using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or java programs).
>>>        In particular as I'm collecting accurate metrics I'd like a non poll based technique.
>>>        Even if there is none I can always try and code it, so all ideas are welcome.
>>>
>>> thanks
>>> david
>>>
>>>
>>> On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
>>>
>>>> This is exactly what the example code is doing (and the hadoop
>>>> integration test). The job running code is blocking while the job is
>>>> executing.
>>>>
>>>> -- Andrei Savu / andreisavu.ro
>>>>
>>>> On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
>>>>> Hi All
>>>>>
>>>>>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>>>>        Is there any "nice" way to do this, or do you have any suggestions?
>>>>>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>>>>>
>>>>> thanks
>>>>> -david
>>>>>
>>>
>>>
>
>

Re: launching a one job cluster

Posted by David Alves <da...@gmail.com>.
First, there is the question of accuracy: as I said, I am collecting metrics that I'd like to be as accurate as possible.
Second, there is the matter of elegance. I always like to avoid polling whenever possible.

That being said, I don't want to embark on some odyssey just to avoid polling, so if it really is too much trouble I am OK with letting it go.
Anyhow, even with polling, is there something already implemented that enables it in generic cases?

thanks
-david
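
On the accuracy point, one possible way to keep the metrics independent of the polling interval is to have a wrapper on the node record its own timestamps, so that polling only affects how quickly the cluster is torn down. The sketch below assumes the metrics of interest are the job's start/end times and exit code; run_and_record and the marker file layout are hypothetical, and the times are taken from the node's own clock.

```python
import json
import os
import subprocess
import time


def run_and_record(cmd, marker_path):
    """Run `cmd`, then write start/end times and exit code to marker_path."""
    start = time.time()
    result = subprocess.run(cmd)  # blocks until the job exits
    end = time.time()
    record = {"start": start, "end": end, "exit_code": result.returncode}
    # Write under a temporary name and rename, so a poller that sees
    # marker_path never reads a half-written file.
    tmp_path = marker_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(record, f)
    os.replace(tmp_path, marker_path)
    return record
```

Whoever notices the marker file, however late, reads the timestamps that were recorded at the moment the job actually finished.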

On Sep 22, 2011, at 5:09 PM, Andrei Savu wrote:

> Why is so important to avoid having a poll? The cost is low and almost
> any job is running at least for a few minutes.
> 
> -- Andrei
> 
> On Thu, Sep 22, 2011 at 3:07 PM, David Alves <da...@gmail.com> wrote:
>> Hi Andrei
>> 
>>        I know…
>>        The thing is that code used the Hadoop JobClient class's runJob() method that actually polls for progress.
>>        I am not using hadoop (in hindsight using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or java programs).
>>        In particular as I'm collecting accurate metrics I'd like a non poll based technique.
>>        Even if there is none I can always try and code it, so all ideas are welcome.
>> 
>> thanks
>> david
>> 
>> 
>> On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
>> 
>>> This is exactly what the example code is doing (and the hadoop
>>> integration test). The job running code is blocking while the job is
>>> executing.
>>> 
>>> -- Andrei Savu / andreisavu.ro
>>> 
>>> On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
>>>> Hi All
>>>> 
>>>>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>>>        Is there any "nice" way to do this, or do you have any suggestions?
>>>>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>>>> 
>>>> thanks
>>>> -david
>>>> 
>> 
>> 


Re: launching a one job cluster

Posted by Andrei Savu <sa...@gmail.com>.
Why is it so important to avoid polling? The cost is low, and almost
any job runs for at least a few minutes.

-- Andrei

On Thu, Sep 22, 2011 at 3:07 PM, David Alves <da...@gmail.com> wrote:
> Hi Andrei
>
>        I know…
>        The thing is that code used the Hadoop JobClient class's runJob() method that actually polls for progress.
>        I am not using hadoop (in hindsight using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or java programs).
>        In particular as I'm collecting accurate metrics I'd like a non poll based technique.
>        Even if there is none I can always try and code it, so all ideas are welcome.
>
> thanks
> david
>
>
> On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
>
>> This is exactly what the example code is doing (and the hadoop
>> integration test). The job running code is blocking while the job is
>> executing.
>>
>> -- Andrei Savu / andreisavu.ro
>>
>> On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
>>> Hi All
>>>
>>>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>>        Is there any "nice" way to do this, or do you have any suggestions?
>>>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>>>
>>> thanks
>>> -david
>>>
>
>

Re: launching a one job cluster

Posted by David Alves <da...@gmail.com>.
Hi Andrei

	I know… 
	The thing is that that code uses the Hadoop JobClient class's runJob() method, which actually polls for progress.
	I am not using Hadoop (in hindsight, using the word "job" might have been a mistake), and I was wondering if there is already a way to do that for generic cases (e.g., scripts or Java programs).
	In particular, as I'm collecting accurate metrics, I'd like a non-poll-based technique.
	Even if there is none I can always try to code it, so all ideas are welcome.

thanks
david
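
For generic scripts, one non-polling option is simply to keep the launching process attached to the command and block until it exits. A minimal sketch, assuming the job can be started as a single command (for instance through the ssh client, which exits with the remote command's exit status); run_blocking is a hypothetical helper, not part of Whirr or jclouds.

```python
import subprocess
import time


def run_blocking(cmd):
    """Run `cmd` and block until it exits; return (exit_code, seconds)."""
    started = time.monotonic()
    completed = subprocess.run(cmd)  # returns only after the child exits
    return completed.returncode, time.monotonic() - started
```

A call such as run_blocking(["ssh", "user@node", "./job.sh"]) (host and script name made up here) would return as soon as the remote script ends. The trade-off is that the SSH connection has to stay up for the whole run; if it drops, the completion signal is lost.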

	
On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:

> This is exactly what the example code is doing (and the hadoop
> integration test). The job running code is blocking while the job is
> executing.
> 
> -- Andrei Savu / andreisavu.ro
> 
> On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
>> Hi All
>> 
>>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>>        Is there any "nice" way to do this, or do you have any suggestions?
>>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>> 
>> thanks
>> -david
>> 


Re: launching a one job cluster

Posted by Andrei Savu <sa...@gmail.com>.
This is exactly what the example code does (and the Hadoop
integration test). The job-running code blocks while the job is
executing.

-- Andrei Savu / andreisavu.ro
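
The overall launch / run / terminate pattern can be sketched independently of any particular API. In the sketch below, launch, run_job, and destroy are hypothetical callables standing in for whatever Whirr or jclouds calls are actually used; it is not Whirr's API, only the control flow.

```python
def run_one_job_cluster(launch, run_job, destroy):
    """Launch a cluster, run one blocking job, and always tear down."""
    cluster = launch()
    try:
        # run_job is expected to block until the job has finished.
        return run_job(cluster)
    finally:
        # Runs whether the job succeeded or raised, so the cluster
        # is not left running after a failure.
        destroy(cluster)
```

Putting the teardown in a finally block means the cluster is destroyed as soon as the blocking job call returns or fails.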

On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
> Hi All
>
>        I need to launch a cluster run a job and terminate the cluster as the job is finished (as soon as possible).
>        Is there any "nice" way to do this, or do you have any suggestions?
>        On the top of my head I can imagine some quick and dirty solutions (like creating a file whenever the task is completed and polling for its existence from the whirr handler) but I'd like to do it without polling if possible. Any ideas?
>
> thanks
> -david
>