Posted to user@whirr.apache.org by David Alves <da...@gmail.com> on 2011/09/22 23:03:46 UTC
launching a one job cluster
Hi All
I need to launch a cluster, run a job, and terminate the cluster as soon as the job has finished.
Is there any "nice" way to do this, or do you have any suggestions?
Off the top of my head I can imagine some quick and dirty solutions (like creating a file when the task completes and polling for its existence from the whirr handler), but I'd like to do it without polling if possible. Any ideas?
thanks
-david
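For reference, the quick and dirty marker-file idea from this message could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the helper name, the marker path and the interval are hypothetical, and the existence check is local, whereas against a real cluster it would have to be a remote check (for example a command run over ssh).

```python
import os
import time


def wait_for_marker(path, interval=1.0, timeout=60.0):
    """Poll until `path` exists; return seconds waited, or None on timeout.

    Hypothetical helper: the check is a local os.path.exists call. For a
    remote node it would be replaced by a remote existence check.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if os.path.exists(path):
            return time.monotonic() - start
        time.sleep(interval)  # completion can go unnoticed for up to `interval`
    return None
```

With this approach the job can finish up to one polling interval before it is noticed, so a shorter interval gives tighter timing at the cost of more checks.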
Re: launching a one job cluster
Posted by Alex Heneveld <Al...@CloudsoftCorp.com>.
Hi David,
David, what type of job is it? Can the shell scripts call you back when
they're done or is there some out-of-band event that would have to be
subscribed to?
I'm also interested in the wrapping mechanism. You could write bash
scripts to wrap whirr start/stop and watch for the trigger in between,
but scraping the IP addresses and configuring credentials will get
tedious. +1 to something more elegant ... would a JVM language script
be interesting? Am thinking a simple management layer with embedded
Whirr to start/stop but also able to monitor your processes in between,
finding out about them programmatically via Whirr.
--A
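A minimal sketch of the wrap-and-call-back idea, written in Python rather than bash and only under stated assumptions: the job on the cluster is assumed to open a TCP connection back to the controlling machine and send one line when it finishes, which requires that machine to be reachable from the cluster. The function names are hypothetical, and the launch and destroy commands are passed in by the caller.

```python
import socket
import subprocess


def wait_for_callback(server_sock):
    """Block until one client connects; return the first line it sends."""
    conn, _addr = server_sock.accept()  # blocks in the OS, no polling loop
    with conn, conn.makefile("r") as lines:
        return lines.readline().strip()


def run_one_job_cluster(launch_cmd, destroy_cmd, server_sock):
    """Launch the cluster, wait for the job's callback, then tear down."""
    subprocess.run(launch_cmd, check=True)
    try:
        return wait_for_callback(server_sock)
    finally:
        # Always destroy the cluster, even if waiting fails.
        subprocess.run(destroy_cmd, check=True)
```

With the Whirr command-line tool the two commands might be ["whirr", "launch-cluster", "--config", "cluster.properties"] and ["whirr", "destroy-cluster", "--config", "cluster.properties"], where cluster.properties is a placeholder file name. Starting the job itself on the nodes is left out of the sketch.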
On 22/09/2011 15:29, Andrei Savu wrote:
> Sorry for the confusion :) When I see job I think about Hadoop.
>
> For arbitrary scripts I think jclouds provides some ways of doing this,
> as you already know. To check with low latency whether a script is
> still running, I think you need some sort of server-side daemon,
> but I can't recommend one.
>
> -- Andrei
Re: launching a one job cluster
Posted by Andrei Savu <sa...@gmail.com>.
Sorry for the confusion :) When I see job I think about Hadoop.
For arbitrary scripts I think jclouds provides some ways of doing this,
as you already know. To check with low latency whether a script is
still running, I think you need some sort of server-side daemon,
but I can't recommend one.
-- Andrei
On Thu, Sep 22, 2011 at 3:24 PM, David Alves <da...@gmail.com> wrote:
> As I said, the thing is I'm NOT using Hadoop :)
> I'm just running generic scripts/ssh commands.
>
> -david
Re: launching a one job cluster
Posted by David Alves <da...@gmail.com>.
As I said, the thing is I'm NOT using Hadoop :)
I'm just running generic scripts/ssh commands.
-david
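For generic scripts or ssh commands, one option that needs no explicit polling loop in the caller is to run the command synchronously and time it. The Python sketch below is an illustration only; the helper name is hypothetical, and it assumes the connection to the node can stay open for the whole run.

```python
import subprocess
import time


def run_and_time(cmd):
    """Run `cmd`, block until it exits, and return (exit_code, seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd)  # returns only once the process has exited
    return result.returncode, time.monotonic() - start
```

A call such as run_and_time(["ssh", "user@node", "./job.sh"]) (host and script names are placeholders) would return when the remote command ends, since ssh exits with the remote command's status. The measured time includes connection setup, which matters if the metrics need to be precise.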
On Sep 22, 2011, at 5:20 PM, Andrei Savu wrote:
> I don't know that much about how to manage jobs in Hadoop using the
> API. Maybe Tom can provide a good answer to this. I completely
> understand the elegance part :)
>
> -- Andrei Savu
Re: launching a one job cluster
Posted by Andrei Savu <sa...@gmail.com>.
I don't know that much about how to manage jobs in Hadoop using the
API. Maybe Tom can provide a good answer to this. I completely
understand the elegance part :)
-- Andrei Savu
On Thu, Sep 22, 2011 at 3:17 PM, David Alves <da...@gmail.com> wrote:
> First, there is the question of accuracy: as I said, I am collecting metrics that I'd like to be as accurate as possible.
> Second, there is the matter of elegance. I always like to avoid polling whenever possible.
>
> That being said, I don't want to embark on some odyssey just to avoid polling, so if it really is too much trouble I am OK with letting it go.
> Anyhow, even with polling, is there something already implemented that supports it in generic cases?
>
> thanks
> -david
Re: launching a one job cluster
Posted by David Alves <da...@gmail.com>.
First, there is the question of accuracy: as I said, I am collecting metrics that I'd like to be as accurate as possible.
Second, there is the matter of elegance. I always like to avoid polling whenever possible.
That being said, I don't want to embark on some odyssey just to avoid polling, so if it really is too much trouble I am OK with letting it go.
Anyhow, even with polling, is there something already implemented that supports it in generic cases?
thanks
-david
On Sep 22, 2011, at 5:09 PM, Andrei Savu wrote:
> Why is it so important to avoid polling? The cost is low and almost
> any job runs for at least a few minutes.
>
> -- Andrei
Re: launching a one job cluster
Posted by Andrei Savu <sa...@gmail.com>.
Why is it so important to avoid polling? The cost is low and almost
any job runs for at least a few minutes.
-- Andrei
On Thu, Sep 22, 2011 at 3:07 PM, David Alves <da...@gmail.com> wrote:
> Hi Andrei
>
> I know…
> The thing is, that code uses the Hadoop JobClient class's runJob() method, which actually polls for progress.
> I am not using Hadoop (in hindsight, using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or Java programs).
> In particular, as I'm collecting accurate metrics, I'd like a non-polling technique.
> Even if there is none I can always try to code it, so all ideas are welcome.
>
> thanks
> david
Re: launching a one job cluster
Posted by David Alves <da...@gmail.com>.
Hi Andrei
I know…
The thing is, that code uses the Hadoop JobClient class's runJob() method, which actually polls for progress.
I am not using Hadoop (in hindsight, using the word "job" might have been a mistake) and I was wondering if there is already a way to do that for generic cases (e.g., scripts or Java programs).
In particular, as I'm collecting accurate metrics, I'd like a non-polling technique.
Even if there is none I can always try to code it, so all ideas are welcome.
thanks
david
On Sep 22, 2011, at 4:52 PM, Andrei Savu wrote:
> This is exactly what the example code is doing (and the Hadoop
> integration test). The job-running code blocks while the job is
> executing.
>
> -- Andrei Savu / andreisavu.ro
Re: launching a one job cluster
Posted by Andrei Savu <sa...@gmail.com>.
This is exactly what the example code is doing (and the Hadoop
integration test). The job-running code blocks while the job is
executing.
-- Andrei Savu / andreisavu.ro
On Thu, Sep 22, 2011 at 2:03 PM, David Alves <da...@gmail.com> wrote:
> Hi All
>
> I need to launch a cluster, run a job, and terminate the cluster as soon as the job has finished.
> Is there any "nice" way to do this, or do you have any suggestions?
> Off the top of my head I can imagine some quick and dirty solutions (like creating a file when the task completes and polling for its existence from the whirr handler), but I'd like to do it without polling if possible. Any ideas?
>
> thanks
> -david
>