Posted to common-user@hadoop.apache.org by Per Steffensen <st...@designware.dk> on 2011/09/01 11:30:43 UTC

Timer jobs

Hi

I use Hadoop for a MapReduce job in my system. I would like to have the 
job run every 5th minute. Is there any "distributed" timer job facility in 
Hadoop? Of course I could set up a timer in an external timer framework 
(CRON or something like that) that invokes the MapReduce job. But CRON 
only runs on one particular machine, so if that machine goes down 
my job will not be triggered. Then I could set up the timer on all or 
many machines, but I would not like the job to be run in more than one 
instance every 5th minute, so the timer jobs would need to 
coordinate who is actually starting the job "this time" and all the rest 
would just do nothing. I guess I could come up with a solution to 
that - e.g. writing some "lock" stuff using HDFS files or by using 
ZooKeeper. But I would really like it if someone had already solved the 
problem and provided some kind of "distributed timer framework" 
running in a "cluster", so that I could just register a timer job with 
the cluster and then be sure that it is invoked every 5th minute, no 
matter if one or two particular machines in the cluster are down.

Any suggestions are very welcome.

Regards, Per Steffensen
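[Editor's note] One way to get the coordination described above without a new framework is to run the CRON trigger on every machine and let the instances race for a per-interval lock, so exactly one of them submits the job. Below is a minimal sketch of that idea using atomic create-if-absent; a local directory stands in for HDFS here, and all names are illustrative (with real HDFS you would rely on a create-if-absent call against the NameNode instead):

```python
import os
import tempfile
import time


def try_acquire_slot(lock_dir, slot_seconds=300, now=None):
    """Try to claim the current 5-minute slot by atomically creating a
    lock file named after the slot number. Returns True for exactly one
    caller per slot, no matter how many machines race."""
    now = time.time() if now is None else now
    slot = int(now // slot_seconds)  # one slot per 5 minutes
    path = os.path.join(lock_dir, "job-%d.lock" % slot)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so the
        # race between machines is decided atomically by the filesystem.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True   # we own this slot: trigger the MapReduce job
    except FileExistsError:
        return False  # someone else was first: do nothing this slot


if __name__ == "__main__":
    lock_dir = tempfile.mkdtemp()
    # Simulate 5 machines waking up for the same 5-minute slot:
    results = [try_acquire_slot(lock_dir, now=1000.0) for _ in range(5)]
    print(results.count(True))  # prints 1: exactly one winner
```

Because the lock name encodes the slot, a machine that wakes up late for an already-claimed slot does nothing, while the next slot is a fresh race; stale lock files would still need periodic cleanup.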

Re: Timer jobs

Posted by Tharindu Mathew <mc...@gmail.com>.
On Thu, Sep 1, 2011 at 7:58 PM, Per Steffensen <st...@designware.dk> wrote:

> Thanks for your response. See comments below.
>
> Regards, Per Steffensen
>
> Alejandro Abdelnur skrev:
>
>  [moving common-user@ to BCC]
>>
>> Oozie is not HA yet. But it would be relatively easy to make it. It was
>> designed with that in mind, we even did a prototype.
>>
>>
> Ok, so if it isnt HA out-of-the-box I believe Oozie is too big a framework
> for my needs - I dont need all this workflow stuff - just a plain simple job
> trigger that triggers every 5th minute. I guess I will try out something
> smaller like Quartz Scheduler. It also only have HA/cluster support through
> JDBC (JobStore) but I guess I could fairly easy make a HDFSFilesJobStore
> which still hold the properties so that Quartz clustering works.
>
> But what I would really like to have is a scheduling framework that is HA
> out-of-the-box. Guess Oozie is not the solution for me. Anyone knows about
> other frameworks?

This is similar to my requirement, except that I already have Quartz
scheduling my jobs and haven't started using Hadoop yet. I plan to wrap
Quartz jobs so that they internally call Hadoop jobs. I'm still in the design
phase though; hopefully it will be successful.

>
>  Oozie consists of 2 services, a SQL database to store the Oozie jobs state
>> and a servlet container where Oozie app proper runs.
>>
>> The solution for HA for the database, well, it is left to the database.
>> This
>> means, you'll have to get an HA DB.
>>
>>
> I would really like to avoid having to run a relational database. Couldnt I
> just do the persistence of Oozie jobs state in files on HDFS?
>
>  The solution for HA for the Oozie app is deploying the servlet container
>> with the Oozie app in more than one box (2 or 3); and front them by a HTTP
>> load-balancer.
>>
>> The missing part is that the current Oozie lock-service is currently an
>> in-memory implementation. This should be replaced with a Zookeeper
>> implementation. Zookeeper could run externally or internally in all Oozie
>> servers. This is what was prototyped long ago.
>>
>>
> Yes but if I have to do ZooKeeper stuff I could just do the scheduler
> myself and make run no all/many boxes. The only hard part about it is the
> "locking" thing that makes sure only one job-triggering happens in the
> entire cluster when only one job-triggering is supposed to happen, and that
> the job-triggering happens no matter how many machines might be down.
>
>  Thanks.
>>
>> Alejandro
>>
>>
>> On Thu, Sep 1, 2011 at 4:14 AM, Ronen Itkin <ro...@taykey.com> wrote:
>>
>>
>>
>>> If I get you right you are asking about Installing Oozie as Distributed
>>> and/or HA cluster?!
>>> In that case I am not familiar with an out of the box solution by Oozie.
>>> But, I think you can made up a solution of your own, for example:
>>> Installing Oozie on two servers on the same partition which will be
>>> synchronized by DRBD.
>>> You can trigger a "failover" using linux Heartbeat and that way maintain
>>> a
>>> virtual IP.
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk>
>>> wrote:
>>>
>>>
>>>
>>>> Hi
>>>>
>>>> Thanks a lot for pointing me to Oozie. I have looked a little bit into
>>>> Oozie and it seems like the "component" triggering jobs is called
>>>> "Coordinator Application". But I really see nowhere that this
>>>> Coordinator
>>>> Application doesnt just run on a single machine, and that it will
>>>>
>>>>
>>> therefore
>>>
>>>
>>>> not trigger anything if this machine is down. Can you confirm that the
>>>> "Coordinator Application"-role is distributed in a distribued Oozie
>>>>
>>>>
>>> setup,
>>>
>>>
>>>> so that jobs gets triggered even if one or two machines are down?
>>>>
>>>> Regards, Per Steffensen
>>>>
>>>> Ronen Itkin skrev:
>>>>
>>>>  Hi
>>>>
>>>>
>>>>> Try to use Oozie for job coordination and work flows.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I use hadoop for a MapReduce job in my system. I would like to have
>>>>>> the
>>>>>> job
>>>>>> run very 5th minute. Are there any "distributed" timer job stuff in
>>>>>> hadoop?
>>>>>> Of course I could setup a timer in an external timer framework (CRON
>>>>>> or
>>>>>> something like that) that invokes the MapReduce job. But CRON is only
>>>>>> running on one particular machine, so if that machine goes down my job
>>>>>> will
>>>>>> not be triggered. Then I could setup the timer on all or many
>>>>>> machines,
>>>>>> but
>>>>>> I would not like the job to be run in more than one instance every 5th
>>>>>> minute, so then the timer jobs would need to coordinate who is
>>>>>> actually
>>>>>> starting the job "this time" and all the rest would just have to do
>>>>>> nothing.
>>>>>> Guess I could come up with a solution to that - e.g. writing some
>>>>>>
>>>>>>
>>>>> "lock"
>>>
>>>
>>>> stuff using HDFS files or by using ZooKeeper. But I would really like
>>>>>>
>>>>>>
>>>>> if
>>>
>>>
>>>> someone had already solved the problem, and provided some kind of a
>>>>>> "distributed timer framework" running in a "cluster", so that I could
>>>>>> just
>>>>>> register a timer job with the cluster, and then be sure that it is
>>>>>> invoked
>>>>>> every 5th minute, no matter if one or two particular machines in the
>>>>>> cluster
>>>>>> is down.
>>>>>>
>>>>>> Any suggestions are very welcome.
>>>>>>
>>>>>> Regards, Per Steffensen
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>> --
>>> *
>>> Ronen Itkin*
>>> Taykey | www.taykey.com
>>>
>>>
>>>
>>
>>
>>
>
>


-- 
Regards,

Tharindu

Re: Timer jobs

Posted by Per Steffensen <st...@designware.dk>.
Thanks for your response. See comments below.

Regards, Per Steffensen

Alejandro Abdelnur wrote:
> [moving common-user@ to BCC]
>
> Oozie is not HA yet. But it would be relatively easy to make it. It was
> designed with that in mind, we even did a prototype.
>   
Ok, so if it isn't HA out-of-the-box I believe Oozie is too big a 
framework for my needs - I don't need all this workflow stuff - just a 
plain simple job trigger that fires every 5th minute. I guess I will 
try out something smaller like Quartz Scheduler. It also only has 
HA/cluster support through JDBC (JobStore), but I guess I could fairly 
easily make an HDFSFilesJobStore which still has the properties needed 
for Quartz clustering to work.

But what I would really like to have is a scheduling framework that is 
HA out-of-the-box. I guess Oozie is not the solution for me. Does anyone 
know about other frameworks?
> Oozie consists of 2 services, a SQL database to store the Oozie jobs state
> and a servlet container where Oozie app proper runs.
>
> The solution for HA for the database, well, it is left to the database. This
> means, you'll have to get an HA DB.
>   
I would really like to avoid having to run a relational database. 
Couldn't I just persist the Oozie job state in files on HDFS?
> The solution for HA for the Oozie app is deploying the servlet container
> with the Oozie app in more than one box (2 or 3); and front them by a HTTP
> load-balancer.
>
> The missing part is that the current Oozie lock-service is currently an
> in-memory implementation. This should be replaced with a Zookeeper
> implementation. Zookeeper could run externally or internally in all Oozie
> servers. This is what was prototyped long ago.
>   
Yes, but if I have to do ZooKeeper stuff I could just write the scheduler 
myself and make it run on all/many boxes. The only hard part about it is 
the "locking" thing that makes sure only one job-triggering happens in 
the entire cluster when only one job-triggering is supposed to happen, 
and that the job-triggering happens no matter how many machines might be 
down.
> Thanks.
>
> Alejandro
>
>
> On Thu, Sep 1, 2011 at 4:14 AM, Ronen Itkin <ro...@taykey.com> wrote:
>
>   
>> If I get you right you are asking about Installing Oozie as Distributed
>> and/or HA cluster?!
>> In that case I am not familiar with an out of the box solution by Oozie.
>> But, I think you can made up a solution of your own, for example:
>> Installing Oozie on two servers on the same partition which will be
>> synchronized by DRBD.
>> You can trigger a "failover" using linux Heartbeat and that way maintain a
>> virtual IP.
>>
>>
>>
>>
>>
>> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk>
>> wrote:
>>
>>     
>>> Hi
>>>
>>> Thanks a lot for pointing me to Oozie. I have looked a little bit into
>>> Oozie and it seems like the "component" triggering jobs is called
>>> "Coordinator Application". But I really see nowhere that this Coordinator
>>> Application doesnt just run on a single machine, and that it will
>>>       
>> therefore
>>     
>>> not trigger anything if this machine is down. Can you confirm that the
>>> "Coordinator Application"-role is distributed in a distribued Oozie
>>>       
>> setup,
>>     
>>> so that jobs gets triggered even if one or two machines are down?
>>>
>>> Regards, Per Steffensen
>>>
>>> Ronen Itkin skrev:
>>>
>>>  Hi
>>>       
>>>> Try to use Oozie for job coordination and work flows.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> Hi
>>>>>
>>>>> I use hadoop for a MapReduce job in my system. I would like to have the
>>>>> job
>>>>> run very 5th minute. Are there any "distributed" timer job stuff in
>>>>> hadoop?
>>>>> Of course I could setup a timer in an external timer framework (CRON or
>>>>> something like that) that invokes the MapReduce job. But CRON is only
>>>>> running on one particular machine, so if that machine goes down my job
>>>>> will
>>>>> not be triggered. Then I could setup the timer on all or many machines,
>>>>> but
>>>>> I would not like the job to be run in more than one instance every 5th
>>>>> minute, so then the timer jobs would need to coordinate who is actually
>>>>> starting the job "this time" and all the rest would just have to do
>>>>> nothing.
>>>>> Guess I could come up with a solution to that - e.g. writing some
>>>>>           
>> "lock"
>>     
>>>>> stuff using HDFS files or by using ZooKeeper. But I would really like
>>>>>           
>> if
>>     
>>>>> someone had already solved the problem, and provided some kind of a
>>>>> "distributed timer framework" running in a "cluster", so that I could
>>>>> just
>>>>> register a timer job with the cluster, and then be sure that it is
>>>>> invoked
>>>>> every 5th minute, no matter if one or two particular machines in the
>>>>> cluster
>>>>> is down.
>>>>>
>>>>> Any suggestions are very welcome.
>>>>>
>>>>> Regards, Per Steffensen
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>
>>>>         
>>>       
>> --
>> *
>> Ronen Itkin*
>> Taykey | www.taykey.com
>>
>>     
>
>   
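[Editor's note] The HDFSFilesJobStore idea above hinges on one property Quartz clustering normally gets from JDBC row locks: when several scheduler instances see the same due trigger, only one may mark it acquired. A toy compare-and-set sketch of that acquire step (all names hypothetical; the real Quartz JobStore interface is much larger, and an HDFS-backed store would need an atomic primitive in place of the in-process lock used here):

```python
import threading


class ToyTriggerStore:
    """Minimal stand-in for the acquire step of a clustered JobStore:
    many scheduler nodes race, but each trigger fires exactly once."""

    def __init__(self, trigger_ids):
        # Plays the role of the DB row lock (or an HDFS atomic rename).
        self._lock = threading.Lock()
        self._state = {t: "WAITING" for t in trigger_ids}

    def acquire(self, trigger_id, node):
        # Atomic check-and-set: succeed only if the trigger is still WAITING.
        with self._lock:
            if self._state.get(trigger_id) == "WAITING":
                self._state[trigger_id] = "ACQUIRED:" + node
                return True
            return False


if __name__ == "__main__":
    store = ToyTriggerStore(["every-5-min"])
    wins = [store.acquire("every-5-min", "node%d" % i) for i in range(4)]
    print(wins.count(True))  # prints 1: only one node fires the trigger
```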


Re: Timer jobs

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
[moving common-user@ to BCC]

Oozie is not HA yet, but it would be relatively easy to make it so. It was
designed with that in mind; we even did a prototype.

Oozie consists of two services: a SQL database to store the Oozie job state,
and a servlet container where the Oozie app proper runs.

The solution for HA for the database is left to the database. This
means you'll have to get an HA DB.

The solution for HA for the Oozie app is to deploy the servlet container
with the Oozie app on more than one box (2 or 3) and front them with an HTTP
load-balancer.

The missing part is that the current Oozie lock-service is an
in-memory implementation. This should be replaced with a ZooKeeper
implementation; ZooKeeper could run externally or internally in all Oozie
servers. This is what was prototyped long ago.

Thanks.

Alejandro


On Thu, Sep 1, 2011 at 4:14 AM, Ronen Itkin <ro...@taykey.com> wrote:

> If I get you right you are asking about Installing Oozie as Distributed
> and/or HA cluster?!
> In that case I am not familiar with an out of the box solution by Oozie.
> But, I think you can made up a solution of your own, for example:
> Installing Oozie on two servers on the same partition which will be
> synchronized by DRBD.
> You can trigger a "failover" using linux Heartbeat and that way maintain a
> virtual IP.
>
>
>
>
>
> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk>
> wrote:
>
> > Hi
> >
> > Thanks a lot for pointing me to Oozie. I have looked a little bit into
> > Oozie and it seems like the "component" triggering jobs is called
> > "Coordinator Application". But I really see nowhere that this Coordinator
> > Application doesnt just run on a single machine, and that it will
> therefore
> > not trigger anything if this machine is down. Can you confirm that the
> > "Coordinator Application"-role is distributed in a distribued Oozie
> setup,
> > so that jobs gets triggered even if one or two machines are down?
> >
> > Regards, Per Steffensen
> >
> > Ronen Itkin skrev:
> >
> >  Hi
> >>
> >> Try to use Oozie for job coordination and work flows.
> >>
> >>
> >>
> >> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
> >> wrote:
> >>
> >>
> >>
> >>> Hi
> >>>
> >>> I use hadoop for a MapReduce job in my system. I would like to have the
> >>> job
> >>> run very 5th minute. Are there any "distributed" timer job stuff in
> >>> hadoop?
> >>> Of course I could setup a timer in an external timer framework (CRON or
> >>> something like that) that invokes the MapReduce job. But CRON is only
> >>> running on one particular machine, so if that machine goes down my job
> >>> will
> >>> not be triggered. Then I could setup the timer on all or many machines,
> >>> but
> >>> I would not like the job to be run in more than one instance every 5th
> >>> minute, so then the timer jobs would need to coordinate who is actually
> >>> starting the job "this time" and all the rest would just have to do
> >>> nothing.
> >>> Guess I could come up with a solution to that - e.g. writing some
> "lock"
> >>> stuff using HDFS files or by using ZooKeeper. But I would really like
> if
> >>> someone had already solved the problem, and provided some kind of a
> >>> "distributed timer framework" running in a "cluster", so that I could
> >>> just
> >>> register a timer job with the cluster, and then be sure that it is
> >>> invoked
> >>> every 5th minute, no matter if one or two particular machines in the
> >>> cluster
> >>> is down.
> >>>
> >>> Any suggestions are very welcome.
> >>>
> >>> Regards, Per Steffensen
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
>
>
> --
> *
> Ronen Itkin*
> Taykey | www.taykey.com
>
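[Editor's note] The ZooKeeper-based lock service described above is typically built on sequential ephemeral znodes: every contender creates a numbered node, the holder of the lowest number is the leader, and when its session dies its node vanishes so the next number takes over. The following is only a toy in-memory simulation of that election rule (no real ZooKeeper involved), to show why it always yields exactly one leader:

```python
class ToyElection:
    """In-memory stand-in for ZooKeeper's sequential ephemeral znodes."""

    def __init__(self):
        self.next_seq = 0
        self.members = {}  # contender name -> sequence number

    def join(self, name):
        # ZooKeeper would create a numbered child znode and return the number.
        self.members[name] = self.next_seq
        self.next_seq += 1

    def leave(self, name):
        # An ephemeral znode disappears when its session dies.
        self.members.pop(name, None)

    def leader(self):
        # The contender holding the smallest sequence number owns the lock.
        if not self.members:
            return None
        return min(self.members, key=self.members.get)


if __name__ == "__main__":
    e = ToyElection()
    for server in ("oozie-1", "oozie-2", "oozie-3"):
        e.join(server)
    print(e.leader())   # prints oozie-1: it joined first, so it leads
    e.leave("oozie-1")  # the leader crashes
    print(e.leader())   # prints oozie-2: leadership passes automatically
```

Since sequence numbers are assigned by a single ordered service, two nodes can never both hold the smallest number, which is exactly the guarantee an Oozie lock service (or a job-trigger lock) needs.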

Re: Timer jobs

Posted by Per Steffensen <st...@designware.dk>.
Vitalii Tymchyshyn wrote:
> 01.09.11 21:55, Per Steffensen написав(ла):
>> Vitalii Tymchyshyn skrev:
>>> Hello.
>>>
>>> AFAIK now you still have HDFS NameNode and as soon as NameNode is 
>>> down - your cluster is down. So, putting scheduling on the same 
>>> machine as NameNode won't make you cluster worse in terms of SPOF 
>>> (at least for HW failures).
>>>
>>> Best regards, Vitalii Tymchyshyn
>>>
>>>
>> I believe this is why there is also a secondary namenode. 
>
> Hello.
>
> Not at all. Secondary name node is not even a hot standby. You HDFS 
> cluster address is namenode:port and no one who connects with it knows 
> about secondary name node, so it's not a HA solution.
> AFAIR secondary name node even is not a backup, but simply a tools to 
> help main name node to process transaction logs at a scheduled 
> fashion. 0.21 has backup name node, but 0.21 is unstable and it's 
> backup node does not work (tried it). For 0.20 the backup solution 
> mentioned in the docs is to have a NFS mount on name node and specify 
> it as a secondary name node data directory.
>
> Best regards, Vitalii Tymchyshyn.
>
>
Hmm, then I believe Hadoop has a serious HA problem built in. That is 
not so smart when so much of it is about doing HA. But I guess work is 
going on to solve that - in 0.21 and onwards. Thanks for your 
explanation.

Regards, Per Steffensen

Re: Timer jobs

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
01.09.11 21:55, Per Steffensen wrote:
> Vitalii Tymchyshyn skrev:
>> Hello.
>>
>> AFAIK now you still have HDFS NameNode and as soon as NameNode is 
>> down - your cluster is down. So, putting scheduling on the same 
>> machine as NameNode won't make you cluster worse in terms of SPOF (at 
>> least for HW failures).
>>
>> Best regards, Vitalii Tymchyshyn
>>
>>
> I believe this is why there is also a secondary namenode. 

Hello.

Not at all. The secondary name node is not even a hot standby. Your HDFS 
cluster address is namenode:port, and no one who connects to it knows 
about the secondary name node, so it's not an HA solution.
AFAIR the secondary name node is not even a backup, but simply a tool to 
help the main name node process transaction logs in a scheduled fashion. 
0.21 has a backup name node, but 0.21 is unstable and its backup node 
does not work (I tried it). For 0.20 the backup solution mentioned in the 
docs is to have an NFS mount on the name node and specify it as a 
secondary name node data directory.

Best regards, Vitalii Tymchyshyn.

Re: Timer jobs

Posted by Per Steffensen <st...@designware.dk>.
Vitalii Tymchyshyn wrote:
> 01.09.11 18:14, Per Steffensen написав(ла):
>> Well I am not sure I get you right, but anyway, basically I want a 
>> timer framework that triggers my jobs. And the triggering of the jobs 
>> need to work even though one or two particular machines goes down. So 
>> the "timer triggering mechanism" has to live in the cluster, so to 
>> speak. What I dont want is that the timer framework are driven from 
>> one particular machine, so that the triggering of jobs will not 
>> happen if this particular machine goes down. Basically if I have e.g. 
>> 10 machines in a Hadoop cluster I will be able to run e.g. MapReduce 
>> jobs even if 3 of the 10 machines are down. I want my timer framework 
>> to also be clustered, distributed and coordinated, so that I will 
>> also have my timer jobs triggered even though 3 out of 10 machines 
>> are down.
> Hello.
>
> AFAIK now you still have HDFS NameNode and as soon as NameNode is down 
> - your cluster is down. So, putting scheduling on the same machine as 
> NameNode won't make you cluster worse in terms of SPOF (at least for 
> HW failures).
>
> Best regards, Vitalii Tymchyshyn
>
>
I believe this is why there is also a secondary namenode. But with two 
namenodes it is still too centralized in my opinion; I guess the Hadoop 
people know that, and that the namenode role will become more 
distributed in the future. But that does not change the fact that I 
would like to have a real distributed clustered scheduler.
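[Editor's note] A "real distributed clustered scheduler" of the kind asked for above usually combines two ideas from this thread: every node runs the timer, and a lease with a heartbeat decides which node actually fires. Below is a toy, deterministic sketch of lease-based failover (timestamps are passed in explicitly; in practice the lease record would live in ZooKeeper or HDFS rather than a Python object, and the grab-or-steal step would need to be an atomic compare-and-set):

```python
LEASE_TTL = 15.0  # seconds a leader may go silent before others take over


class LeaseScheduler:
    """Each node calls should_fire() on every timer tick; only the current
    lease holder fires, and a dead holder's lease expires after LEASE_TTL."""

    def __init__(self):
        self.holder = None
        self.last_heartbeat = 0.0

    def should_fire(self, node, now):
        expired = (now - self.last_heartbeat) > LEASE_TTL
        if self.holder is None or expired:
            self.holder = node          # grab (or steal) the lease
        if self.holder == node:
            self.last_heartbeat = now   # heartbeat while we hold it
            return True                 # this node triggers the job
        return False                    # some other node holds the lease
```

With this scheme the trigger keeps firing as long as any node survives: if the holder stops heartbeating, the next node to tick after LEASE_TTL takes over. The unsafe part is that two nodes could both see the lease as expired on the same tick, which is precisely the race an atomic coordination service like ZooKeeper is needed to close.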

Re: Timer jobs

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
01.09.11 18:14, Per Steffensen wrote:
> Well I am not sure I get you right, but anyway, basically I want a 
> timer framework that triggers my jobs. And the triggering of the jobs 
> need to work even though one or two particular machines goes down. So 
> the "timer triggering mechanism" has to live in the cluster, so to 
> speak. What I dont want is that the timer framework are driven from 
> one particular machine, so that the triggering of jobs will not happen 
> if this particular machine goes down. Basically if I have e.g. 10 
> machines in a Hadoop cluster I will be able to run e.g. MapReduce jobs 
> even if 3 of the 10 machines are down. I want my timer framework to 
> also be clustered, distributed and coordinated, so that I will also 
> have my timer jobs triggered even though 3 out of 10 machines are down.
Hello.

AFAIK you still have the HDFS NameNode, and as soon as the NameNode is down, 
your cluster is down. So putting scheduling on the same machine as the 
NameNode won't make your cluster worse in terms of SPOF (at least for HW 
failures).

Best regards, Vitalii Tymchyshyn

Re: Timer jobs

Posted by Tharindu Mathew <mc...@gmail.com>.
In Hadoop, if the client that triggers the job fails, is there a way to
recover and have another client submit the job?

On Thu, Sep 1, 2011 at 8:44 PM, Per Steffensen <st...@designware.dk> wrote:

> Well I am not sure I get you right, but anyway, basically I want a timer
> framework that triggers my jobs. And the triggering of the jobs need to work
> even though one or two particular machines goes down. So the "timer
> triggering mechanism" has to live in the cluster, so to speak. What I dont
> want is that the timer framework are driven from one particular machine, so
> that the triggering of jobs will not happen if this particular machine goes
> down. Basically if I have e.g. 10 machines in a Hadoop cluster I will be
> able to run e.g. MapReduce jobs even if 3 of the 10 machines are down. I
> want my timer framework to also be clustered, distributed and coordinated,
> so that I will also have my timer jobs triggered even though 3 out of 10
> machines are down.
>
>
> Regards, Per Steffensen
>
Ronen Itkin wrote:
>
>> If I get you right you are asking about Installing Oozie as Distributed
>> and/or HA cluster?!
>> In that case I am not familiar with an out of the box solution by Oozie.
>> But, I think you can made up a solution of your own, for example:
>> Installing Oozie on two servers on the same partition which will be
>> synchronized by DRBD.
>> You can trigger a "failover" using linux Heartbeat and that way maintain a
>> virtual IP.
>>
>>
>>
>>
>>
>> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk>
>> wrote:
>>
>>
>>
>>> Hi
>>>
>>> Thanks a lot for pointing me to Oozie. I have looked a little bit into
>>> Oozie and it seems like the "component" triggering jobs is called
>>> "Coordinator Application". But I really see nowhere that this Coordinator
>>> Application doesnt just run on a single machine, and that it will
>>> therefore
>>> not trigger anything if this machine is down. Can you confirm that the
>>> "Coordinator Application"-role is distributed in a distribued Oozie
>>> setup,
>>> so that jobs gets triggered even if one or two machines are down?
>>>
>>> Regards, Per Steffensen
>>>
>>> Ronen Itkin skrev:
>>>
>>>  Hi
>>>
>>>
>>>> Try to use Oozie for job coordination and work flows.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Hi
>>>>>
>>>>> I use hadoop for a MapReduce job in my system. I would like to have the
>>>>> job
>>>>> run very 5th minute. Are there any "distributed" timer job stuff in
>>>>> hadoop?
>>>>> Of course I could setup a timer in an external timer framework (CRON or
>>>>> something like that) that invokes the MapReduce job. But CRON is only
>>>>> running on one particular machine, so if that machine goes down my job
>>>>> will
>>>>> not be triggered. Then I could setup the timer on all or many machines,
>>>>> but
>>>>> I would not like the job to be run in more than one instance every 5th
>>>>> minute, so then the timer jobs would need to coordinate who is actually
>>>>> starting the job "this time" and all the rest would just have to do
>>>>> nothing.
>>>>> Guess I could come up with a solution to that - e.g. writing some
>>>>> "lock"
>>>>> stuff using HDFS files or by using ZooKeeper. But I would really like
>>>>> if
>>>>> someone had already solved the problem, and provided some kind of a
>>>>> "distributed timer framework" running in a "cluster", so that I could
>>>>> just
>>>>> register a timer job with the cluster, and then be sure that it is
>>>>> invoked
>>>>> every 5th minute, no matter if one or two particular machines in the
>>>>> cluster
>>>>> is down.
>>>>>
>>>>> Any suggestions are very welcome.
>>>>>
>>>>> Regards, Per Steffensen
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>


-- 
Regards,

Tharindu

Re: Timer jobs

Posted by Per Steffensen <st...@designware.dk>.
Well, I am not sure I get you right, but anyway: basically I want a timer 
framework that triggers my jobs. And the triggering of the jobs needs to 
work even though one or two particular machines go down. So the "timer 
triggering mechanism" has to live in the cluster, so to speak. What I 
don't want is the timer framework being driven from one particular 
machine, so that the triggering of jobs will not happen if this 
particular machine goes down. Basically, if I have e.g. 10 machines in a 
Hadoop cluster I will be able to run e.g. MapReduce jobs even if 3 of 
the 10 machines are down. I want my timer framework to also be 
clustered, distributed and coordinated, so that I will also have my 
timer jobs triggered even though 3 out of 10 machines are down.

Regards, Per Steffensen

Ronen Itkin wrote:
> If I get you right you are asking about Installing Oozie as Distributed
> and/or HA cluster?!
> In that case I am not familiar with an out of the box solution by Oozie.
> But, I think you can made up a solution of your own, for example:
> Installing Oozie on two servers on the same partition which will be
> synchronized by DRBD.
> You can trigger a "failover" using linux Heartbeat and that way maintain a
> virtual IP.
>
>
>
>
>
> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk> wrote:
>
>   
>> Hi
>>
>> Thanks a lot for pointing me to Oozie. I have looked a little bit into
>> Oozie and it seems like the "component" triggering jobs is called
>> "Coordinator Application". But I really see nowhere that this Coordinator
>> Application doesnt just run on a single machine, and that it will therefore
>> not trigger anything if this machine is down. Can you confirm that the
>> "Coordinator Application"-role is distributed in a distribued Oozie setup,
>> so that jobs gets triggered even if one or two machines are down?
>>
>> Regards, Per Steffensen
>>
>> Ronen Itkin skrev:
>>
>>  Hi
>>     
>>> Try to use Oozie for job coordination and work flows.
>>>
>>>
>>>
>>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
>>> wrote:
>>>
>>>
>>>
>>>       
>>>> Hi
>>>>
>>>> I use hadoop for a MapReduce job in my system. I would like to have the
>>>> job
>>>> run very 5th minute. Are there any "distributed" timer job stuff in
>>>> hadoop?
>>>> Of course I could setup a timer in an external timer framework (CRON or
>>>> something like that) that invokes the MapReduce job. But CRON is only
>>>> running on one particular machine, so if that machine goes down my job
>>>> will
>>>> not be triggered. Then I could setup the timer on all or many machines,
>>>> but
>>>> I would not like the job to be run in more than one instance every 5th
>>>> minute, so then the timer jobs would need to coordinate who is actually
>>>> starting the job "this time" and all the rest would just have to do
>>>> nothing.
>>>> Guess I could come up with a solution to that - e.g. writing some "lock"
>>>> stuff using HDFS files or by using ZooKeeper. But I would really like if
>>>> someone had already solved the problem, and provided some kind of a
>>>> "distributed timer framework" running in a "cluster", so that I could
>>>> just
>>>> register a timer job with the cluster, and then be sure that it is
>>>> invoked
>>>> every 5th minute, no matter if one or two particular machines in the
>>>> cluster
>>>> is down.
>>>>
>>>> Any suggestions are very welcome.
>>>>
>>>> Regards, Per Steffensen
>>>>
>>>>
>>>>
>>>>         
>>>
>>>
>>>
>>>       
>>     
>
>
>   


Re: Timer jobs

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
[moving common-user@ to BCC]

Oozie is not HA yet. But it would be relatively easy to make it. It was
designed with that in mind, we even did a prototype.

Oozie consists of 2 services, a SQL database to store the Oozie jobs state
and a servlet container where Oozie app proper runs.

The solution for HA for the database, well, it is left to the database. This
means, you'll have to get an HA DB.

The solution for HA for the Oozie app is deploying the servlet container
with the Oozie app in more than one box (2 or 3); and front them by a HTTP
load-balancer.

The missing part is that the current Oozie lock-service is currently an
in-memory implementation. This should be replaced with a Zookeeper
implementation. Zookeeper could run externally or internally in all Oozie
servers. This is what was prototyped long ago.

Thanks.

Alejandro


On Thu, Sep 1, 2011 at 4:14 AM, Ronen Itkin <ro...@taykey.com> wrote:

> If I get you right you are asking about Installing Oozie as Distributed
> and/or HA cluster?!
> In that case I am not familiar with an out of the box solution by Oozie.
> But, I think you can made up a solution of your own, for example:
> Installing Oozie on two servers on the same partition which will be
> synchronized by DRBD.
> You can trigger a "failover" using linux Heartbeat and that way maintain a
> virtual IP.
>
>
>
>
>
> On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk>
> wrote:
>
> > Hi
> >
> > Thanks a lot for pointing me to Oozie. I have looked a little bit into
> > Oozie and it seems like the "component" triggering jobs is called
> > "Coordinator Application". But I really see nowhere that this Coordinator
> > Application doesnt just run on a single machine, and that it will
> therefore
> > not trigger anything if this machine is down. Can you confirm that the
> > "Coordinator Application"-role is distributed in a distribued Oozie
> setup,
> > so that jobs gets triggered even if one or two machines are down?
> >
> > Regards, Per Steffensen
> >
> > Ronen Itkin skrev:
> >
> >  Hi
> >>
> >> Try to use Oozie for job coordination and work flows.
> >>
> >>
> >>
> >> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
> >> wrote:
> >>
> >>
> >>
> >>> Hi
> >>>
> >>> I use hadoop for a MapReduce job in my system. I would like to have the
> >>> job
> >>> run very 5th minute. Are there any "distributed" timer job stuff in
> >>> hadoop?
> >>> Of course I could setup a timer in an external timer framework (CRON or
> >>> something like that) that invokes the MapReduce job. But CRON is only
> >>> running on one particular machine, so if that machine goes down my job
> >>> will
> >>> not be triggered. Then I could setup the timer on all or many machines,
> >>> but
> >>> I would not like the job to be run in more than one instance every 5th
> >>> minute, so then the timer jobs would need to coordinate who is actually
> >>> starting the job "this time" and all the rest would just have to do
> >>> nothing.
> >>> Guess I could come up with a solution to that - e.g. writing some "lock"
> >>> stuff using HDFS files or by using ZooKeeper. But I would really like if
> >>> someone had already solved the problem, and provided some kind of a
> >>> "distributed timer framework" running in a "cluster", so that I could
> >>> just
> >>> register a timer job with the cluster, and then be sure that it is
> >>> invoked
> >>> every 5th minute, no matter if one or two particular machines in the
> >>> cluster
> >>> is down.
> >>>
> >>> Any suggestions are very welcome.
> >>>
> >>> Regards, Per Steffensen
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
>
>
> --
> Ronen Itkin
> Taykey | www.taykey.com
>
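For reference, the "lock" approach Per describes above (each machine's timer fires, and the contenders race to atomically create a lock so only one runs the job) can be sketched as follows. This is a minimal, hypothetical illustration: a local file's atomic create-if-absent stands in for the HDFS create or ZooKeeper ephemeral node, and all names are made up.

```python
import os
import tempfile

def try_acquire_lock(lock_path):
    """Atomically create the lock file; exactly one contender succeeds.

    Stands in for an HDFS create-if-absent or a ZooKeeper ephemeral node."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def run_job_if_leader(lock_path, job):
    """Called by every machine's timer on each tick; only the lock winner runs the job."""
    if not try_acquire_lock(lock_path):
        return False
    try:
        job()  # e.g. submit the MapReduce job here
    finally:
        os.remove(lock_path)  # release so the next tick can elect a new leader
    return True

if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "mr-timer.lock")
    # Three machines' timers fire at the same time; only one wins.
    results = [try_acquire_lock(lock) for _ in range(3)]
    print(results)  # [True, False, False]
```

In practice you would also need the lock to expire if the winner crashes mid-job, which is exactly what ZooKeeper's ephemeral nodes give you for free.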

Re: Timer jobs

Posted by Ronen Itkin <ro...@taykey.com>.
If I understand you correctly, you are asking about installing Oozie as a
distributed and/or HA cluster?
In that case I am not familiar with an out-of-the-box solution from Oozie.
But I think you can make up a solution of your own, for example:
installing Oozie on two servers whose data partition is kept
synchronized by DRBD.
You can trigger a failover using Linux Heartbeat and that way maintain a
virtual IP.
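As a rough illustration of that active/passive setup (untested sketch; the hostnames, the virtual IP, the DRBD resource name, and the "oozie" init script below are all placeholders):

```shell
# /etc/ha.d/ha.cf (identical on both nodes) -- Heartbeat v1-style sketch:
#   bcast eth0
#   node oozie1 oozie2
#   auto_failback off

# /etc/ha.d/haresources -- oozie1 normally owns the virtual IP, the
# DRBD-backed filesystem, and the Oozie service; Heartbeat moves all
# three to the peer node on failure:
#   oozie1 IPaddr::192.168.0.50/24 drbddisk::r0 \
#          Filesystem::/dev/drbd0::/var/lib/oozie::ext3 oozie
```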





On Thu, Sep 1, 2011 at 1:59 PM, Per Steffensen <st...@designware.dk> wrote:

> Hi
>
> Thanks a lot for pointing me to Oozie. I have looked a little bit into
> Oozie and it seems like the "component" triggering jobs is called
> "Coordinator Application". But I really see nowhere that this Coordinator
> Application doesn't just run on a single machine, and that it will therefore
> not trigger anything if this machine is down. Can you confirm that the
> "Coordinator Application"-role is distributed in a distributed Oozie setup,
> so that jobs get triggered even if one or two machines are down?
>
> Regards, Per Steffensen
>
> Ronen Itkin skrev:
>
>  Hi
>>
>> Try to use Oozie for job coordination and workflows.
>>
>>
>>
>> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk>
>> wrote:
>>
>>
>>
>>> Hi
>>>
>>> I use hadoop for a MapReduce job in my system. I would like to have the
>>> job
>>> run every 5th minute. Is there any "distributed" timer job stuff in
>>> hadoop?
>>> Of course I could set up a timer in an external timer framework (CRON or
>>> something like that) that invokes the MapReduce job. But CRON is only
>>> running on one particular machine, so if that machine goes down my job
>>> will
>>> not be triggered. Then I could set up the timer on all or many machines,
>>> but
>>> I would not like the job to be run in more than one instance every 5th
>>> minute, so then the timer jobs would need to coordinate who is actually
>>> starting the job "this time" and all the rest would just have to do
>>> nothing.
>>> Guess I could come up with a solution to that - e.g. writing some "lock"
>>> stuff using HDFS files or by using ZooKeeper. But I would really like if
>>> someone had already solved the problem, and provided some kind of a
>>> "distributed timer framework" running in a "cluster", so that I could
>>> just
>>> register a timer job with the cluster, and then be sure that it is
>>> invoked
>>> every 5th minute, no matter if one or two particular machines in the
>>> cluster
>>> is down.
>>>
>>> Any suggestions are very welcome.
>>>
>>> Regards, Per Steffensen
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>


-- 
Ronen Itkin
Taykey | www.taykey.com

Re: Timer jobs

Posted by Per Steffensen <st...@designware.dk>.
Hi

Thanks a lot for pointing me to Oozie. I have looked a little bit into 
Oozie and it seems like the "component" triggering jobs is called 
"Coordinator Application". But I really see nowhere that this
Coordinator Application doesn't just run on a single machine, and that it
will therefore not trigger anything if this machine is down. Can you
confirm that the "Coordinator Application"-role is distributed in a
distributed Oozie setup, so that jobs get triggered even if one or two
machines are down?

Regards, Per Steffensen

Ronen Itkin skrev:
> Hi
>
> Try to use Oozie for job coordination and workflows.
>
>
>
> On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk> wrote:
>
>   
>> Hi
>>
>> I use hadoop for a MapReduce job in my system. I would like to have the job
>> run every 5th minute. Is there any "distributed" timer job stuff in hadoop?
>> Of course I could set up a timer in an external timer framework (CRON or
>> something like that) that invokes the MapReduce job. But CRON is only
>> running on one particular machine, so if that machine goes down my job will
>> not be triggered. Then I could set up the timer on all or many machines, but
>> I would not like the job to be run in more than one instance every 5th
>> minute, so then the timer jobs would need to coordinate who is actually
>> starting the job "this time" and all the rest would just have to do nothing.
>> Guess I could come up with a solution to that - e.g. writing some "lock"
>> stuff using HDFS files or by using ZooKeeper. But I would really like if
>> someone had already solved the problem, and provided some kind of a
>> "distributed timer framework" running in a "cluster", so that I could just
>> register a timer job with the cluster, and then be sure that it is invoked
>> every 5th minute, no matter if one or two particular machines in the cluster
>> is down.
>>
>> Any suggestions are very welcome.
>>
>> Regards, Per Steffensen
>>
>>     
>
>
>
>   


Re: Timer jobs

Posted by Ronen Itkin <ro...@taykey.com>.
Hi

Try to use Oozie for job coordination and workflows.
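In Oozie the scheduling piece is a coordinator application. A 5-minute trigger looks roughly like the sketch below (hypothetical and untested; the name, start/end times, and HDFS path are placeholders, and note that the coordinator itself still runs on a single Oozie server):

```xml
<!-- Hypothetical coordinator definition; all names and paths are placeholders. -->
<coordinator-app name="my-5min-job" frequency="${coord:minutes(5)}"
                 start="2011-09-01T00:00Z" end="2012-09-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <app-path>hdfs://namenode:8020/user/steff/mr-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```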



On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen <st...@designware.dk> wrote:

> Hi
>
> I use hadoop for a MapReduce job in my system. I would like to have the job
> run every 5th minute. Is there any "distributed" timer job stuff in hadoop?
> Of course I could set up a timer in an external timer framework (CRON or
> something like that) that invokes the MapReduce job. But CRON is only
> running on one particular machine, so if that machine goes down my job will
> not be triggered. Then I could set up the timer on all or many machines, but
> I would not like the job to be run in more than one instance every 5th
> minute, so then the timer jobs would need to coordinate who is actually
> starting the job "this time" and all the rest would just have to do nothing.
> Guess I could come up with a solution to that - e.g. writing some "lock"
> stuff using HDFS files or by using ZooKeeper. But I would really like if
> someone had already solved the problem, and provided some kind of a
> "distributed timer framework" running in a "cluster", so that I could just
> register a timer job with the cluster, and then be sure that it is invoked
> every 5th minute, no matter if one or two particular machines in the cluster
> is down.
>
> Any suggestions are very welcome.
>
> Regards, Per Steffensen
>



-- 
Ronen Itkin
Taykey | www.taykey.com