You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@zookeeper.apache.org by Todd Nine <to...@spidertracks.co.nz> on 2010/08/24 01:02:36 UTC

Non Hadoop scheduling frameworks

Hi all,
  We're using Zookeeper for Leader Election and system monitoring.  We're
also using it for synchronizing our cluster wide jobs with  barriers.  We're
running into an issue where we now have a single job, but each node can fire
the job independently of others with different criteria in the job.  In the
event of a system failure, another node in our application cluster will need
to fire this Job.  I've used quartz previously (we're running Java 6), but
it simply isn't designed for the use case we have.  I found this article on
cloudera.

http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/


I've looked at both plugins, but they require hadoop.  We're not currently
running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as
complete.


UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same
node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs
from the failed node, and partition them across the remaining nodes.


Any input would be greatly appreciated.

Thanks,
Todd

Re: Non Hadoop scheduling frameworks

Posted by Thomas Koch <th...@koch.ro>.

Todd Nine:
> [...]
> UC1: Synchronized Jobs
> 1. A job is fired across all nodes
> 2. The nodes wait until the barrier is entered by all participants
> 3. The nodes process the data and leave
> 4. On all nodes leaving the barrier, the Leader node marks the job as
> complete.
> 
> UC2: Multiple Jobs per Node
> 1. A Job is scheduled for a future time on a specific node (usually the
> same node that's creating the trigger)
> 2. A Trigger can be overwritten and cancelled without the job firing
> 3. In the event of a node failure, the Leader will take all pending jobs
> from the failed node, and partition them across the remaining nodes.

Hi Todd,

we've implemented UC2 for an internal project with ZK. I'd love to make the 
code free, but I've to ask our product owner. It's a small company so this 
could go quickly. But I don't know how to convince them. They're so afraid of 
giving away stuff.
The basic idea is, that we've two "folders" in ZK, a work queue and a lock 
folder. The items (znodes) in the work queue a timestamp prefixed. Every node 
consuming the queue tries to create an ephemeral znode in the lock "folder" 
before starting on a work item. Work items are actually URLs and we lock on 
the domain. Since we also use a lock pool on every worker that only releases 
on overflow or timeout, we can reuse locks and also get "weak" locality for 
URLs of the same domain. - That's all the magic. Six java classes on top of 
our own ZK helper lib.

Best regards,

Thomas Koch, http://www.koch.ro

Re: Non Hadoop scheduling frameworks

Posted by Todd Nine <to...@spidertracks.co.nz>.

Thanks for the feedback.  I'm probably going to modify quartz to work
with Zookeeper to start and launch jobs.  Architecturally, I don't think
persisting Jobs or trigger history in ZK is a very good idea, it's
turning it into a persistent data store, which is not designed for.  I
was thinking I could change the core APIs in the following way.

Implement leader/follower election as a standalone module.  Is this
already done somewhere?   I know there's a recipe but if the code is
done that's less for me to do.

Implement an abstract JobStore implementation (ZooKeeperJobStore) with
the following properties

Default Case

1. All calls that deal with returning triggers will use the
follower/leader semantics.  All nodes (including the leader) will be
followers.  They will only be returned jobs they should run for the call
aquireNextTrigger
2. All calls to writing triggers will write triggers to the datastore
and to a trigger queue in ZK
3. The leader will pick up triggers from the queue, and distribute them
to the next available node via the ZK trigger queues per node.  Each
operation will attempt to be wisely partitioned.  In the first
implementation, it will simply schedule the job on a node that has the
least executions near the time specified for the trigger.  In the next
release, I could use average job duration semantics to try to avoid
scheduling overlapping jobs, especially in long running jobs.

Failover

1. The leader will scan all current followers when a follower leaves, or
after a new leader is designated.
2. For any node with jobs that is not currently a follower, it's
triggers will be re-written to the trigger queue from above
3. The redistribution semantics will fire from above

Does this sound reasonable?  After performing more research I think job
semantics such as partitioning and parallel processing are outside the
scope of how the scheduler should work.  Those semantics are more
internal to the job itself, and I think they should remain outside of
the scope of this project.

todd 
SENIOR SOFTWARE ENGINEER

todd nine| spidertracks ltd |  

On Tue, 2010-08-24 at 04:20 +0000, Ted Dunning wrote:

> These are pretty easy to solve with ZK.  Ephemerality, exclusive create,
> atomic update and file versions allow you to implement most of the semantics
> you need.
> 
> I don't know of any recipes available for this, but they would be worthy
> additions to ZK.
> 
> On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine <to...@spidertracks.co.nz> wrote:
> 
> > Solving UC1 and UC2 via zookeeper or some other framework if one is
> > recommended.  We don't run Hadoop, just ZK and Cassandra as we don't have a
> > need for map/reduce.  I'm searching for any existing framework that can
> > perform standard time based scheduling in a distributed environment.  As I
> > said earlier, Quartz is the closest model to what we're looking for, but it
> > can't be used in a distributed parallel environment.  Any suggestions for a
> > system that could accomplish this would be helpful.
> >
> > Thanks,
> > Todd
> >
> > On 24 August 2010 11:27, Mahadev Konar <ma...@yahoo-inc.com> wrote:
> >
> > > Hi Todd,
> > >  Just to be clear, are you looking at solving UC1 and UC2 via zookeeper?
> > Or
> > > is this a broader question for scheduling on cassandra nodes? For the
> > latter
> > > this probably isnt the right mailing list.
> > >
> > > Thanks
> > > mahadev
> > >
> > >
> > > On 8/23/10 4:02 PM, "Todd Nine" <to...@spidertracks.co.nz> wrote:
> > >
> > > Hi all,
> > >  We're using Zookeeper for Leader Election and system monitoring.  We're
> > > also using it for synchronizing our cluster wide jobs with  barriers.
> > >  We're
> > > running into an issue where we now have a single job, but each node can
> > > fire
> > > the job independently of others with different criteria in the job.  In
> > the
> > > event of a system failure, another node in our application cluster will
> > > need
> > > to fire this Job.  I've used quartz previously (we're running Java 6),
> > but
> > > it simply isn't designed for the use case we have.  I found this article
> > on
> > > cloudera.
> > >
> > > http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
> > >
> > >
> > > I've looked at both plugins, but they require hadoop.  We're not
> > currently
> > > running hadoop, we only have Cassandra.  Here are the 2 basic use cases
> > we
> > > need to support.
> > >
> > > UC1: Synchronized Jobs
> > > 1. A job is fired across all nodes
> > > 2. The nodes wait until the barrier is entered by all participants
> > > 3. The nodes process the data and leave
> > > 4. On all nodes leaving the barrier, the Leader node marks the job as
> > > complete.
> > >
> > >
> > > UC2: Multiple Jobs per Node
> > > 1. A Job is scheduled for a future time on a specific node (usually the
> > > same
> > > node that's creating the trigger)
> > > 2. A Trigger can be overwritten and cancelled without the job firing
> > > 3. In the event of a node failure, the Leader will take all pending jobs
> > > from the failed node, and partition them across the remaining nodes.
> > >
> > >
> > > Any input would be greatly appreciated.
> > >
> > > Thanks,
> > > Todd
> > >
> > >
> >

Re: Non Hadoop scheduling frameworks

Posted by Ted Dunning <te...@gmail.com>.

These are pretty easy to solve with ZK.  Ephemerality, exclusive create,
atomic update and file versions allow you to implement most of the semantics
you need.

I don't know of any recipes available for this, but they would be worthy
additions to ZK.

On Mon, Aug 23, 2010 at 11:33 PM, Todd Nine <to...@spidertracks.co.nz> wrote:

> Solving UC1 and UC2 via zookeeper or some other framework if one is
> recommended.  We don't run Hadoop, just ZK and Cassandra as we don't have a
> need for map/reduce.  I'm searching for any existing framework that can
> perform standard time based scheduling in a distributed environment.  As I
> said earlier, Quartz is the closest model to what we're looking for, but it
> can't be used in a distributed parallel environment.  Any suggestions for a
> system that could accomplish this would be helpful.
>
> Thanks,
> Todd
>
> On 24 August 2010 11:27, Mahadev Konar <ma...@yahoo-inc.com> wrote:
>
> > Hi Todd,
> >  Just to be clear, are you looking at solving UC1 and UC2 via zookeeper?
> Or
> > is this a broader question for scheduling on cassandra nodes? For the
> latter
> > this probably isnt the right mailing list.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/23/10 4:02 PM, "Todd Nine" <to...@spidertracks.co.nz> wrote:
> >
> > Hi all,
> >  We're using Zookeeper for Leader Election and system monitoring.  We're
> > also using it for synchronizing our cluster wide jobs with  barriers.
> >  We're
> > running into an issue where we now have a single job, but each node can
> > fire
> > the job independently of others with different criteria in the job.  In
> the
> > event of a system failure, another node in our application cluster will
> > need
> > to fire this Job.  I've used quartz previously (we're running Java 6),
> but
> > it simply isn't designed for the use case we have.  I found this article
> on
> > cloudera.
> >
> > http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
> >
> >
> > I've looked at both plugins, but they require hadoop.  We're not
> currently
> > running hadoop, we only have Cassandra.  Here are the 2 basic use cases
> we
> > need to support.
> >
> > UC1: Synchronized Jobs
> > 1. A job is fired across all nodes
> > 2. The nodes wait until the barrier is entered by all participants
> > 3. The nodes process the data and leave
> > 4. On all nodes leaving the barrier, the Leader node marks the job as
> > complete.
> >
> >
> > UC2: Multiple Jobs per Node
> > 1. A Job is scheduled for a future time on a specific node (usually the
> > same
> > node that's creating the trigger)
> > 2. A Trigger can be overwritten and cancelled without the job firing
> > 3. In the event of a node failure, the Leader will take all pending jobs
> > from the failed node, and partition them across the remaining nodes.
> >
> >
> > Any input would be greatly appreciated.
> >
> > Thanks,
> > Todd
> >
> >
>

Re: Non Hadoop scheduling frameworks

Posted by Todd Nine <to...@spidertracks.co.nz>.

Solving UC1 and UC2 via zookeeper or some other framework if one is
recommended.  We don't run Hadoop, just ZK and Cassandra as we don't have a
need for map/reduce.  I'm searching for any existing framework that can
perform standard time based scheduling in a distributed environment.  As I
said earlier, Quartz is the closest model to what we're looking for, but it
can't be used in a distributed parallel environment.  Any suggestions for a
system that could accomplish this would be helpful.

Thanks,
Todd

On 24 August 2010 11:27, Mahadev Konar <ma...@yahoo-inc.com> wrote:

> Hi Todd,
>  Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or
> is this a broader question for scheduling on cassandra nodes? For the latter
> this probably isnt the right mailing list.
>
> Thanks
> mahadev
>
>
> On 8/23/10 4:02 PM, "Todd Nine" <to...@spidertracks.co.nz> wrote:
>
> Hi all,
>  We're using Zookeeper for Leader Election and system monitoring.  We're
> also using it for synchronizing our cluster wide jobs with  barriers.
>  We're
> running into an issue where we now have a single job, but each node can
> fire
> the job independently of others with different criteria in the job.  In the
> event of a system failure, another node in our application cluster will
> need
> to fire this Job.  I've used quartz previously (we're running Java 6), but
> it simply isn't designed for the use case we have.  I found this article on
> cloudera.
>
> http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
>
>
> I've looked at both plugins, but they require hadoop.  We're not currently
> running hadoop, we only have Cassandra.  Here are the 2 basic use cases we
> need to support.
>
> UC1: Synchronized Jobs
> 1. A job is fired across all nodes
> 2. The nodes wait until the barrier is entered by all participants
> 3. The nodes process the data and leave
> 4. On all nodes leaving the barrier, the Leader node marks the job as
> complete.
>
>
> UC2: Multiple Jobs per Node
> 1. A Job is scheduled for a future time on a specific node (usually the
> same
> node that's creating the trigger)
> 2. A Trigger can be overwritten and cancelled without the job firing
> 3. In the event of a node failure, the Leader will take all pending jobs
> from the failed node, and partition them across the remaining nodes.
>
>
> Any input would be greatly appreciated.
>
> Thanks,
> Todd
>
>

Re: Non Hadoop scheduling frameworks

Posted by Mahadev Konar <ma...@yahoo-inc.com>.

Hi Todd,
Just to be clear, are you looking at solving UC1 and UC2 via zookeeper? Or is this a broader question for scheduling on cassandra nodes? For the latter this probably isnt the right mailing list.

Thanks
mahadev

On 8/23/10 4:02 PM, "Todd Nine" <to...@spidertracks.co.nz> wrote:

Hi all,
We're using Zookeeper for Leader Election and system monitoring. We're
also using it for synchronizing our cluster wide jobs with barriers. We're
running into an issue where we now have a single job, but each node can fire
the job independently of others with different criteria in the job. In the
event of a system failure, another node in our application cluster will need
to fire this Job. I've used quartz previously (we're running Java 6), but
it simply isn't designed for the use case we have. I found this article on
cloudera.

http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/

I've looked at both plugins, but they require hadoop. We're not currently
running hadoop, we only have Cassandra. Here are the 2 basic use cases we
need to support.

UC1: Synchronized Jobs
1. A job is fired across all nodes
2. The nodes wait until the barrier is entered by all participants
3. The nodes process the data and leave
4. On all nodes leaving the barrier, the Leader node marks the job as
complete.

UC2: Multiple Jobs per Node
1. A Job is scheduled for a future time on a specific node (usually the same
node that's creating the trigger)
2. A Trigger can be overwritten and cancelled without the job firing
3. In the event of a node failure, the Leader will take all pending jobs
from the failed node, and partition them across the remaining nodes.

Any input would be greatly appreciated.

Thanks,
Todd