Posted to user@cassandra.apache.org by Vikram Kone <vi...@gmail.com> on 2015/08/17 22:17:21 UTC

Need advice on a highly available job scheduler for Spark jobs on a Cassandra cluster

Hi.
I'm looking at existing open source workflow engines we can use for
scheduling Spark jobs with intricate dependencies on a DataStax Cassandra
cluster. Currently we are using crontab to schedule jobs and want to move
to something more robust and highly available.
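For a concrete picture, our cron entries are roughly of this shape (the
schedules, paths, and class names below are placeholders, not our real jobs):

    # crontab on a single node -- that node is the single point of failure
    0 2 * * * /opt/spark/bin/spark-submit --class com.example.jobs.IngestRaw /opt/jobs/jobs.jar
    0 4 * * * /opt/spark/bin/spark-submit --class com.example.jobs.BuildAggregates /opt/jobs/jobs.jar
    # BuildAggregates really depends on IngestRaw finishing, but cron can only stagger them by time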
There are 2 main problems with cron for us:
1. Single point of failure: our cron tasks that do spark-submit run on a
single machine, and if that machine goes down, all the jobs
are kaput till the node comes back up.
2. No easy way to specify dependencies between cron tasks to model a DAG.

Are there any workflow engines that work nicely with Spark and Cassandra in
a highly available fashion?
One of the engines I'm looking at is Azkaban, where job authoring and
dependency configuration are easy. But it again has a single point of failure
in the Azkaban master. I'm also open to running the workflow engine on a
separate cluster, but since Spark doesn't allow remote job submission, we are
stuck with running the workflow engine on the same Cassandra cluster.
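To show what I mean about dependency config being easy there, Azkaban flows
are just small .job property files (the job names and commands below are
illustrative, not our actual setup):

    # ingest_raw.job
    type=command
    command=/opt/spark/bin/spark-submit --class com.example.jobs.IngestRaw /opt/jobs/jobs.jar

    # build_aggregates.job
    type=command
    dependencies=ingest_raw
    command=/opt/spark/bin/spark-submit --class com.example.jobs.BuildAggregates /opt/jobs/jobs.jar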

Any advice on this is welcome.

thx