Posted to user@spark.apache.org by KhajaAsmath Mohammed <md...@gmail.com> on 2016/07/16 16:13:41 UTC

High availability with Spark

Hi,

Could you please share your thoughts if anyone has ideas on the topics
below.

   - How to achieve high availability with a Spark cluster? I have referred
   to https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/exercises/spark-exercise-standalone-master-ha.html.
   Is there any other way to do it in cluster mode? (The ZooKeeper setup I
   mean is pasted after this list.)
   - How to achieve high availability of the Spark driver? I have gone through
   the documentation, which says it is achieved through a checkpointing
   directory (see the sketch after this list). Is there any other way?
   - What is the procedure to know the number of messages that have been
   consumed by the consumer? Is there any way to track the number of messages
   consumed in Spark Streaming? (See the counting sketch below.)
   - I also want to save data from Spark Streaming periodically and do
   aggregations on it, e.g. save data for every hour/day and aggregate
   on that. (See the windowed sketch below.)
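
For the first point, this is the ZooKeeper-based standby-master setup from
the linked exercise that I am referring to; the host names and the znode
path are just placeholders for my environment:

# spark-env.sh on every master node: enable ZooKeeper recovery so a
# standby master can take over when the active master fails.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"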
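
For the driver question, this is a minimal sketch of the checkpoint-based
recovery I have read about; the checkpoint directory and batch interval are
just placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedDriver {
  // Placeholder checkpoint directory; it should be a fault-tolerant
  // store such as HDFS or S3.
  val checkpointDir = "hdfs:///tmp/spark-checkpoint"

  // Builds a fresh context (and the DStream graph) only when no
  // checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedDriver")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... define input DStreams and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // After a driver restart the context is rebuilt from the checkpoint
    // instead of calling createContext() again.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}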
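
For the message-count question, the closest I have come up with is counting
each micro-batch inside foreachRDD (the DStream element type here is only an
example):

import org.apache.spark.streaming.dstream.DStream

// rdd.count() is an action and the foreachRDD body runs on the driver,
// so a running total can be kept there and logged per batch.
def trackConsumedMessages(messages: DStream[String]): Unit = {
  var totalConsumed = 0L
  messages.foreachRDD { rdd =>
    val batchCount = rdd.count()
    totalConsumed += batchCount
    println(s"Batch size: $batchCount, total consumed so far: $totalConsumed")
  }
}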
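
For the last point, I am thinking of something along these lines: a one-hour
window that is written out once per hour (the output path and the intervals
are only examples):

import org.apache.spark.streaming.Minutes
import org.apache.spark.streaming.dstream.DStream

// Sums values per key over a one-hour window and saves every window,
// so the saved files can be re-aggregated later into daily rollups.
def saveHourlyAggregates(events: DStream[(String, Long)]): Unit = {
  val hourlyCounts = events.reduceByKeyAndWindow(
    (a: Long, b: Long) => a + b, // combine values inside the window
    Minutes(60),                 // window length: one hour of data
    Minutes(60)                  // slide interval: emit once per hour
  )
  // Each window is written to its own time-stamped directory under this prefix.
  hourlyCounts.saveAsTextFiles("hdfs:///data/streaming/hourly-counts")
}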


Thanks,
Asmath.