Posted to user@spark.apache.org by agateaaa <ag...@gmail.com> on 2018/05/09 20:50:02 UTC

Problem with Spark Master shutting down when ZooKeeper leader is shut down

Dear Spark community,

I wanted to raise this issue, which was filed against Spark 1.6.1
(https://issues.apache.org/jira/browse/SPARK-15544) but still exists in
Spark 2.3.0 (https://issues.apache.org/jira/browse/SPARK-23530).

We have run into this in production: the Spark Master shuts down if the
ZooKeeper leader on another node is shut down during our upgrade procedure.
In our opinion this is a serious issue, and it defeats the purpose of
running Spark in high-availability mode.
Other software components, such as Kafka, are not affected by a ZooKeeper
leader shutdown.

The problem manifests in an unusual way: it affects not the node that is
being rebooted or upgraded but some other node in the cluster, so it can
go unnoticed unless we actively monitor the other nodes for it during the
upgrade.

(By "upgrade" we mean an upgrade of our application software stack, which
might include changes to base operating system packages, not a Spark
version upgrade.)

Can we increase the priority of these two JIRAs, or better still, could
someone please pick this issue up?

Thank you
Ashwin