Posted to dev@spark.apache.org by preeze <et...@gmail.com> on 2015/01/12 17:38:21 UTC

Apache Spark client high availability

Dear community,

I've been searching the internet for quite a while to find out what the best
architecture is to support HA for a Spark client.

We run an application that connects to a standalone Spark cluster and caches
a big chunk of data for subsequent intensive calculations. To achieve HA
we'll need to run several instances of the application on different hosts.
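
For context, this is roughly what each application instance does today (the
master URL, path and names below are placeholders rather than our real setup):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachingClient {
  def main(args: Array[String]): Unit = {
    // Each instance of the application builds its own driver and SparkContext,
    // and therefore gets its own dedicated set of executors.
    val conf = new SparkConf()
      .setMaster("spark://standalone-master:7077")
      .setAppName("ha-client-instance")
    val sc = new SparkContext(conf)

    // The big data set is loaded once and cached in this instance's executors only.
    val data = sc.textFile("hdfs:///data/big-input").persist(StorageLevel.MEMORY_ONLY)

    // Subsequent intensive calculations all reuse the cached RDD.
    println(data.count())

    sc.stop()
  }
}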

Initially I explored the option of reusing (i.e. sharing) the same set of
executors between the SparkContext instances of all running applications, but
found that this is not possible.

So every application that creates a SparkContext has to spawn its own
executors. Externalizing the executors' in-memory cache and sharing it via
Tachyon is only a partial solution, since each application's executors still
use their own set of CPU cores.
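
If it helps, the Tachyon variant would look roughly like the sketch below
(the tachyon:// address and paths are invented, and it assumes the Tachyon
Hadoop-filesystem client is on the classpath). The cached data outlives any
single application, but as noted above each reader still computes on its own
cores:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical Tachyon master address and shared path.
val sharedPath = "tachyon://tachyon-master:19998/shared/big-dataset"

// Application A (its own SparkContext and executors) materializes the data once.
val scA = new SparkContext(
  new SparkConf().setMaster("spark://standalone-master:7077").setAppName("writer"))
scA.textFile("hdfs:///data/big-input").saveAsTextFile(sharedPath)
scA.stop()

// Application B reads the shared copy back without recomputing it, but the
// reads and the follow-up calculations run on B's own executors and cores.
val scB = new SparkContext(
  new SparkConf().setMaster("spark://standalone-master:7077").setAppName("reader"))
val shared = scB.textFile(sharedPath).cache()
println(shared.count())
scB.stop()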

Spark-jobserver is another possibility. It manages the SparkContext itself
and accepts job requests from multiple clients against the same context,
which is brilliant. However, the jobserver then becomes a new single point of
failure.
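
For illustration, a job against spark-jobserver's classic API looks roughly
like the sketch below (the input.path config key is invented). Clients then
submit requests over REST, e.g. POST /jobs?classPath=...&context=..., so many
callers share one long-lived context:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

object SharedContextCount extends SparkJob {
  // Reject the request early if the (invented) input.path setting is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("missing input.path")

  // Runs inside the long-lived shared context, so RDDs cached by one request
  // can be reused by later requests from other clients.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.textFile(config.getString("input.path")).count()
}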

Now I am exploring whether it is possible to run the driver in YARN cluster
mode and connect to it from multiple clients.

Is there anything I am missing, guys?
Any suggestion is highly appreciated!



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Apache-Spark-client-high-availability-tp10088.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Apache Spark client high availability

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
We usually run Spark in HA with the following stack:

-> Apache Mesos
-> Marathon - an init/control system for starting, stopping, and maintaining
always-on applications (mainly Spark Streaming)
-> Chronos - a general-purpose scheduler for Mesos that supports job
dependency graphs
-> Spark Job Server - primarily for its ability to reuse shared contexts
across multiple jobs
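
As a rough sketch of the Mesos leg of that stack (the hostnames below are
placeholders): the driver registers against a ZooKeeper-backed Mesos master
URL, so it always finds the elected leader, while Marathon restarts the
driver process itself if its host dies:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical ZooKeeper ensemble fronting several Mesos masters; Spark
// resolves the current leading master through ZooKeeper.
val conf = new SparkConf()
  .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
  .setAppName("always-on-streaming-app")
val sc = new SparkContext(conf)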

This thread has a better discussion:
http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-td7935.html


Thanks
Best Regards
