Posted to dev@spark.apache.org by preeze <et...@gmail.com> on 2015/01/15 12:52:04 UTC

Spark client reconnect to driver in yarn-cluster deployment mode

From the official Spark documentation
(http://spark.apache.org/docs/1.2.0/running-on-yarn.html):

"In yarn-cluster mode, the Spark driver runs inside an application master
process which is managed by YARN on the cluster, and the client can go away
after initiating the application."

Is there a supported way for the client to connect back to the driver (still
running in YARN) to collect results at a later stage?



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-client-reconnect-to-driver-in-yarn-cluster-deployment-mode-tp10122.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.



Re: Spark client reconnect to driver in yarn-cluster deployment mode

Posted by Andrew Or <an...@databricks.com>.
Hi Preeze,

> Is there a supported way for the client to connect back to the driver
> (still running in YARN) to collect results at a later stage?

No, there is no support built into Spark for this. For it to happen
seamlessly, the driver would have to start a server (pull model) or send the
results to some other server once the jobs complete (push model), both of
which add complexity to the driver. Alternatively, you can simply poll the
output files that your application produces, e.g. have your driver write the
result of a count to a file and poll that file.
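
As a rough sketch of that polling idea (the paths and class names below are
made up for illustration), the driver saves the count to an agreed HDFS
location and the client simply waits for the _SUCCESS marker that
saveAsTextFile leaves behind by default:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.spark.{SparkConf, SparkContext}

    // Driver side: runs inside YARN in yarn-cluster mode.
    object CountJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("count-job"))
        val count = sc.textFile("hdfs:///data/input").count()
        // A one-element RDD is an easy way to persist the result on HDFS.
        sc.parallelize(Seq(count.toString), 1)
          .saveAsTextFile("hdfs:///results/count-job")
        sc.stop()
      }
    }

    // Client side: poll until the output directory is complete, then read it.
    object ResultPoller {
      def main(args: Array[String]): Unit = {
        val marker = new Path("hdfs:///results/count-job/_SUCCESS")
        val fs = marker.getFileSystem(new Configuration())
        while (!fs.exists(marker)) Thread.sleep(5000)
        println("result is ready under hdfs:///results/count-job")
      }
    }

Any storage both sides can reach works the same way; HDFS is just convenient
here because the driver already has the cluster's Hadoop configuration.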

-Andrew

2015-01-19 5:59 GMT-08:00 Romi Kuntsman <ro...@totango.com>:

> "in yarn-client mode it only controls the environment of the executor
> launcher"
>
> So either you use yarn-client mode, and then your app keeps running and
> controls the process, or you use yarn-cluster mode, and then you send a jar
> to YARN, and that jar should have code to report the result back to you.
>
> *Romi Kuntsman*, *Big Data Engineer*
>  http://www.totango.com
>
> On Thu, Jan 15, 2015 at 1:52 PM, preeze <et...@gmail.com> wrote:
>
> > From the official spark documentation
> > (http://spark.apache.org/docs/1.2.0/running-on-yarn.html):
> >
> > "In yarn-cluster mode, the Spark driver runs inside an application master
> > process which is managed by YARN on the cluster, and the client can go
> away
> > after initiating the application."
> >
> > Is there a supported way for the client to connect back to the driver
> > (still running in YARN) to collect results at a later stage?

Re: Spark client reconnect to driver in yarn-cluster deployment mode

Posted by Romi Kuntsman <ro...@totango.com>.
"in yarn-client mode it only controls the environment of the executor
launcher"

So either you use yarn-client mode, and then your app keeps running and
controls the process, or you use yarn-cluster mode, and then you send a jar
to YARN, and that jar should have code to report the result back to you.
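
Concretely (just a sketch, with a made-up callback URL), the main class in
the jar you submit can push the result back itself once the job is done:

    import java.net.{HttpURLConnection, URL}
    import org.apache.spark.{SparkConf, SparkContext}

    object ReportingJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("reporting-job"))
        val count = sc.textFile("hdfs:///data/input").count()
        sc.stop()

        // Push model: POST the result to a service the client controls.
        // The URL is hypothetical; error handling and retries are omitted.
        val conn = new URL("http://client-host:8080/results")
          .openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setDoOutput(true)
        val out = conn.getOutputStream
        out.write(count.toString.getBytes("UTF-8"))
        out.close()
        println("callback responded with HTTP " + conn.getResponseCode)
      }
    }

You submit it the usual way, e.g. something like
spark-submit --master yarn-cluster --class ReportingJob your-app.jar,
and the client only needs to run whatever is listening on that URL.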

*Romi Kuntsman*, *Big Data Engineer*
 http://www.totango.com

On Thu, Jan 15, 2015 at 1:52 PM, preeze <et...@gmail.com> wrote:

> From the official spark documentation
> (http://spark.apache.org/docs/1.2.0/running-on-yarn.html):
>
> "In yarn-cluster mode, the Spark driver runs inside an application master
> process which is managed by YARN on the cluster, and the client can go away
> after initiating the application."
>
> Is there a supported way for the client to connect back to the driver
> (still running in YARN) to collect results at a later stage?