Posted to user@spark.apache.org by Jeff Evans <je...@gmail.com> on 2020/01/17 22:09:46 UTC

Is there a way to get the final web URL from an active Spark context

Given a session/context, we can get the UI web URL like this:

sparkSession.sparkContext.uiWebUrl

This gives me something like http://node-name.cluster-name:4040.  When
opened from outside the cluster (e.g., from my laptop), this redirects
via HTTP 302 to something like
http://node-name.cluster-name:8088/proxy/redirect/application_1579210019853_0023/.
For discussion purposes, call the latter the "final web URL".
Critically, this final URL remains active even after the application
terminates.  The original uiWebUrl
(http://node-name.cluster-name:4040) is not available after the
application terminates, so one has to capture the redirect in time
in order to provide a persistent link to that history server
UI entry (e.g., for debugging purposes).

Is there a way, other than using some HTTP client, to detect what this
final URL will be directly from the SparkContext?
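
For reference, the HTTP-client approach mentioned above could look roughly
like the sketch below. It is only an illustration: the object and method
names (RedirectProbe, redirectTarget) are made up for this example, it
inspects a single redirect hop, and it assumes the request is made while
the application is still running and from a host that actually receives
the 302.

```scala
import java.net.{HttpURLConnection, URL}

object RedirectProbe {
  // Returns the Location header if the server answers with a 3xx status.
  // The redirect itself is NOT followed; only the first hop is inspected.
  def redirectTarget(url: String): Option[String] = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setInstanceFollowRedirects(false)
    try {
      val code = conn.getResponseCode
      if (code >= 300 && code < 400) Option(conn.getHeaderField("Location"))
      else None
    } finally {
      conn.disconnect()
    }
  }
}

// Usage, assuming an active SparkSession named sparkSession
// (uiWebUrl is an Option[String] in the Scala API):
// val finalUrl: Option[String] =
//   sparkSession.sparkContext.uiWebUrl.flatMap(RedirectProbe.redirectTarget)
```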


Re: Is there a way to get the final web URL from an active Spark context

Posted by Jeff Evans <je...@gmail.com>.
To answer my own question, it turns out what I was after is the YARN
ResourceManager URL for the Spark application.  As alluded to in SPARK-20458
<https://issues.apache.org/jira/browse/SPARK-20458>, it's possible to use
the YARN API client to get this value.  Here is a gist that shows how it
can be done (given an instance of the Hadoop Configuration object):
https://gist.github.com/jeff303/8dab0e52dc227741b6605f576a317798
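
A minimal sketch of the same idea in Scala follows. It is not a copy of
the gist: the object and method names (YarnTrackingUrl, lookup) are
hypothetical, it assumes the application runs on YARN with the Hadoop YARN
client library on the classpath, and ApplicationId.fromString assumes
Hadoop 2.8 or later. Whether the returned tracking URL matches the proxy
URL seen in the browser exactly may depend on the cluster setup.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object YarnTrackingUrl {
  // Asks the ResourceManager for the application report and returns its
  // tracking URL, or None if the report carries no URL.
  def lookup(hadoopConf: Configuration, appId: String): Option[String] = {
    val client = YarnClient.createYarnClient()
    // Wrapping in YarnConfiguration makes sure yarn-site.xml is loaded.
    client.init(new YarnConfiguration(hadoopConf))
    client.start()
    try {
      val report = client.getApplicationReport(ApplicationId.fromString(appId))
      Option(report.getTrackingUrl)
    } finally {
      client.stop()
    }
  }
}

// Usage, assuming an active SparkSession named sparkSession:
// val sc = sparkSession.sparkContext
// val url = YarnTrackingUrl.lookup(sc.hadoopConfiguration, sc.applicationId)
```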

