Posted to dev@spark.apache.org by Andrew Melo <an...@gmail.com> on 2018/08/07 22:11:38 UTC

SparkContext singleton get w/o create?

Hello,

One pain point with various Jupyter extensions [1][2] that provide
visual feedback about running Spark processes is the lack of a public
API to introspect the web UI URL. The notebook server needs that URL
to find information about the current SparkContext.

Simply looking for "localhost:4040" works most of the time, but fails
when multiple Spark notebooks run on the same host -- Spark increments
the port for each new context, so notebooks probing the web interface
can end up reading another notebook's UI.
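For illustration, the probing these extensions do today amounts to
roughly the following (my sketch, not either extension's actual code;
the scan width and timeout are arbitrary):

    import urllib.request

    def find_spark_uis(host="localhost", start=4040, tries=10):
        """Guess at Spark web UIs by scanning ports upward from 4040."""
        found = []
        for port in range(start, start + tries):
            url = "http://%s:%d/api/v1/applications" % (host, port)
            try:
                with urllib.request.urlopen(url, timeout=1) as resp:
                    if resp.status == 200:
                        found.append("http://%s:%d" % (host, port))
            except OSError:
                pass  # nothing (or not Spark) listening on this port
        return found  # with several notebooks running, which UI is ours?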

I'd like to implement an analog of SparkContext.getOrCreate(), perhaps
called "getIfExists()", that returns the current singleton if it
exists, or None otherwise. The Jupyter code could then use this entry
point to ask Spark for the active context and read the web UI URL from
it.
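A minimal sketch of what I have in mind on the PySpark side (the name
and the locking details are just my assumption of how it would look):

    # Proposed addition to pyspark.context.SparkContext:
    @classmethod
    def getIfExists(cls):
        """Return the active SparkContext singleton, or None if there
        is none. Unlike getOrCreate(), this never constructs a context
        (and so never starts a JVM) as a side effect."""
        with SparkContext._lock:
            return SparkContext._active_spark_context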

It's a minor change, but this would be my first contribution to Spark,
and I want to make sure my plan is kosher before I implement it.

Thanks!
Andrew

[1] https://krishnan-r.github.io/sparkmonitor/

[2] https://github.com/mozilla/jupyter-spark

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: SparkContext singleton get w/o create?

Posted by Andrew Melo <an...@gmail.com>.
Hi,

I'm a long-time listener, first-time contributor to Spark, so this is
a good way to get my feet wet. I'm particularly interested in
SPARK-23836, an itch I may want to scratch myself in the next month or
so, since it's pretty painful for our use case.

Thanks!
Andrew

On Mon, Aug 27, 2018 at 2:20 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
> Sure, I don't think you should wait on that being merged in. If you want to
> take the JIRA, go ahead (although if you're already familiar with the Spark
> code base it might make sense to leave it as a starter issue for someone who
> is just getting started).


Re: SparkContext singleton get w/o create?

Posted by Holden Karau <ho...@pigscanfly.ca>.
Sure, I don't think you should wait on that being merged in. If you want to
take the JIRA, go ahead (although if you're already familiar with the Spark
code base it might make sense to leave it as a starter issue for someone
who is just getting started).

On Mon, Aug 27, 2018 at 12:18 PM Andrew Melo <an...@gmail.com> wrote:

> Hi Holden,
>
> I'm agnostic to the approach (though it seems cleaner to have an
> explicit API for it). If you would like, I can take that JIRA and
> implement it (should be a 3-line function).
>
> Cheers
> Andrew


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: SparkContext singleton get w/o create?

Posted by Andrew Melo <an...@gmail.com>.
Hi Holden,

I'm agnostic to the approach (though it seems cleaner to have an
explicit API for it). If you would like, I can take that JIRA and
implement it (should be a 3-line function).

Cheers
Andrew

On Mon, Aug 27, 2018 at 2:14 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
> Seems reasonable. We should probably add `getActiveSession` to the PySpark
> API (filed a starter JIRA:
> https://issues.apache.org/jira/browse/SPARK-25255).


Re: SparkContext singleton get w/o create?

Posted by Holden Karau <ho...@pigscanfly.ca>.
Seems reasonable. We should probably add `getActiveSession` to the PySpark
API (filed a starter JIRA: https://issues.apache.org/jira/browse/SPARK-25255).
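Roughly the shape I'd expect it to take -- just a sketch to scope the
JIRA; the `_activeSession` bookkeeping below is an assumption on my
part, not existing PySpark code:

    class SparkSession(object):
        _activeSession = None  # assumed slot, set by the builder / stop()

        @classmethod
        def getActiveSession(cls):
            """Return the active SparkSession, or None.

            Unlike builder.getOrCreate(), this must never create a
            session (or start a JVM) as a side effect.
            """
            return cls._activeSession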

On Mon, Aug 27, 2018 at 12:09 PM Andrew Melo <an...@gmail.com> wrote:

> Hello Sean, others -
>
> Just to confirm, is it OK for client applications to access
> SparkContext._active_spark_context, if it wraps the accesses in `with
> SparkContext._lock:`?
>
> If that's acceptable to Spark, I'll implement the modifications in the
> Jupyter extensions.
>
> thanks!
> Andrew

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: SparkContext singleton get w/o create?

Posted by Andrew Melo <an...@gmail.com>.
Hello Sean, others -

Just to confirm, is it OK for client applications to access
SparkContext._active_spark_context, provided they wrap the accesses in
`with SparkContext._lock:`?
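Concretely, the Jupyter extensions would do something like the
following (a sketch on my part; I'm also assuming the public uiWebUrl
property is the right thing to read once we have the context):

    from pyspark import SparkContext

    # Take the same lock PySpark uses internally while reading the
    # singleton, and release it before doing anything slow.
    with SparkContext._lock:
        sc = SparkContext._active_spark_context

    if sc is not None:
        ui_url = sc.uiWebUrl  # e.g. "http://localhost:4040"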

If that's acceptable to Spark, I'll implement the modifications in the
Jupyter extensions.

thanks!
Andrew



Re: SparkContext singleton get w/o create?

Posted by Andrew Melo <an...@gmail.com>.
Hi Sean,

On Tue, Aug 7, 2018 at 5:44 PM, Sean Owen <sr...@gmail.com> wrote:
> Ah, Python. How about SparkContext._active_spark_context then?

Ah yes, that looks like the right member, but I'm a bit wary about
depending on attributes with leading underscores. I assumed those were
"private" and subject to change. Is that something I should be
unconcerned about?

The other thought is that accesses within SparkContext are protected
by "SparkContext._lock" -- should I also take that lock?

Thanks for your help!
Andrew



Re: SparkContext singleton get w/o create?

Posted by Sean Owen <sr...@gmail.com>.
Ah, Python. How about SparkContext._active_spark_context then?


Re: SparkContext singleton get w/o create?

Posted by Andrew Melo <an...@gmail.com>.
Hi Sean,

On Tue, Aug 7, 2018 at 5:16 PM, Sean Owen <sr...@gmail.com> wrote:
> Is SparkSession.getActiveSession what you're looking for?

Perhaps -- though there's no corresponding Python function, and I'm
not sure how to call the Scala getActiveSession without first
instantiating the Python version and causing a JVM to start.

Is there an easy way to call getActiveSession that doesn't start a JVM?
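For concreteness, the only route I can see goes through the py4j
gateway, which presupposes a running JVM -- a sketch of what I mean
(the _jvm plumbing here is my assumption of how one would wire it):

    from pyspark import SparkContext

    # SparkContext._jvm is only populated once a gateway/JVM exists, so
    # this can't *discover* whether a session exists without paying for
    # JVM startup first.
    jvm = SparkContext._jvm
    if jvm is not None:
        opt = jvm.org.apache.spark.sql.SparkSession.getActiveSession()
        active = opt.get() if opt.isDefined() else None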

Cheers
Andrew



Re: SparkContext singleton get w/o create?

Posted by Sean Owen <sr...@gmail.com>.
Is SparkSession.getActiveSession what you're looking for?
