Posted to user@spark.apache.org by Eric Friedman <er...@gmail.com> on 2014/12/25 19:01:34 UTC

action progress in ipython notebook?

Spark 1.2.0 is SO much more usable than previous releases -- many thanks to
the team for this release.

A question about progress of actions.  I can see how things are progressing
using the Spark UI.  I can also see the nice ASCII art animation on the
spark driver console.

Has anyone come up with a way to accomplish something similar in an iPython
notebook using pyspark?

Thanks
Eric

Re: action progress in ipython notebook?

Posted by Aniket Bhatnagar <an...@gmail.com>.
Thanks Josh. Looks promising. I will give it a try.

Thanks,
Aniket

On Mon, Dec 29, 2014, 9:55 PM Josh Rosen <ro...@gmail.com> wrote:

> It's accessed through the `statusTracker` field on SparkContext.
>
> *Scala*:
>
>
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkStatusTracker
>
> *Java*:
>
>
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkStatusTracker.html
>
> Don't create new instances of this yourself; instead, use sc.statusTracker
> to obtain the current instance.
>
> This API is missing a bunch of stuff that's available in the web UI, but
> it was designed so that we can add new methods without breaking binary
> compatibility. Although it would technically be a new feature, I'd hope
> that we can backport some additions to 1.2.1 since it's just adding a
> facade / stable interface in front of JobProgressListener and thus has
> little to no risk of introducing new bugs elsewhere in Spark.
>
>
>
> On Mon, Dec 29, 2014 at 3:08 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
>> Hi Josh
>>
>> Is there documentation available for status API? I would like to use it.
>>
>> Thanks,
>> Aniket
>>
>>
>> On Sun Dec 28 2014 at 02:37:32 Josh Rosen <ro...@gmail.com> wrote:
>>
>>> The console progress bars are implemented on top of a new stable "status
>>> API" that was added in Spark 1.2.  It's possible to query job progress
>>> using this interface (in older versions of Spark, you could implement a
>>> custom SparkListener and maintain the counts of completed / running /
>>> failed tasks / stages yourself).
>>>
>>> There are actually several subtleties involved in implementing
>>> "job-level" progress bars which behave in an intuitive way; there's a
>>> pretty extensive discussion of the challenges at
>>> https://github.com/apache/spark/pull/3009.  Also, check out the pull
>>> request for the console progress bars for an interesting design discussion
>>> around how they handle parallel stages:
>>> https://github.com/apache/spark/pull/3029.
>>>
>>> I'm not sure about the plumbing that would be necessary to display live
>>> progress updates in the IPython notebook UI, though.  The general pattern
>>> would probably involve a mapping to relate notebook cells to Spark jobs
>>> (you can do this with job groups, I think), plus some periodic timer that
>>> polls the driver for the status of the current job in order to update the
>>> progress bar.
>>>
>>> For Spark 1.3, I'm working on designing a REST interface to access
>>> this type of job / stage / task progress information, as well as expanding
>>> the types of information exposed through the stable status API interface.
>>>
>>> - Josh
>>>
>>> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <
>>> eric.d.friedman@gmail.com> wrote:
>>>
>>>> Spark 1.2.0 is SO much more usable than previous releases -- many
>>>> thanks to the team for this release.
>>>>
>>>> A question about progress of actions.  I can see how things are
>>>> progressing using the Spark UI.  I can also see the nice ASCII art
>>>> animation on the spark driver console.
>>>>
>>>> Has anyone come up with a way to accomplish something similar in an
>>>> iPython notebook using pyspark?
>>>>
>>>> Thanks
>>>> Eric
>>>>
>>>
>>>
>

Re: action progress in ipython notebook?

Posted by Josh Rosen <ro...@gmail.com>.
It's accessed through the `statusTracker` field on SparkContext.

*Scala*:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkStatusTracker

*Java*:

https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkStatusTracker.html

Don't create new instances of this yourself; instead, use sc.statusTracker
to obtain the current instance.

This API is missing a bunch of stuff that's available in the web UI, but it
was designed so that we can add new methods without breaking binary
compatibility. Although it would technically be a new feature, I'd hope
that we can backport some additions to 1.2.1 since it's just adding a
facade / stable interface in front of JobProgressListener and thus has
little to no risk of introducing new bugs elsewhere in Spark.
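
For anyone who wants to poke at this from PySpark: the Python binding
(sc.statusTracker()) only appeared in a later release, so on 1.2 you'd go
through the Scala/Java API linked above, but against that binding a quick
progress dump might look like the rough sketch below (the helper name and
job-group string are just illustrative):

    # Hypothetical helper: print task-level progress for one job group.
    # Assumes an existing SparkContext `sc` and the sc.statusTracker()
    # binding that post-1.2 PySpark exposes.
    def dump_progress(sc, job_group):
        tracker = sc.statusTracker()
        for job_id in tracker.getJobIdsForGroup(job_group):
            job = tracker.getJobInfo(job_id)
            if job is None:
                continue
            for stage_id in job.stageIds:
                stage = tracker.getStageInfo(stage_id)
                if stage is None:
                    continue
                print("job %d stage %d (%s): %d/%d tasks done, %d failed"
                      % (job_id, stage_id, stage.name,
                         stage.numCompletedTasks, stage.numTasks,
                         stage.numFailedTasks))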



On Mon, Dec 29, 2014 at 3:08 AM, Aniket Bhatnagar <
aniket.bhatnagar@gmail.com> wrote:

> Hi Josh
>
> Is there documentation available for status API? I would like to use it.
>
> Thanks,
> Aniket
>
>
> On Sun Dec 28 2014 at 02:37:32 Josh Rosen <ro...@gmail.com> wrote:
>
>> The console progress bars are implemented on top of a new stable "status
>> API" that was added in Spark 1.2.  It's possible to query job progress
>> using this interface (in older versions of Spark, you could implement a
>> custom SparkListener and maintain the counts of completed / running /
>> failed tasks / stages yourself).
>>
>> There are actually several subtleties involved in implementing
>> "job-level" progress bars which behave in an intuitive way; there's a
>> pretty extensive discussion of the challenges at
>> https://github.com/apache/spark/pull/3009.  Also, check out the pull
>> request for the console progress bars for an interesting design discussion
>> around how they handle parallel stages:
>> https://github.com/apache/spark/pull/3029.
>>
>> I'm not sure about the plumbing that would be necessary to display live
>> progress updates in the IPython notebook UI, though.  The general pattern
>> would probably involve a mapping to relate notebook cells to Spark jobs
>> (you can do this with job groups, I think), plus some periodic timer that
>> polls the driver for the status of the current job in order to update the
>> progress bar.
>>
>> For Spark 1.3, I'm working on designing a REST interface to access this
>> type of job / stage / task progress information, as well as expanding the
>> types of information exposed through the stable status API interface.
>>
>> - Josh
>>
>> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <
>> eric.d.friedman@gmail.com> wrote:
>>
>>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
>>> to the team for this release.
>>>
>>> A question about progress of actions.  I can see how things are
>>> progressing using the Spark UI.  I can also see the nice ASCII art
>>> animation on the spark driver console.
>>>
>>> Has anyone come up with a way to accomplish something similar in an
>>> iPython notebook using pyspark?
>>>
>>> Thanks
>>> Eric
>>>
>>
>>

Re: action progress in ipython notebook?

Posted by Aniket Bhatnagar <an...@gmail.com>.
Hi Josh

Is there documentation available for status API? I would like to use it.

Thanks,
Aniket

On Sun Dec 28 2014 at 02:37:32 Josh Rosen <ro...@gmail.com> wrote:

> The console progress bars are implemented on top of a new stable "status
> API" that was added in Spark 1.2.  It's possible to query job progress
> using this interface (in older versions of Spark, you could implement a
> custom SparkListener and maintain the counts of completed / running /
> failed tasks / stages yourself).
>
> There are actually several subtleties involved in implementing "job-level"
> progress bars which behave in an intuitive way; there's a pretty extensive
> discussion of the challenges at https://github.com/apache/spark/pull/3009.
> Also, check out the pull request for the console progress bars for an
> interesting design discussion around how they handle parallel stages:
> https://github.com/apache/spark/pull/3029.
>
> I'm not sure about the plumbing that would be necessary to display live
> progress updates in the IPython notebook UI, though.  The general pattern
> would probably involve a mapping to relate notebook cells to Spark jobs
> (you can do this with job groups, I think), plus some periodic timer that
> polls the driver for the status of the current job in order to update the
> progress bar.
>
> For Spark 1.3, I'm working on designing a REST interface to access this
> type of job / stage / task progress information, as well as expanding the
> types of information exposed through the stable status API interface.
>
> - Josh
>
> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <eric.d.friedman@gmail.com
> > wrote:
>
>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
>> to the team for this release.
>>
>> A question about progress of actions.  I can see how things are
>> progressing using the Spark UI.  I can also see the nice ASCII art
>> animation on the spark driver console.
>>
>> Has anyone come up with a way to accomplish something similar in an
>> iPython notebook using pyspark?
>>
>> Thanks
>> Eric
>>
>
>

Re: action progress in ipython notebook?

Posted by Eric Friedman <er...@gmail.com>.

> On Dec 29, 2014, at 1:52 PM, Matei Zaharia <ma...@gmail.com> wrote:
> 
> Hey Eric, sounds like you are running into several issues, but thanks for reporting them. Just to comment on a few of these:
> 
>> I'm not seeing RDDs or SRDDs cached in the Spark UI. That page remains empty despite my calling cache().
> 
> This is expected until you compute the RDDs the first time and they actually get cached, though I can see why it would be confusing. They don't get registered with the UI until this happens.


Yes, I'm aware of this. In the cases I observed, I saw nothing even after a collect() action. I'll attempt to reproduce, as PEBCAK is always a possibility, and will log a JIRA if I'm able.



> 
>> I think that attempts to access a directory of Parquet files still require reading the schema from the footer of every file. Painfully slow for terabytes of data.
> 
> This is also expected unfortunately due to the way Parquet works, but if you use Spark SQL to read these, the metadata gets cached after the first time you access a particular file.

It would be helpful if one could "hint" the schema for a homogeneous data set, by indicating that the footer of any single file represents truth for all files in the directory. This is certainly the norm and is actually pretty similar to how inferSchema works when it take()s a single row from an RDD. If I tried to slip a heterogeneous set in there, well, shame on me.



> 
>> Exceptions are still often reflective of a symptom rather than a root cause. For example, I had a join that was blowing up, but it was variously reported as insufficient Kryo buffers and even an AST error in the SQL parser.
>> 
>> Saving an SRDD to a table in Hive doesn't work. I had to sneak it in by saving to a file and then creating an external table.
> 
> Yeah, unfortunately this is a bug in 1.2.0 (https://issues.apache.org/jira/browse/SPARK-4825). It should be fixed for 1.2.1.
> 
>> In interactive work, it would be nice if I could interrupt the current job without killing the whole session.
> 
> You can actually do this with the "kill" button for that stage on the application web UI. It's a bit non-obvious but it does work.
> 
> Anyway, thanks for reporting this stuff. Don't be afraid to open JIRAs for such issues and for usability suggestions, especially if you have a way to reproduce them. Some of the usability things are "obvious" to people who know the UI inside-out but not to anyone else.

Thank you very much.  

Eric




> 
> Matei
> 
> 
>> The lower latency potential of Sparrow is also very intriguing. 
>> 
>> Getting GraphX for PySpark would be very welcome. 
>> 
>> It's easy to find fault, of course. I do want to say again how grateful I am to have a usable release in 1.2 and look forward to 1.3 and beyond with real excitement. 
>> 
>> ----
>> Eric Friedman
>> 
>>> On Dec 28, 2014, at 5:40 PM, Patrick Wendell <pw...@gmail.com> wrote:
>>> 
>>> Hey Eric,
>>> 
>>> I'm just curious - which specific features in 1.2 do you find most
>>> help with usability? This is a theme we're focusing on for 1.3 as
>>> well, so it's helpful to hear what makes a difference.
>>> 
>>> - Patrick
>>> 
>>> On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
>>> <er...@gmail.com> wrote:
>>>> Hi Josh,
>>>> 
>>>> Thanks for the informative answer. Sounds like one should await your changes
>>>> in 1.3. As information, I found the following set of options for doing the
>>>> visual in a notebook.
>>>> 
>>>> http://nbviewer.ipython.org/github/ipython/ipython/blob/3607712653c66d63e0d7f13f073bde8c0f209ba8/docs/examples/notebooks/Animations_and_Progress.ipynb
>>>> 
>>>> 
>>>> On Dec 27, 2014, at 4:07 PM, Josh Rosen <ro...@gmail.com> wrote:
>>>> 
>>>> The console progress bars are implemented on top of a new stable "status
>>>> API" that was added in Spark 1.2.  It's possible to query job progress using
>>>> this interface (in older versions of Spark, you could implement a custom
>>>> SparkListener and maintain the counts of completed / running / failed tasks
>>>> / stages yourself).
>>>> 
>>>> There are actually several subtleties involved in implementing "job-level"
>>>> progress bars which behave in an intuitive way; there's a pretty extensive
>>>> discussion of the challenges at https://github.com/apache/spark/pull/3009.
>>>> Also, check out the pull request for the console progress bars for an
>>>> interesting design discussion around how they handle parallel stages:
>>>> https://github.com/apache/spark/pull/3029.
>>>> 
>>>> I'm not sure about the plumbing that would be necessary to display live
>>>> progress updates in the IPython notebook UI, though.  The general pattern
>>>> would probably involve a mapping to relate notebook cells to Spark jobs (you
>>>> can do this with job groups, I think), plus some periodic timer that polls
>>>> the driver for the status of the current job in order to update the progress
>>>> bar.
>>>> 
>>>> For Spark 1.3, I'm working on designing a REST interface to access this
>>>> type of job / stage / task progress information, as well as expanding the
>>>> types of information exposed through the stable status API interface.
>>>> 
>>>> - Josh
>>>> 
>>>> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <er...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
>>>>> to the team for this release.
>>>>> 
>>>>> A question about progress of actions.  I can see how things are
>>>>> progressing using the Spark UI.  I can also see the nice ASCII art animation
>>>>> on the spark driver console.
>>>>> 
>>>>> Has anyone come up with a way to accomplish something similar in an
>>>>> iPython notebook using pyspark?
>>>>> 
>>>>> Thanks
>>>>> Eric
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: action progress in ipython notebook?

Posted by Matei Zaharia <ma...@gmail.com>.
Hey Eric, sounds like you are running into several issues, but thanks for reporting them. Just to comment on a few of these:

> I'm not seeing RDDs or SRDDs cached in the Spark UI. That page remains empty despite my calling cache(). 

This is expected until you compute the RDDs the first time and they actually get cached, though I can see why it would be confusing. They don't get registered with the UI until this happens.

> I think that attempts to access a directory of Parquet files still require reading the schema from the footer of every file. Painfully slow for terabytes of data.

This is also expected unfortunately due to the way Parquet works, but if you use Spark SQL to read these, the metadata gets cached after the first time you access a particular file.

> Exceptions are still often reflective of a symptom rather than a root cause. For example, I had a join that was blowing up, but it was variously reported as insufficient Kryo buffers and even an AST error in the SQL parser.
> 
> Saving an SRDD to a table in Hive doesn't work. I had to sneak it in by saving to a file and then creating an external table.

Yeah, unfortunately this is a bug in 1.2.0 (https://issues.apache.org/jira/browse/SPARK-4825). It should be fixed for 1.2.1.

> In interactive work, it would be nice if I could interrupt the current job without killing the whole session.

You can actually do this with the "kill" button for that stage on the application web UI. It's a bit non-obvious but it does work.
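
There's also a programmatic route via job groups, in the same spirit as what
Josh described earlier in the thread. A minimal sketch, assuming a pyspark
shell where sc already exists; the group name is arbitrary:

    # Tag subsequent jobs with a group, then cancel just that group later,
    # leaving the SparkContext and the rest of the session alive.
    sc.setJobGroup("adhoc-join", "long-running join", interruptOnCancel=True)
    # ... kick off the action from another thread or cell ...
    sc.cancelJobGroup("adhoc-join")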

Anyway, thanks for reporting this stuff. Don't be afraid to open JIRAs for such issues and for usability suggestions, especially if you have a way to reproduce them. Some of the usability things are "obvious" to people who know the UI inside-out but not to anyone else.

Matei


> The lower latency potential of Sparrow is also very intriguing. 
> 
> Getting GraphX for PySpark would be very welcome. 
> 
> It's easy to find fault, of course. I do want to say again how grateful I am to have a usable release in 1.2 and look forward to 1.3 and beyond with real excitement. 
> 
> ----
> Eric Friedman
> 
>> On Dec 28, 2014, at 5:40 PM, Patrick Wendell <pw...@gmail.com> wrote:
>> 
>> Hey Eric,
>> 
>> I'm just curious - which specific features in 1.2 do you find most
>> help with usability? This is a theme we're focusing on for 1.3 as
>> well, so it's helpful to hear what makes a difference.
>> 
>> - Patrick
>> 
>> On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
>> <er...@gmail.com> wrote:
>>> Hi Josh,
>>> 
>>> Thanks for the informative answer. Sounds like one should await your changes
>>> in 1.3. As information, I found the following set of options for doing the
>>> visual in a notebook.
>>> 
>>> http://nbviewer.ipython.org/github/ipython/ipython/blob/3607712653c66d63e0d7f13f073bde8c0f209ba8/docs/examples/notebooks/Animations_and_Progress.ipynb
>>> 
>>> 
>>> On Dec 27, 2014, at 4:07 PM, Josh Rosen <ro...@gmail.com> wrote:
>>> 
>>> The console progress bars are implemented on top of a new stable "status
>>> API" that was added in Spark 1.2.  It's possible to query job progress using
>>> this interface (in older versions of Spark, you could implement a custom
>>> SparkListener and maintain the counts of completed / running / failed tasks
>>> / stages yourself).
>>> 
>>> There are actually several subtleties involved in implementing "job-level"
>>> progress bars which behave in an intuitive way; there's a pretty extensive
>>> discussion of the challenges at https://github.com/apache/spark/pull/3009.
>>> Also, check out the pull request for the console progress bars for an
>>> interesting design discussion around how they handle parallel stages:
>>> https://github.com/apache/spark/pull/3029.
>>> 
>>> I'm not sure about the plumbing that would be necessary to display live
>>> progress updates in the IPython notebook UI, though.  The general pattern
>>> would probably involve a mapping to relate notebook cells to Spark jobs (you
>>> can do this with job groups, I think), plus some periodic timer that polls
>>> the driver for the status of the current job in order to update the progress
>>> bar.
>>> 
>>> For Spark 1.3, I'm working on designing a REST interface to access this
>>> type of job / stage / task progress information, as well as expanding the
>>> types of information exposed through the stable status API interface.
>>> 
>>> - Josh
>>> 
>>> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <er...@gmail.com>
>>> wrote:
>>>> 
>>>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
>>>> to the team for this release.
>>>> 
>>>> A question about progress of actions.  I can see how things are
>>>> progressing using the Spark UI.  I can also see the nice ASCII art animation
>>>> on the spark driver console.
>>>> 
>>>> Has anyone come up with a way to accomplish something similar in an
>>>> iPython notebook using pyspark?
>>>> 
>>>> Thanks
>>>> Eric
>>> 
>>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: action progress in ipython notebook?

Posted by Nicholas Chammas <ni...@gmail.com>.
On Mon, Dec 29, 2014 at 12:00 AM, Eric Friedman <er...@gmail.com>
wrote:

> I'm not seeing RDDs or SRDDs cached in the Spark UI. That page remains
> empty despite my calling cache().
>
Just a small note on this, and perhaps you already know: Calling cache() is
not enough to cache something and thus make it show up in the UI. Caching
in Spark is lazy.

You need to call cache() and then run some action (like count()) to
force the RDD into memory. Then you’ll be able to see it in the UI under
storage. You should even be able to see the “cached fraction” increase with
page refreshes as Spark gradually loads the RDD bit by bit.
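
A tiny illustration of that order of operations (the path below is just a
placeholder):

    lines = sc.textFile("hdfs:///some/path")  # placeholder input
    lines.cache()   # only marks the RDD as cacheable; nothing is stored yet
    lines.count()   # the action materializes it, so it shows up under Storage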

Nick

Re: action progress in ipython notebook?

Posted by Eric Friedman <er...@gmail.com>.
Hi Patrick,

I don't mean to be glib, but the fact that it works at all on my cluster (600 nodes) and my data is a novel experience. This is the first release that I haven't had to struggle with and then give up on entirely. I can, for example, finally use HiveContext from PySpark on CDH, at least to read data.

That said, there are plenty of opportunities in 1.3, not that you asked. *smile*:

I'm not seeing RDDs or SRDDs cached in the Spark UI. That page remains empty despite my calling cache(). 

I think that attempts to access a directory of Parquet files still require reading the schema from the footer of every file. Painfully slow for terabytes of data.

Exceptions are still often reflective of a symptom rather than a root cause. For example, I had a join that was blowing up, but it was variously reported as insufficient Kryo buffers and even an AST error in the SQL parser.

Saving an SRDD to a table in Hive doesn't work. I had to sneak it in by saving to a file and then creating an external table.

In interactive work, it would be nice if I could interrupt the current job without killing the whole session. The lower latency potential of Sparrow is also very intriguing. 

Getting GraphX for PySpark would be very welcome. 

It's easy to find fault, of course. I do want to say again how grateful I am to have a usable release in 1.2 and look forward to 1.3 and beyond with real excitement. 

----
Eric Friedman

> On Dec 28, 2014, at 5:40 PM, Patrick Wendell <pw...@gmail.com> wrote:
> 
> Hey Eric,
> 
> I'm just curious - which specific features in 1.2 do you find most
> help with usability? This is a theme we're focusing on for 1.3 as
> well, so it's helpful to hear what makes a difference.
> 
> - Patrick
> 
> On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
> <er...@gmail.com> wrote:
>> Hi Josh,
>> 
>> Thanks for the informative answer. Sounds like one should await your changes
>> in 1.3. As information, I found the following set of options for doing the
>> visual in a notebook.
>> 
>> http://nbviewer.ipython.org/github/ipython/ipython/blob/3607712653c66d63e0d7f13f073bde8c0f209ba8/docs/examples/notebooks/Animations_and_Progress.ipynb
>> 
>> 
>> On Dec 27, 2014, at 4:07 PM, Josh Rosen <ro...@gmail.com> wrote:
>> 
>> The console progress bars are implemented on top of a new stable "status
>> API" that was added in Spark 1.2.  It's possible to query job progress using
>> this interface (in older versions of Spark, you could implement a custom
>> SparkListener and maintain the counts of completed / running / failed tasks
>> / stages yourself).
>> 
>> There are actually several subtleties involved in implementing "job-level"
>> progress bars which behave in an intuitive way; there's a pretty extensive
>> discussion of the challenges at https://github.com/apache/spark/pull/3009.
>> Also, check out the pull request for the console progress bars for an
>> interesting design discussion around how they handle parallel stages:
>> https://github.com/apache/spark/pull/3029.
>> 
>> I'm not sure about the plumbing that would be necessary to display live
>> progress updates in the IPython notebook UI, though.  The general pattern
>> would probably involve a mapping to relate notebook cells to Spark jobs (you
>> can do this with job groups, I think), plus some periodic timer that polls
>> the driver for the status of the current job in order to update the progress
>> bar.
>> 
>> For Spark 1.3, I'm working on designing a REST interface to access this
>> type of job / stage / task progress information, as well as expanding the
>> types of information exposed through the stable status API interface.
>> 
>> - Josh
>> 
>> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <er...@gmail.com>
>> wrote:
>>> 
>>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
>>> to the team for this release.
>>> 
>>> A question about progress of actions.  I can see how things are
>>> progressing using the Spark UI.  I can also see the nice ASCII art animation
>>> on the spark driver console.
>>> 
>>> Has anyone come up with a way to accomplish something similar in an
>>> iPython notebook using pyspark?
>>> 
>>> Thanks
>>> Eric
>> 
>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: action progress in ipython notebook?

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Eric,

I'm just curious - which specific features in 1.2 do you find most
help with usability? This is a theme we're focusing on for 1.3 as
well, so it's helpful to hear what makes a difference.

- Patrick

On Sun, Dec 28, 2014 at 1:36 AM, Eric Friedman
<er...@gmail.com> wrote:
> Hi Josh,
>
> Thanks for the informative answer. Sounds like one should await your changes
> in 1.3. As information, I found the following set of options for doing the
> visual in a notebook.
>
> http://nbviewer.ipython.org/github/ipython/ipython/blob/3607712653c66d63e0d7f13f073bde8c0f209ba8/docs/examples/notebooks/Animations_and_Progress.ipynb
>
>
> On Dec 27, 2014, at 4:07 PM, Josh Rosen <ro...@gmail.com> wrote:
>
> The console progress bars are implemented on top of a new stable "status
> API" that was added in Spark 1.2.  It's possible to query job progress using
> this interface (in older versions of Spark, you could implement a custom
> SparkListener and maintain the counts of completed / running / failed tasks
> / stages yourself).
>
> There are actually several subtleties involved in implementing "job-level"
> progress bars which behave in an intuitive way; there's a pretty extensive
> discussion of the challenges at https://github.com/apache/spark/pull/3009.
> Also, check out the pull request for the console progress bars for an
> interesting design discussion around how they handle parallel stages:
> https://github.com/apache/spark/pull/3029.
>
> I'm not sure about the plumbing that would be necessary to display live
> progress updates in the IPython notebook UI, though.  The general pattern
> would probably involve a mapping to relate notebook cells to Spark jobs (you
> can do this with job groups, I think), plus some periodic timer that polls
> the driver for the status of the current job in order to update the progress
> bar.
>
> For Spark 1.3, I'm working on designing a REST interface to access this
> type of job / stage / task progress information, as well as expanding the
> types of information exposed through the stable status API interface.
>
> - Josh
>
> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <er...@gmail.com>
> wrote:
>>
>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
>> to the team for this release.
>>
>> A question about progress of actions.  I can see how things are
>> progressing using the Spark UI.  I can also see the nice ASCII art animation
>> on the spark driver console.
>>
>> Has anyone come up with a way to accomplish something similar in an
>> iPython notebook using pyspark?
>>
>> Thanks
>> Eric
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: action progress in ipython notebook?

Posted by Eric Friedman <er...@gmail.com>.
Hi Josh,

Thanks for the informative answer. Sounds like one should await your changes in 1.3. As information, I found the following set of options for doing the visual in a notebook.

http://nbviewer.ipython.org/github/ipython/ipython/blob/3607712653c66d63e0d7f13f073bde8c0f209ba8/docs/examples/notebooks/Animations_and_Progress.ipynb


> On Dec 27, 2014, at 4:07 PM, Josh Rosen <ro...@gmail.com> wrote:
> 
> The console progress bars are implemented on top of a new stable "status API" that was added in Spark 1.2.  It's possible to query job progress using this interface (in older versions of Spark, you could implement a custom SparkListener and maintain the counts of completed / running / failed tasks / stages yourself). 
> 
> There are actually several subtleties involved in implementing "job-level" progress bars which behave in an intuitive way; there's a pretty extensive discussion of the challenges at https://github.com/apache/spark/pull/3009.  Also, check out the pull request for the console progress bars for an interesting design discussion around how they handle parallel stages: https://github.com/apache/spark/pull/3029.
> 
> I'm not sure about the plumbing that would be necessary to display live progress updates in the IPython notebook UI, though.  The general pattern would probably involve a mapping to relate notebook cells to Spark jobs (you can do this with job groups, I think), plus some periodic timer that polls the driver for the status of the current job in order to update the progress bar.
> 
> For Spark 1.3, I'm working on designing a REST interface to access this type of job / stage / task progress information, as well as expanding the types of information exposed through the stable status API interface.
> 
> - Josh
> 
>> On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <er...@gmail.com> wrote:
>> Spark 1.2.0 is SO much more usable than previous releases -- many thanks to the team for this release.
>> 
>> A question about progress of actions.  I can see how things are progressing using the Spark UI.  I can also see the nice ASCII art animation on the spark driver console.
>> 
>> Has anyone come up with a way to accomplish something similar in an iPython notebook using pyspark?
>> 
>> Thanks
>> Eric
> 

Re: action progress in ipython notebook?

Posted by Josh Rosen <ro...@gmail.com>.
The console progress bars are implemented on top of a new stable "status
API" that was added in Spark 1.2.  It's possible to query job progress
using this interface (in older versions of Spark, you could implement a
custom SparkListener and maintain the counts of completed / running /
failed tasks / stages yourself).

There are actually several subtleties involved in implementing "job-level"
progress bars which behave in an intuitive way; there's a pretty extensive
discussion of the challenges at https://github.com/apache/spark/pull/3009.
Also, check out the pull request for the console progress bars for an
interesting design discussion around how they handle parallel stages:
https://github.com/apache/spark/pull/3029.

I'm not sure about the plumbing that would be necessary to display live
progress updates in the IPython notebook UI, though.  The general pattern
would probably involve a mapping to relate notebook cells to Spark jobs
(you can do this with job groups, I think), plus some periodic timer that
polls the driver for the status of the current job in order to update the
progress bar.
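
To make that pattern concrete, here's a rough, untested sketch: set a job
group from the thread that runs the action, then poll from the notebook and
redraw a one-line progress readout. It assumes the PySpark
sc.statusTracker() binding (not in 1.2, where you'd go through the
Scala/Java API instead), and the helper and group names are illustrative:

    import threading, time
    from IPython.display import clear_output

    def run_with_progress(sc, action, group="cell-1", interval=1.0):
        result = {}
        def work():
            # Job groups are tracked per thread, so set the group from the
            # thread that actually runs the action.
            sc.setJobGroup(group, "notebook cell", interruptOnCancel=True)
            result["value"] = action()
        worker = threading.Thread(target=work)
        worker.start()
        tracker = sc.statusTracker()
        while worker.is_alive():
            done = total = 0
            for job_id in tracker.getJobIdsForGroup(group):
                job = tracker.getJobInfo(job_id)
                for stage_id in (job.stageIds if job else []):
                    stage = tracker.getStageInfo(stage_id)
                    if stage:
                        done += stage.numCompletedTasks
                        total += stage.numTasks
            clear_output(wait=True)
            print("tasks completed: %d / %d" % (done, total))
            time.sleep(interval)
        worker.join()
        return result.get("value")

    # e.g. run_with_progress(sc, lambda: some_rdd.count())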

For Spark 1.3, I'm working on designing a REST interface to access this
type of job / stage / task progress information, as well as expanding the
types of information exposed through the stable status API interface.

- Josh

On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <er...@gmail.com>
wrote:

> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
> to the team for this release.
>
> A question about progress of actions.  I can see how things are
> progressing using the Spark UI.  I can also see the nice ASCII art
> animation on the spark driver console.
>
> Has anyone come up with a way to accomplish something similar in an
> iPython notebook using pyspark?
>
> Thanks
> Eric
>