Posted to user@spark.apache.org by Roman Pastukhov <me...@gmail.com> on 2014/03/17 15:35:34 UTC

Log analyzer and other Spark tools

Hi.

We're thinking about writing a tool that would read Spark logs and output the
cache contents at a given point in time (e.g. to see what data fills the cache
and whether some of it could be unpersisted to improve performance).
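
To make the idea concrete, here is a rough sketch of the kind of per-RDD
summary we would like to reconstruct from the logs. At runtime SparkContext
already exposes something similar (this assumes the getRDDStorageInfo
developer API and these RDDInfo field names, which may differ between Spark
versions):

  // Rough sketch only: the log-based tool would rebuild this view offline.
  // Assumes SparkContext.getRDDStorageInfo (a developer API) and these
  // RDDInfo fields; names may vary between Spark versions.
  def printCacheContents(sc: org.apache.spark.SparkContext): Unit = {
    for (info <- sc.getRDDStorageInfo) {
      println(s"RDD ${info.id} '${info.name}': " +
        s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
        s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
    }
  }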

Are there similar projects that already exist? Is there a list of
Spark-related tools? There is the Spark Debugger/SRD (
https://github.com/mesos/spark/wiki/Spark-Debugger,
http://spark-replay-debugger-overview.readthedocs.org/en/latest/), but I
couldn't find any links to it on the Spark project site.

Re: Log analyzer and other Spark tools

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Roman,

Yeah, definitely check out pull request 42 - one cool thing is that this
patch now includes information about in-memory storage in the listener
interface, so you can see directly which blocks are cached, on disk, etc.
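
(For anyone curious what consuming that looks like, here's a hypothetical
sketch; it assumes the post-PR-42 StageInfo layout with rddInfos and these
field names, which may differ between versions:)

  import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

  // Hypothetical sketch: report which RDDs behind a completed stage are
  // cached and where. Assumes StageInfo exposes rddInfos with memSize and
  // diskSize; exact field names may differ between Spark versions.
  class CacheReportListener extends SparkListener {
    override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
      val stageId = stageCompleted.stageInfo.stageId
      for (rdd <- stageCompleted.stageInfo.rddInfos if rdd.numCachedPartitions > 0) {
        println(s"Stage $stageId: RDD '${rdd.name}' cached " +
          s"(${rdd.memSize} bytes in memory, ${rdd.diskSize} bytes on disk)")
      }
    }
  }

You'd register it with sc.addSparkListener(new CacheReportListener) before
running your jobs.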

- Patrick

Re: Log analyzer and other Spark tools

Posted by Matei Zaharia <ma...@gmail.com>.
Take a look at the SparkListener API included in Spark; you can use it to capture various events. There's also this pull request: https://github.com/apache/spark/pull/42, which will persist application logs and let you rebuild the web UI after the app runs. It uses the same API to log events.
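
(A minimal sketch of wiring both pieces together, assuming the usual
SparkContext.addSparkListener hook and the event-log properties that pull
request adds; the property names here are my guess and may differ:)

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

  // Minimal sketch: capture job events with a custom listener and persist
  // all events to disk so the web UI can be rebuilt later. The event-log
  // property names are assumptions and may differ between Spark versions.
  object ListenerExample {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("listener-example")
        .setMaster("local[*]")
        .set("spark.eventLog.enabled", "true")          // assumed property name
        .set("spark.eventLog.dir", "/tmp/spark-events") // assumed property name

      val sc = new SparkContext(conf)
      sc.addSparkListener(new SparkListener {
        override def onJobStart(jobStart: SparkListenerJobStart): Unit =
          println(s"Job ${jobStart.jobId} started")
        override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
          println(s"Job ${jobEnd.jobId} finished: ${jobEnd.jobResult}")
      })

      sc.parallelize(1 to 1000).map(_ * 2).count()
      sc.stop()
    }
  }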

Matei
