Posted to user@spark.apache.org by Jack Kolokasis <ko...@ics.forth.gr> on 2019/03/26 12:59:38 UTC

Spark Profiler

Hello all,

     I am looking for a Spark profiler to trace my application and find
the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage.

I am looking forward to your reply.

--Iacovos




Re: Spark Profiler

Posted by jcdauchy <je...@moodys.com>.
Hello Jack,

You can also have a look at “Babar”; it has a nice “flame graph” feature
too. I haven’t had the time to test it out yet.

https://github.com/criteo/babar
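
In case it is useful, attaching the Babar agent to the executors should
look roughly like the sketch below. This is untested: the agent jar
version and the profiler names are from the Babar README as I remember
it, and the application class/jar are placeholders, so please
double-check against the repo.

    spark-submit \
      --files ./babar-agent-0.2.0.jar \
      --conf "spark.executor.extraJavaOptions=-javaagent:./babar-agent-0.2.0.jar=StackTraceProfiler,JVMProfiler[reservedMB=2560],ProcFSProfiler" \
      --class com.example.MyApp my-app.jar

The executor logs are then fed to the babar-processor, which generates
the flame graphs and the memory/CPU charts.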

JC






Re: Spark Profiler

Posted by Hariharan <ha...@gmail.com>.
Hi Jack,

You can try sparklens (https://github.com/qubole/sparklens). I think it
won't give details at as low a level as you're looking for, but it can help
you identify and remove performance bottlenecks.
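
A minimal way to try it is via the published package (the version
coordinates below are from the sparklens README of around that time and
the application class/jar are placeholders, so check the repo for the
current ones):

    spark-submit \
      --packages qubole:sparklens:0.3.2-s_2.11 \
      --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
      --class com.example.MyApp my-app.jar

It prints its report once the application finishes, including driver vs
executor time, the critical path, and an estimate of how wall-clock time
would change with different executor counts.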

~ Hariharan


Re: Spark Profiler

Posted by bo yang <bo...@gmail.com>.
Yeah, these options are very valuable. Just to add another option :) We
built a JVM profiler (https://github.com/uber-common/jvm-profiler) to
monitor and profile Spark applications at large scale (e.g. sending
metrics to Kafka / Hive for batch analysis). People could try it as well.
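
A rough sketch of attaching it to a Spark job follows. The jar name and
the KafkaOutputReporter arguments are from the project README as I
remember them; the broker address and the application class/jar are
placeholders:

    spark-submit \
      --jars ./jvm-profiler-1.0.0.jar \
      --conf "spark.executor.extraJavaOptions=-javaagent:jvm-profiler-1.0.0.jar=reporter=com.uber.profiling.reporters.KafkaOutputReporter,brokerList=kafka01:9092,topicPrefix=profiler_" \
      --class com.example.MyApp my-app.jar

Without the reporter arguments it should simply print the CPU, memory,
and I/O metrics to the console.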



Re: Spark Profiler

Posted by Jack Kolokasis <ko...@ics.forth.gr>.
Thanks for your reply. Your help is very valuable and all these links
are helpful (especially your example).

Best Regards

--Iacovos


RE: Spark Profiler

Posted by Luca Canali <Lu...@cern.ch>.
I find that the Spark metrics system is quite useful for gathering resource utilization metrics of Spark applications, including CPU, memory, and I/O.
If you are interested, there is an example of how this works for us at: https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
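
That setup forwards the metrics to a dashboard through one of Spark's
metrics sinks. As a minimal illustration (these sink properties are the
standard ones documented in the Spark monitoring guide; the Graphite
host is a placeholder), $SPARK_HOME/conf/metrics.properties could
contain:

    # report metrics from all Spark instances to a Graphite-compatible endpoint
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds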
If instead you are looking for ways to instrument your Spark code with performance metrics, Spark task metrics and event listeners are quite useful for that. See also https://github.com/apache/spark/blob/master/docs/monitoring.md and https://github.com/LucaCanali/sparkMeasure
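
For example, with sparkMeasure you can measure the task metrics of a
piece of code from spark-shell along these lines (the package version is
a guess, check the repo; the query is only a toy workload):

    // spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.11:0.13
    val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
    stageMetrics.runAndMeasure {
      spark.sql("select count(*) from range(1000) cross join range(1000)").show()
    }

The report printed at the end aggregates per-stage executor CPU time,
run time, shuffle, and I/O metrics.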

Regards,
Luca


Re: Spark Profiler

Posted by manish ranjan <cs...@gmail.com>.
I have found Ganglia very helpful in understanding network I/O, CPU, and
memory usage for a given Spark cluster.
I have not used it, but I have heard good things about Dr. Elephant
(which I think was contributed by LinkedIn, but I'm not 100% sure).
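
For the record, Spark can report its metrics directly to Ganglia through
the GangliaSink. A minimal sketch (the sink ships in the separate
spark-ganglia-lgpl artifact, so Spark has to be built with that profile;
host and port are placeholders):

    # $SPARK_HOME/conf/metrics.properties
    *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
    *.sink.ganglia.host=gmond.example.com
    *.sink.ganglia.port=8649
    *.sink.ganglia.period=10
    *.sink.ganglia.unit=seconds
    *.sink.ganglia.mode=multicast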
