Posted to dev@spark.apache.org by Will Benton <wi...@redhat.com> on 2014/07/14 18:51:11 UTC

Profiling Spark tests with YourKit (or something else)

Hi all,

I've been evaluating YourKit and would like to profile the heap and CPU usage of certain tests from the Spark test suite.  In particular, I'm very interested in tracking heap usage by allocation site.  Unfortunately, I get a lot of crashes running Spark tests with profiling (and thus allocation-site tracking) enabled in YourKit; just using the sampler works fine, but it appears that enabling the profiler breaks Utils.getCallSite.

Is there a way to make this combination work?  If not, what are people using to understand the memory and CPU behavior of Spark and Spark apps?


thanks,
wb

Re: Profiling Spark tests with YourKit (or something else)

Posted by Will Benton <wi...@redhat.com>.
Sure thing:

   https://issues.apache.org/jira/browse/SPARK-2486
   https://github.com/apache/spark/pull/1413

best,
wb


----- Original Message -----
> From: "Aaron Davidson" <il...@gmail.com>
> To: dev@spark.apache.org
> Sent: Monday, July 14, 2014 8:38:16 PM
> Subject: Re: Profiling Spark tests with YourKit (or something else)
> 
> Would you mind filing a JIRA for this? That does sound like something bogus
> happening on the JVM/YourKit level, but this sort of diagnosis is
> sufficiently important that we should be resilient against it.
> 
> 
> On Mon, Jul 14, 2014 at 6:01 PM, Will Benton <wi...@redhat.com> wrote:
> 
> > ----- Original Message -----
> > > From: "Aaron Davidson" <il...@gmail.com>
> > > To: dev@spark.apache.org
> > > Sent: Monday, July 14, 2014 5:21:10 PM
> > > Subject: Re: Profiling Spark tests with YourKit (or something else)
> > >
> > > Out of curiosity, what problems are you seeing with Utils.getCallSite?
> >
> > Aaron, if I enable call site tracking or CPU profiling in YourKit, many
> > (but not all) Spark test cases will NPE on the line filtering out
> > "getStackTrace" from the stack trace (this is Utils.scala:812 in the
> > current master).  I'm not sure if this is a consequence of
> > Thread#getStackTrace including bogus frames when running instrumented or if
> > whatever instrumentation YourKit inserts relies on assumptions that don't
> > always hold for Scala code.
> >
> >
> > best,
> > wb
> >
> 

Re: Profiling Spark tests with YourKit (or something else)

Posted by Aaron Davidson <il...@gmail.com>.
Would you mind filing a JIRA for this? That does sound like something bogus
happening on the JVM/YourKit level, but this sort of diagnosis is
sufficiently important that we should be resilient against it.


On Mon, Jul 14, 2014 at 6:01 PM, Will Benton <wi...@redhat.com> wrote:

> ----- Original Message -----
> > From: "Aaron Davidson" <il...@gmail.com>
> > To: dev@spark.apache.org
> > Sent: Monday, July 14, 2014 5:21:10 PM
> > Subject: Re: Profiling Spark tests with YourKit (or something else)
> >
> > Out of curiosity, what problems are you seeing with Utils.getCallSite?
>
> Aaron, if I enable call site tracking or CPU profiling in YourKit, many
> (but not all) Spark test cases will NPE on the line filtering out
> "getStackTrace" from the stack trace (this is Utils.scala:812 in the
> current master).  I'm not sure if this is a consequence of
> Thread#getStackTrace including bogus frames when running instrumented or if
> whatever instrumentation YourKit inserts relies on assumptions that don't
> always hold for Scala code.
>
>
> best,
> wb
>

Re: Profiling Spark tests with YourKit (or something else)

Posted by Will Benton <wi...@redhat.com>.
----- Original Message -----
> From: "Aaron Davidson" <il...@gmail.com>
> To: dev@spark.apache.org
> Sent: Monday, July 14, 2014 5:21:10 PM
> Subject: Re: Profiling Spark tests with YourKit (or something else)
> 
> Out of curiosity, what problems are you seeing with Utils.getCallSite?

Aaron, if I enable call site tracking or CPU profiling in YourKit, many (but not all) Spark test cases will NPE on the line filtering out "getStackTrace" from the stack trace (this is Utils.scala:812 in the current master).  I'm not sure if this is a consequence of Thread#getStackTrace including bogus frames when running instrumented or if whatever instrumentation YourKit inserts relies on assumptions that don't always hold for Scala code.
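
As an illustration of the failure described above, here is a minimal Java sketch of a stack-trace filter that tolerates bogus frames. It is not Spark's Utils.getCallSite code. The class and method names (SafeCallSite, usableFrames) are hypothetical, and it assumes the NPE comes from null entries in the array returned by Thread#getStackTrace, which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark.
final class SafeCallSite {

    // Returns the frames that are safe to inspect: skips null entries
    // (assumed here to be what an instrumented JVM might hand back),
    // entries with a null method name, and getStackTrace frames.
    static List<StackTraceElement> usableFrames(StackTraceElement[] trace) {
        List<StackTraceElement> frames = new ArrayList<>();
        if (trace == null) {
            return frames;
        }
        for (StackTraceElement el : trace) {
            if (el == null) {
                continue; // a plain el.getMethodName() call would NPE here
            }
            String method = el.getMethodName();
            // Defensive check; a normally constructed element never has a null name.
            if (method == null || method.contains("getStackTrace")) {
                continue;
            }
            frames.add(el);
        }
        return frames;
    }

    public static void main(String[] args) {
        // Filter the current thread's own trace and print the surviving frames.
        StackTraceElement[] trace = Thread.currentThread().getStackTrace();
        for (StackTraceElement el : usableFrames(trace)) {
            System.out.println(el);
        }
    }
}
```

The point of the sketch is only that checking each element for null before calling getMethodName keeps the filter from failing when the trace contains unexpected entries.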


best,
wb

Re: Profiling Spark tests with YourKit (or something else)

Posted by Aaron Davidson <il...@gmail.com>.
Out of curiosity, what problems are you seeing with Utils.getCallSite?


On Mon, Jul 14, 2014 at 2:59 PM, Will Benton <wi...@redhat.com> wrote:

> Thanks, Matei; I have also had some success with jmap and friends and will
> probably just stick with them!
>
>
> best,
> wb
>
>
> ----- Original Message -----
> > From: "Matei Zaharia" <ma...@gmail.com>
> > To: dev@spark.apache.org
> > Sent: Monday, July 14, 2014 1:02:04 PM
> > Subject: Re: Profiling Spark tests with YourKit (or something else)
> >
> > I haven't seen issues using the JVM's own tools (jstack, jmap, hprof and
> > such), so maybe there's a problem in YourKit or in your release of the JVM.
> > Otherwise I'd suggest increasing the heap size of the unit tests a bit (you
> > can do this in the SBT build file). Maybe they are very close to full and
> > profiling pushes them over the edge.
> >
> > Matei
> >
> > On Jul 14, 2014, at 9:51 AM, Will Benton <wi...@redhat.com> wrote:
> >
> > > Hi all,
> > >
> > > I've been evaluating YourKit and would like to profile the heap and CPU
> > > usage of certain tests from the Spark test suite.  In particular, I'm very
> > > interested in tracking heap usage by allocation site.  Unfortunately, I
> > > get a lot of crashes running Spark tests with profiling (and thus
> > > allocation-site tracking) enabled in YourKit; just using the sampler works
> > > fine, but it appears that enabling the profiler breaks Utils.getCallSite.
> > >
> > > Is there a way to make this combination work?  If not, what are people
> > > using to understand the memory and CPU behavior of Spark and Spark apps?
> > >
> > >
> > > thanks,
> > > wb
> >
> >
>

Re: Profiling Spark tests with YourKit (or something else)

Posted by Will Benton <wi...@redhat.com>.
Thanks, Matei; I have also had some success with jmap and friends and will probably just stick with them!


best,
wb


----- Original Message -----
> From: "Matei Zaharia" <ma...@gmail.com>
> To: dev@spark.apache.org
> Sent: Monday, July 14, 2014 1:02:04 PM
> Subject: Re: Profiling Spark tests with YourKit (or something else)
> 
> I haven't seen issues using the JVM's own tools (jstack, jmap, hprof and
> such), so maybe there's a problem in YourKit or in your release of the JVM.
> Otherwise I'd suggest increasing the heap size of the unit tests a bit (you
> can do this in the SBT build file). Maybe they are very close to full and
> profiling pushes them over the edge.
> 
> Matei
> 
> On Jul 14, 2014, at 9:51 AM, Will Benton <wi...@redhat.com> wrote:
> 
> > Hi all,
> > 
> > I've been evaluating YourKit and would like to profile the heap and CPU
> > usage of certain tests from the Spark test suite.  In particular, I'm very
> > interested in tracking heap usage by allocation site.  Unfortunately, I
> > get a lot of crashes running Spark tests with profiling (and thus
> > allocation-site tracking) enabled in YourKit; just using the sampler works
> > fine, but it appears that enabling the profiler breaks Utils.getCallSite.
> > 
> > Is there a way to make this combination work?  If not, what are people
> > using to understand the memory and CPU behavior of Spark and Spark apps?
> > 
> > 
> > thanks,
> > wb
> 
> 

Re: Profiling Spark tests with YourKit (or something else)

Posted by Matei Zaharia <ma...@gmail.com>.
I haven't seen issues using the JVM's own tools (jstack, jmap, hprof and such), so maybe there's a problem in YourKit or in your release of the JVM. Otherwise I'd suggest increasing the heap size of the unit tests a bit (you can do this in the SBT build file). Maybe they are very close to full and profiling pushes them over the edge.
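
One way to check the "very close to full" hypothesis is to log heap headroom from inside the test JVM. The following is a minimal Java sketch using the standard java.lang.management API. The class and method names (HeapHeadroom, usedFraction) are hypothetical, and where such a check would be called from in the Spark test suite is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only; not part of Spark.
final class HeapHeadroom {

    // Fraction of the maximum heap currently in use, or -1.0 when the JVM
    // reports no defined maximum (MemoryUsage.getMax() may return -1).
    static double usedFraction(long used, long max) {
        if (max <= 0) {
            return -1.0;
        }
        return (double) used / (double) max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double fraction = usedFraction(heap.getUsed(), heap.getMax());
        System.out.printf("heap used=%d max=%d fraction=%.2f%n",
                heap.getUsed(), heap.getMax(), fraction);
    }
}
```

If the reported fraction stays near 1.0 without the profiler attached, raising the test heap size as suggested above would be the first thing to try.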

Matei

On Jul 14, 2014, at 9:51 AM, Will Benton <wi...@redhat.com> wrote:

> Hi all,
> 
> I've been evaluating YourKit and would like to profile the heap and CPU usage of certain tests from the Spark test suite.  In particular, I'm very interested in tracking heap usage by allocation site.  Unfortunately, I get a lot of crashes running Spark tests with profiling (and thus allocation-site tracking) enabled in YourKit; just using the sampler works fine, but it appears that enabling the profiler breaks Utils.getCallSite.
> 
> Is there a way to make this combination work?  If not, what are people using to understand the memory and CPU behavior of Spark and Spark apps?
> 
> 
> thanks,
> wb