Posted to user@spark.apache.org by Aseem Bansal <as...@gmail.com> on 2016/09/01 09:06:53 UTC
Spark 2.0.0 - Java vs Scala performance difference
Hi
Would there be any significant performance difference when using the Java
vs. Scala API?
Re: Spark 2.0.0 - Java vs Scala performance difference
Posted by Adam Roberts <AR...@uk.ibm.com>.
On Java vs Scala: Sean's right that behind the scenes you'll be calling
JVM-based APIs anyway (e.g. sun.misc.Unsafe for Tungsten) and that the
vast majority of Apache Spark's important logic is written in Scala.
It would be an interesting experiment to write the same program using the
Java APIs and the Scala APIs, just to see if there is a noticeable
difference. I'm thinking in terms of how the Scala implementation
libraries perform at runtime: with profiling (we use Healthcenter, tprof,
or just microbenchmarking with prints and timers), we've seen lots of code
in Scala itself to do with (un)boxing and instanceof checks that could do
with some TLC for performance.
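As a rough illustration of the "prints and timers" style of
microbenchmark mentioned above, here is a minimal Java sketch that sums
the same values once from a primitive array and once from a list of boxed
Integers. The class and method names are made up for this example, it
does not touch Spark at all, and timings from a loop like this are only
indicative (a harness such as JMH would be needed for numbers you can
trust).

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingMicrobench {
    // Sums values held in a primitive array: no boxing involved.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Sums the same values held as boxed Integers: every read unboxes.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // implicit Integer.intValue()
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // arbitrary size chosen for the sketch
        int[] primitive = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add(i); // implicit Integer.valueOf(i)
        }

        // Warm-up so the JIT has a chance to compile both loops before timing.
        for (int i = 0; i < 20; i++) {
            sumPrimitive(primitive);
            sumBoxed(boxed);
        }

        long t0 = System.nanoTime();
        long a = sumPrimitive(primitive);
        long t1 = System.nanoTime();
        long b = sumBoxed(boxed);
        long t2 = System.nanoTime();

        System.out.println("primitive: sum=" + a + " in " + (t1 - t0) / 1_000 + " us");
        System.out.println("boxed:     sum=" + b + " in " + (t2 - t1) / 1_000 + " us");
    }
}
```

Both methods return the same sum; only the timings differ, and by how
much will depend on the JVM and hardware.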
The following is now quite outdated, but it still shows that writing
what's concise (Scala) isn't always best for performance:
https://jazzy.id.au/2012/10/16/benchmarking_scala_against_java.html
So if we just stick to Java we may not hit those overheads as often
(there's a talk by my colleague on boosting performance from a Java
implementer's perspective at https://www.youtube.com/watch?v=rcVTM-71bZk),
but I don't expect the differences to be enormous. Full disclosure: I
work for IBM, and one of our goals is to improve Apache Spark and our Java
implementation so that they perform fast together.
There's also the obvious trade-off of developer productivity and code
maintainability (there are more Java devs than Scala devs), so my
suggestion is that if you're much better at writing Java or Scala code,
use that language for what is considered the really important
performance-critical logic. Be aware that you're going to be hitting the
Apache Spark codebase written in Scala anyway, so there's only so much to
be gained here.
I also think that Just-in-Time compiler implementations are generally
better at optimising what's written as Java code than Scala code: knowing
the types well ahead of time, and where we can take codepath shortcuts
during bytecode execution, should deliver a slight performance
improvement. I am keen to come up with some solid, evidence-based
recommendations for us all to benefit from.
From: Aseem Bansal <as...@gmail.com>
To: ayan guha <gu...@gmail.com>
Cc: Sean Owen <so...@cloudera.com>, user <us...@spark.apache.org>
Date: 01/09/2016 13:11
Subject: Re: Spark 2.0.0 - Java vs Scala performance difference
there is already a mail thread for scala vs python. check the archives
On Thu, Sep 1, 2016 at 5:18 PM, ayan guha <gu...@gmail.com> wrote:
> How about Scala vs Python?
>
> On Thu, Sep 1, 2016 at 7:27 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> I can't think of a situation where it would be materially different.
>> Both are using the JVM-based APIs directly. Here and there there's a
>> tiny bit of overhead in using the Java APIs because something is
>> translated from a Java-style object to a Scala-style object, but this
>> is generally trivial.
>>
>> On Thu, Sep 1, 2016 at 10:06 AM, Aseem Bansal <as...@gmail.com>
>> wrote:
>> > Hi
>> >
>> > Would there be any significant performance difference when using Java vs.
>> > Scala API?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
> --
> Best Regards,
> Ayan Guha
Re: Spark 2.0.0 - Java vs Scala performance difference
Posted by Aseem Bansal <as...@gmail.com>.
There is already a mail thread for Scala vs Python. Check the archives.
Re: Spark 2.0.0 - Java vs Scala performance difference
Posted by ayan guha <gu...@gmail.com>.
How about Scala vs Python?
--
Best Regards,
Ayan Guha
Re: Spark 2.0.0 - Java vs Scala performance difference
Posted by Sean Owen <so...@cloudera.com>.
I can't think of a situation where it would be materially different.
Both are using the JVM-based APIs directly. Here and there there's a
tiny bit of overhead in using the Java APIs because something is
translated from a Java-style object to a Scala-style object, but this
is generally trivial.
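To make the "translated from a Java-style object to a Scala-style object"
point concrete, here is a minimal, Spark-free Java sketch of that kind of
adapter. Everything in it is hypothetical: ScalaStyleFn, toScalaStyle and
mapAll are invented names standing in for a Scala-side function type, a
conversion wrapper and a core routine, and are not Spark's actual classes.
The only cost the sketch shows is one extra wrapper object and one extra
delegating call per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class WrapperOverheadSketch {
    // Hypothetical stand-in for a Scala-style function type.
    interface ScalaStyleFn<T, R> {
        R apply(T value);
    }

    // Hypothetical adapter: wraps a Java-style function so that a core
    // written against ScalaStyleFn can call it. Each call is delegated once.
    static <T, R> ScalaStyleFn<T, R> toScalaStyle(Function<T, R> javaFn) {
        return value -> javaFn.apply(value);
    }

    // Hypothetical "core" routine that only understands ScalaStyleFn.
    static <T, R> List<R> mapAll(List<T> input, ScalaStyleFn<T, R> fn) {
        List<R> out = new ArrayList<>(input.size());
        for (T item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> doubler = x -> x * 2;          // Java-style function
        ScalaStyleFn<Integer, Integer> wrapped = toScalaStyle(doubler); // converted once
        List<Integer> result = mapAll(Arrays.asList(1, 2, 3), wrapped);
        System.out.println(result);
    }
}
```

The conversion happens once per function, and the per-element work is a
single extra method call, which fits the description of the overhead as
generally trivial.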