Posted to user@spark.apache.org by karan alang <ka...@gmail.com> on 2018/10/26 22:04:11 UTC

java vs scala for Apache Spark - is there a performance difference ?

Hello,

Is there a performance difference between using Java and Scala for Apache
Spark?

I understand there are other obvious differences (less code with Scala,
easier to focus on logic, etc.), but with respect to performance I think
there would not be much of a difference, since both of them are JVM-based.
Please let me know if this is not the case.

Thanks!

Re: java vs scala for Apache Spark - is there a performance difference ?

Posted by Jörn Franke <jo...@gmail.com>.
Older versions of Spark did indeed have lower performance with Python and R, because data had to be converted between JVM data types and Python/R data types. This changed in Spark 2.3, I think, with the integration of Apache Arrow. However, what you do after the conversion in those languages can still be slower than, for instance, in Java if you do not use only Spark functions. It could also be faster (e.g. if you use a Python module implemented natively in C and no translation into C data types is needed).

Scala has in certain cases a more elegant syntax than Java (if you do not use lambdas). Sometimes this elegant syntax can lead to (unintentionally) inefficient code, e.g. through implicit conversions or the use of collection methods. There are usually better ways to express the same thing; you just have to spot these issues in the source code and address them, if needed.

So a comparison between those languages does not really make sense: it always depends.
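
The point about elegant syntax hiding inefficiency is not specific to Scala. As a minimal sketch in plain Java (not Spark code; the class and method names are hypothetical and chosen only for illustration), the two methods below compute the same value, but the first boxes every element and materializes an intermediate list, while the second stays on primitives:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Spark or any library.
final class BoxingSketch {

    // Concise, but each square is boxed into a Long, copied into an
    // intermediate List, and unboxed again before summing.
    static long sumOfSquaresBoxed(int n) {
        List<Long> squares = IntStream.range(0, n)
                .mapToObj(i -> Long.valueOf((long) i * i))
                .collect(Collectors.toList());
        return squares.stream().mapToLong(Long::longValue).sum();
    }

    // Same result on primitives: no boxing and no intermediate collection.
    static long sumOfSquaresPrimitive(int n) {
        return IntStream.range(0, n).mapToLong(i -> (long) i * i).sum();
    }

    public static void main(String[] args) {
        int n = 1_000;
        System.out.println(sumOfSquaresBoxed(n));
        System.out.println(sumOfSquaresPrimitive(n));
    }
}
```

Whether the difference matters depends on the data volume and on how often the code runs, so it is worth measuring before rewriting anything.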

> On 30.10.2018 at 07:00, akshay naidu <ak...@gmail.com> wrote:
> 
> How about Python?
> Java vs Scala vs Python vs R:
> which is better?

Re: java vs scala for Apache Spark - is there a performance difference ?

Posted by akshay naidu <ak...@gmail.com>.
How about Python?
Java vs Scala vs Python vs R:
which is better?


Re: java vs scala for Apache Spark - is there a performance difference ?

Posted by Battini Lakshman <ba...@gmail.com>.
On Oct 27, 2018 3:34 AM, "karan alang" <ka...@gmail.com> wrote:

Hello,

Is there a performance difference between using Java and Scala for Apache
Spark?

I understand there are other obvious differences (less code with Scala,
easier to focus on logic, etc.), but with respect to performance I think
there would not be much of a difference, since both of them are JVM-based.
Please let me know if this is not the case.

Thanks!

Re: java vs scala for Apache Spark - is there a performance difference ?

Posted by Gourav Sengupta <go...@gmail.com>.
I genuinely do not think that using Scala for Spark requires us to be
experts in Scala. There is in fact a tutorial called "Just enough Scala for
Spark" which, even with my IQ, does not take more than 40 minutes to go
through. Also, the syntax of Scala is almost always similar to that of
Python.

Data processing is much more amenable to functional thinking, and therefore
Scala suits it best. Also, Spark is written in Scala.

Regards,
Gourav

On Mon, Oct 29, 2018 at 11:33 PM kant kodali <ka...@gmail.com> wrote:

> When most people compare two programming languages, 99% of the time it
> all seems to boil down to syntactic sugar.
>
> As for performance, I doubt Scala is ever faster than Java, given that
> Scala uses the heap more than Java does. I have also written some
> pointless micro-benchmarking code (random string generation, hash
> computations, etc.) in Java, Scala and Golang, and Java outperformed both
> Scala and Golang on many occasions.
>
> Now that Java 11 has been released, things seem to get even better, given
> that the startup time is also very low.
>
> I am happy to change my view as long as I can see some code and benchmarks!

Re: java vs scala for Apache Spark - is there a performance difference ?

Posted by kant kodali <ka...@gmail.com>.
When most people compare two programming languages, 99% of the time it
all seems to boil down to syntactic sugar.

As for performance, I doubt Scala is ever faster than Java, given that
Scala uses the heap more than Java does. I have also written some
pointless micro-benchmarking code (random string generation, hash
computations, etc.) in Java, Scala and Golang, and Java outperformed both
Scala and Golang on many occasions.

Now that Java 11 has been released, things seem to get even better, given
that the startup time is also very low.

I am happy to change my view as long as I can see some code and benchmarks!
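
For anyone who wants to produce such numbers, here is a rough sketch of what a hand-rolled micro-benchmark of random string generation and hash computation could look like in Java. It is not the code referred to above: the class name, method names, alphabet, seeds and iteration counts are all assumptions made for illustration. It warms up the JIT first and feeds every result into a checksum so the work is not optimized away. For numbers you intend to publish, a dedicated harness such as JMH is the safer choice.

```java
import java.util.Random;

// Hypothetical illustration class; all names and constants are assumptions.
final class MicroBenchSketch {

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Builds a pseudo-random string; a seeded Random keeps runs repeatable.
    static String randomString(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // The workload being timed: generate strings and fold their hash codes
    // into a checksum so the JIT cannot discard the work as dead code.
    static int workload(long seed, int count, int length) {
        Random rng = new Random(seed);
        int checksum = 0;
        for (int i = 0; i < count; i++) {
            checksum = 31 * checksum + randomString(rng, length).hashCode();
        }
        return checksum;
    }

    public static void main(String[] args) {
        int sink = 0;

        // Warm-up runs give the JIT a chance to compile the hot paths.
        for (int i = 0; i < 20; i++) {
            sink ^= workload(i, 10_000, 16);
        }

        // Timed runs: report the best of several to reduce noise.
        long bestNanos = Long.MAX_VALUE;
        for (int i = 0; i < 10; i++) {
            long start = System.nanoTime();
            sink ^= workload(42L, 10_000, 16);
            bestNanos = Math.min(bestNanos, System.nanoTime() - start);
        }

        System.out.println("best of 10 runs: " + bestNanos / 1_000
                + " us (sink=" + sink + ")");
    }
}
```

A comparison across languages would need the same workload, the same warm-up policy and the same JVM settings on each side; otherwise the result mostly measures the harness.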




Re: java vs scala for Apache Spark - is there a performance difference ?

Posted by Jean Georges Perrin <jg...@jgp.net>.
I have not seen anything, but I am curious if you find something.

I think one of the big benefits of using Java for data engineering in the context of Spark is that you do not have to train a lot of your team in Scala. Now, if you want to do data science, Java is probably not the best tool yet...
