Posted to user@spark.apache.org by Andrew Ash <an...@andrewash.com> on 2014/04/15 02:48:55 UTC

Scala vs Python performance differences

Hi Spark users,

I've always done all my Spark work in Scala, but occasionally people ask
about Python and its performance impact vs the same algorithm
implementation in Scala.

Has anyone done tests to measure the difference?

Anecdotally I've heard Python is a 40% slowdown but that's entirely hearsay.

Cheers,
Andrew

Re: Scala vs Python performance differences

Posted by Samarth Mailinglist <ma...@gmail.com>.
I was about to ask this question.

On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash <an...@andrewash.com> wrote:

> Jeremy,
>
> Did you complete this benchmark in a way that's shareable with those
> interested here?
>
> Andrew

Re: Scala vs Python performance differences

Posted by Andrew Ash <an...@andrewash.com>.
Jeremy,

Did you complete this benchmark in a way that's shareable with those
interested here?

Andrew

On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> I'd also be interested in seeing such a benchmark.

Re: Scala vs Python performance differences

Posted by Nicholas Chammas <ni...@gmail.com>.
I'd also be interested in seeing such a benchmark.


On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira <ia...@hotmail.com> wrote:

> This would be super useful. Thanks.

Re: Scala vs Python performance differences

Posted by Ian Ferreira <ia...@hotmail.com>.
This would be super useful. Thanks.

On 4/15/14, 1:30 AM, "Jeremy Freeman" <fr...@gmail.com> wrote:

>Hi Andrew,
>
>I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on
>ML algorithms, as I'm particularly curious about the relative performance of
>MLlib in Scala vs the Python MLlib API vs pure Python implementations.
>
>Will share real results as soon as I have them, but roughly, in our hands,
>that 40% number is ballpark correct, at least for some basic operations
>(e.g., textFile, count, reduce).
>
>-- Jeremy



Re: Scala vs Python performance differences

Posted by Jeremy Freeman <fr...@gmail.com>.
Hi Andrew,

I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on
ML algorithms, as I'm particularly curious about the relative performance of
MLlib in Scala vs the Python MLlib API vs pure Python implementations. 

Will share real results as soon as I have them, but roughly, in our hands,
that 40% number is ballpark correct, at least for some basic operations
(e.g., textFile, count, reduce).

-- Jeremy

---------------------
Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p4261.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Scala vs Python performance differences

Posted by Bin Wang <bi...@gmail.com>.
At least, Spark Streaming doesn't support Python at this moment, right?


On Mon, Apr 14, 2014 at 6:48 PM, Andrew Ash <an...@andrewash.com> wrote:

> Hi Spark users,
>
> I've always done all my Spark work in Scala, but occasionally people ask
> about Python and its performance impact vs the same algorithm
> implementation in Scala.
>
> Has anyone done tests to measure the difference?
>
> Anecdotally I've heard Python is a 40% slowdown but that's entirely
> hearsay.
>
> Cheers,
> Andrew
>

Re: Scala vs Python performance differences

Posted by Davies Liu <da...@databricks.com>.
Hey Phil,

Thank you for sharing this. The result didn't surprise me much. It's normal to
do the prototype in Python; once it gets stable and you really need the
performance, rewriting part of it in C, or the whole of it in another language,
does make sense, and it will not cost you much time.

Davies

On Fri, Jan 16, 2015 at 7:38 AM, philpearl <ph...@tanktop.tv> wrote:
> I was interested in this as I had some Spark code in Python that was too slow
> and wanted to know whether Scala would fix it for me.  So I re-wrote my code
> in Scala.
>
> In my particular case the Scala version was 10 times faster.  But I think
> that is because I did an awful lot of computation in my own code rather than
> in a library like numpy. (I put a bit more detail here
> <http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala>
> in case you are interested.)
>
> So there's one data point, if only the obvious one: comparing
> computations in Scala to computations in pure Python.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Scala vs Python performance differences

Posted by philpearl <ph...@tanktop.tv>.
I was interested in this as I had some Spark code in Python that was too slow
and wanted to know whether Scala would fix it for me.  So I re-wrote my code
in Scala.

In my particular case the Scala version was 10 times faster.  But I think
that is because I did an awful lot of computation in my own code rather than
in a library like numpy. (I put a bit more detail here
<http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala>
in case you are interested.)

So there's one data point, if only the obvious one: comparing
computations in Scala to computations in pure Python.
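
As a rough illustration of the effect described above (this is not the code
from the linked post, and it does not involve Spark at all), the sketch below
times the same sum of squares two ways in plain Python: once with the loop
written in interpreted Python, and once with the loop driven by C-implemented
built-ins. The function names, the data size, and the choice of workload are
assumptions made for the example; the ratio will vary by machine and Python
version, so no figure is quoted here.

```python
import operator
import timeit


def sum_squares_loop(values):
    """Sum of squares with the loop running as interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    """Same result, but iteration and multiplication are driven by C built-ins."""
    return sum(map(operator.mul, values, values))


def time_call(func, values, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` single runs."""
    return min(timeit.repeat(lambda: func(values), number=1, repeat=repeats))


if __name__ == "__main__":
    data = list(range(200_000))  # hypothetical workload size
    # Both versions must agree before the timings mean anything.
    assert sum_squares_loop(data) == sum_squares_builtin(data)
    loop_s = time_call(sum_squares_loop, data)
    builtin_s = time_call(sum_squares_builtin, data)
    print(f"interpreted loop: {loop_s:.4f}s")
    print(f"built-ins:        {builtin_s:.4f}s")
    print(f"ratio:            {loop_s / builtin_s:.1f}x")
```

The same pattern (check that both versions agree, then take the best of
several runs) could be applied to a Python job and its Scala rewrite, though
cluster-level timings would also include serialization and scheduling costs
that this single-process sketch ignores.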





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p21190.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
