Posted to dev@spark.apache.org by SNEHASISH DUTTA <in...@gmail.com> on 2019/03/11 09:34:32 UTC

Benchmark Java/Scala/Python for Apache spark

Hi

Is there a way to get performance benchmarks for developing applications
in Java, Scala, or Python?

The use case mostly involves SQL pipelines over data ingested from various
sources, including Kafka.

Which language should be preferred? It would be great if the preference
could be justified from the perspective of application development.

Thanks and Regards
Snehasish

Re: Benchmark Java/Scala/Python for Apache spark

Posted by Reynold Xin <rx...@databricks.com>.
If you use UDFs in Python, you will want to use Pandas UDFs for better
performance.
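
A minimal sketch of that suggestion, assuming PySpark 3.x with pandas and
PyArrow installed. The names to_fahrenheit, compare_udfs, and the celsius
column are made up for illustration. It runs the same conversion once as a
row-at-a-time UDF and once as a Pandas UDF so the two can be compared.

```python
def to_fahrenheit(celsius):
    """Works on a single float or on a whole pandas Series."""
    return celsius * 9.0 / 5.0 + 32.0


def compare_udfs():
    """Run the same conversion as a row-at-a-time UDF and as a Pandas UDF."""
    # Imports live here so the module loads even where Spark is not installed.
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, pandas_udf, udf

    spark = (
        SparkSession.builder.master("local[1]").appName("udf-sketch").getOrCreate()
    )
    df = spark.createDataFrame([(0.0,), (10.0,), (100.0,)], ["celsius"])

    # Row-at-a-time UDF: one Python call per row.
    row_udf = udf(to_fahrenheit, "double")

    # Pandas UDF: one Python call per Arrow batch, pandas Series in and out.
    @pandas_udf("double")
    def batch_udf(celsius: pd.Series) -> pd.Series:
        return to_fahrenheit(celsius)

    rows = df.select(
        row_udf(col("celsius")).alias("row_at_a_time"),
        batch_udf(col("celsius")).alias("vectorized"),
    ).collect()
    spark.stop()
    return rows
```

Calling compare_udfs() on a machine with Spark available should return the
same values in both columns. Any speed difference only becomes visible on
much larger inputs, so treat this as a shape to benchmark, not a result.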

On Mon, Mar 11, 2019 at 7:50 PM Jonathan Winandy <jo...@gmail.com>
wrote:

> Thanks, I didn't know!
>
> That being said, any UDF use seems to hurt code generation (and
> performance).

Re: Benchmark Java/Scala/Python for Apache spark

Posted by Jonathan Winandy <jo...@gmail.com>.
Thanks, I didn't know!

That being said, any UDF use seems to hurt code generation (and
performance).
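
One way to check this on a given Spark version is to compare physical
plans. The sketch below assumes PySpark is installed. The names add_one and
show_plans are hypothetical, and it only covers a Python UDF, not a Scala
one.

```python
def add_one(x):
    """Plain Python logic that Spark cannot see into once wrapped as a UDF."""
    return x + 1


def show_plans():
    """Print the physical plan for a built-in expression and for a UDF."""
    # Imports live here so the module loads even where Spark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf

    spark = (
        SparkSession.builder.master("local[1]").appName("plan-sketch").getOrCreate()
    )
    df = spark.range(5)

    # Built-in expression: the optimizer understands it and can compile it
    # together with the rest of the stage.
    df.select((col("id") + 1).alias("next")).explain()

    # Python UDF: opaque to the optimizer and evaluated in a Python worker.
    add_one_udf = udf(add_one, "long")
    df.select(add_one_udf(col("id")).alias("next")).explain()

    spark.stop()
```

In the second plan one would typically expect an extra Python evaluation
step (shown as BatchEvalPython in the versions I am aware of) that the first
plan does not have. The exact output depends on the Spark version.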


On Mon, 11 Mar 2019, 15:13 Dylan Guedes, <dj...@gmail.com> wrote:

> Btw, even if you are using Python you can register your UDFs in Scala and
> use them in Python.

Re: Benchmark Java/Scala/Python for Apache spark

Posted by Dylan Guedes <dj...@gmail.com>.
Btw, even if you are using Python you can register your UDFs in Scala and
use them in Python.
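
A rough sketch of how that can look from the Python side, assuming the
UDF has been compiled into a jar that is on the Spark classpath and
implements one of Spark's Java UDF interfaces (for example UDF1). The class
name com.example.udfs.ToFahrenheit and the helper register_jvm_udf are
hypothetical.

```python
def register_jvm_udf(
    spark,
    sql_name="to_fahrenheit",
    class_name="com.example.udfs.ToFahrenheit",  # hypothetical JVM class
    return_type="double",
):
    """Expose a JVM-side UDF to SQL under sql_name and return that name."""
    # registerJavaFunction is part of PySpark's UDF registration API; the
    # class itself must already be available to the JVM (e.g. via --jars).
    spark.udf.registerJavaFunction(sql_name, class_name, return_type)
    return sql_name


# Usage on a real session (not executed here):
#   name = register_jvm_udf(spark)
#   spark.sql(f"SELECT {name}(celsius) AS fahrenheit FROM readings")
```

Because the function body runs inside the JVM, rows do not have to be
shipped to a Python worker, which is the main reason to consider this route.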

On Mon, Mar 11, 2019 at 6:55 AM Jonathan Winandy <jo...@gmail.com>
wrote:

> Hello Snehasish
>
> If you are not using UDFs, you will get very similar performance from
> those languages for SQL workloads.
>
> So it comes down to:
> * if you know Python, go for Python.
> * if you are used to the JVM and are ready for a bit of a paradigm shift,
> go for Scala.
>
> Our team uses Scala, but we also help other data engineers who use
> Python.
>
> I would say go for pure functional programming, but that is a biased view,
> and Python gets the job done anyway.
>
> Cheers,
> Jonathan

Re: Benchmark Java/Scala/Python for Apache spark

Posted by Jonathan Winandy <jo...@gmail.com>.
Hello Snehasish

If you are not using UDFs, you will get very similar performance from
those languages for SQL workloads.

So it comes down to:
* if you know Python, go for Python.
* if you are used to the JVM and are ready for a bit of a paradigm shift, go
for Scala.

Our team uses Scala, but we also help other data engineers who use
Python.

I would say go for pure functional programming, but that is a biased view,
and Python gets the job done anyway.

Cheers,
Jonathan

On Mon, 11 Mar 2019, 10:34 SNEHASISH DUTTA, <in...@gmail.com>
wrote:

> Hi
>
> Is there a way to get performance benchmarks for developing applications
> in Java, Scala, or Python?
>
> The use case mostly involves SQL pipelines over data ingested from various
> sources, including Kafka.
>
> Which language should be preferred? It would be great if the preference
> could be justified from the perspective of application development.
>
> Thanks and Regards
> Snehasish
>