Posted to dev@spark.apache.org by SNEHASISH DUTTA <in...@gmail.com> on 2019/03/11 09:34:32 UTC
Benchmark Java/Scala/Python for Apache spark
Hi
Is there a way to get performance benchmarks for developing applications
in Java, Scala, or Python?
The use case mostly involves SQL pipelines over data ingested from various
sources, including Kafka.
Which language should be preferred? It would be great if the preference
could be justified from the perspective of application development.
Thanks and Regards
Snehasish
Re: Benchmark Java/Scala/Python for Apache spark
Posted by Reynold Xin <rx...@databricks.com>.
If you use UDFs in Python, you will want to use Pandas UDFs for better
performance.
On Mon, Mar 11, 2019 at 7:50 PM Jonathan Winandy <jo...@gmail.com>
wrote:
> Thanks, I didn't know!
>
> That being said, any UDF use seems to badly affect code generation (and
> performance).
Re: Benchmark Java/Scala/Python for Apache spark
Posted by Jonathan Winandy <jo...@gmail.com>.
Thanks, I didn't know!
That being said, any UDF use seems to badly affect code generation (and
performance).
On Mon, 11 Mar 2019, 15:13 Dylan Guedes, <dj...@gmail.com> wrote:
> Btw, even if you are using Python, you can register your UDFs in Scala and
> use them from Python.
Re: Benchmark Java/Scala/Python for Apache spark
Posted by Dylan Guedes <dj...@gmail.com>.
Btw, even if you are using Python, you can register your UDFs in Scala and
use them from Python.
On Mon, Mar 11, 2019 at 6:55 AM Jonathan Winandy <jo...@gmail.com>
wrote:
> Hello Snehasish
>
> If you are not using UDFs, you will get very similar performance from
> those languages on SQL workloads.
>
> So it comes down to:
> * if you know Python, go for Python.
> * if you are used to the JVM and ready for a bit of a paradigm shift,
> go for Scala.
>
> Our team is using Scala; however, we help other data engineers who are
> using Python.
>
> I would say go for pure functional programming, but I am biased, and
> Python gets the job done anyway.
>
> Cheers,
> Jonathan
Re: Benchmark Java/Scala/Python for Apache spark
Posted by Jonathan Winandy <jo...@gmail.com>.
Hello Snehasish
If you are not using UDFs, you will get very similar performance from
those languages on SQL workloads.
So it comes down to:
* if you know Python, go for Python.
* if you are used to the JVM and ready for a bit of a paradigm shift, go
for Scala.
Our team is using Scala; however, we help other data engineers who are
using Python.
I would say go for pure functional programming, but I am biased, and
Python gets the job done anyway.
Cheers,
Jonathan
On Mon, 11 Mar 2019, 10:34 SNEHASISH DUTTA, <in...@gmail.com>
wrote:
> Hi
>
> Is there a way to get performance benchmarks for developing
> applications in Java, Scala, or Python?
>
> The use case mostly involves SQL pipelines over data ingested from
> various sources, including Kafka.
>
> Which language should be preferred? It would be great if the preference
> could be justified from the perspective of application development.
>
> Thanks and Regards
> Snehasish
>