Posted to dev@spark.apache.org by Holden Karau <ho...@pigscanfly.ca> on 2019/11/09 17:52:01 UTC

Re: Why Spark generates Java code and not Scala?

Switching this from user to dev

On Sat, Nov 9, 2019 at 9:47 AM Bartosz Konieczny <ba...@gmail.com>
wrote:

> Hi there,
>
> A few days ago I got an intriguing but hard-to-answer question:
> "Why Spark generates Java code and not Scala code?"
> (https://github.com/bartosz25/spark-scala-playground/issues/18)
>
> Since I'm not sure about the exact answer, I'd like to ask you to confirm
> or correct my thinking. I looked for the reasons in JIRA and in the
> research paper "Spark SQL: Relational Data Processing in Spark" (
> http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf) but
> found nothing explaining why Java was chosen over Scala. The only ticket I
> found was about why Scala and not Java, but it concerns data types (
> https://issues.apache.org/jira/browse/SPARK-5193). That's why I'm writing
> here.
>
> My guesses about why Java code was chosen are:
> - Java runtime compiler libs are more mature and prod-ready than
> Scala's, or at least they were at implementation time
> - The Scala compiler tends to be slower than Java's
> https://stackoverflow.com/questions/3490383/java-compile-speed-vs-scala-compile-speed
>
From the discussions when I was doing some code gen (in MLlib, not SQL), I
think this is the primary reason.

>
> - Scala compiler seems to be more complex, so debugging & maintaining it
> would be harder
>
This was also given as a secondary reason.

> - it was easier to represent a pure Java OO design than mixed FP/OO in
> Scala
>
No one brought up this point. Maybe it was a consideration and it just
wasn’t raised.

> ?
>
> Thank you for your help.
>
>
> --
> Bartosz Konieczny
> data engineer
> https://www.waitingforcode.com
> https://github.com/bartosz25/
> https://twitter.com/waitingforcode
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Why Spark generates Java code and not Scala?

Posted by Reynold Xin <rx...@databricks.com>.
It’s mainly due to compilation speed. The Scala compiler is known to be slow.
Even javac is quite slow. We use Janino, which is a simpler compiler, to get
faster compilation speed at runtime.

Also, for low-level code we can’t use (due to perf concerns) any of the
advantages Scala has over Java, e.g. we can’t use the Scala collection
library, functional programming, or map/flatMap. So using Scala wouldn’t
really buy anything even if there were no compilation speed concerns.
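
To make the second point concrete, here is a minimal sketch contrasting a
collection-style pipeline with the flat, loop-based shape that low-level
generated code tends to take. This is not actual Spark codegen output: the
class name, method names and the filter/double/sum logic are all hypothetical
and chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code emitted by Spark.
final class GeneratedStyleSketch {

    // Collection-style version: readable, but it builds a pipeline of
    // lambdas, the kind of abstraction the reply says is avoided in
    // low-level code for performance reasons.
    static long sumOfDoubledPositivesStreams(int[] values) {
        return Arrays.stream(values)
                .filter(v -> v > 0)
                .mapToLong(v -> 2L * v)
                .sum();
    }

    // Flat version: one loop over a primitive array, no lambdas and no
    // intermediate collections. Nothing here would look different, or be
    // shorter, if it were written in Scala.
    static long sumOfDoubledPositivesLoop(int[] values) {
        long acc = 0L;
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v > 0) {
                acc += 2L * v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] input = {3, -1, 4, 0, -5, 9};
        System.out.println(sumOfDoubledPositivesStreams(input)); // prints 32
        System.out.println(sumOfDoubledPositivesLoop(input));    // prints 32
    }
}
```

Both methods compute the same result; the second is simply the style that a
code generator can emit directly and that a simple compiler can handle
without any higher-level language features.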
