Posted to issues@spark.apache.org by "canwang (Jira)" <ji...@apache.org> on 2021/07/16 09:16:00 UTC

[jira] [Commented] (SPARK-32530) SPIP: Kotlin support for Apache Spark

    [ https://issues.apache.org/jira/browse/SPARK-32530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381919#comment-17381919 ] 

canwang commented on SPARK-32530:
---------------------------------

I've been helping with JetBrains' [Kotlin Spark API|https://github.com/JetBrains/kotlin-spark-api] too, and I also hope to see first-class support for the Kotlin language in the Apache Spark project.

1. I think a Kotlin API may be a better choice on the JVM for Spark developers.

- As the description says, there are already a lot of Kotlin developers, their number is growing fast, and more and more projects treat Kotlin as a first-class API. For example, the demo on Spring's web page now defaults to Kotlin.

- As you said, very few developers use Java to develop for Spark: although Spark supports Java perfectly well, Java's syntax is not friendly to Spark development. I believe those who do use Java choose it because Scala's learning curve is relatively long and steep. Kotlin is much better in this respect, which is also reflected in the growth rate of Kotlin users.

2. The cost of adapting to Kotlin may not be high.

- The [Kotlin Spark API|https://github.com/JetBrains/kotlin-spark-api] already exists and is basically usable. Migrating it into the Spark application repo should only require adding more tests.

- Judging from the existing [Kotlin Spark API|https://github.com/JetBrains/kotlin-spark-api], the main adaptation work is handling the Serializer and Deserializer in the Encoder. I think this work can follow the existing Java adaptation, and it should be even simpler than it was for Java, because the Java adaptation gives Kotlin a reference to work from.

> SPIP: Kotlin support for Apache Spark
> -------------------------------------
>
>                 Key: SPARK-32530
>                 URL: https://issues.apache.org/jira/browse/SPARK-32530
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Pasha Finkeshteyn
>            Priority: Major
>
> h2. Background and motivation
> Kotlin is a cross-platform, statically typed, general-purpose JVM language. In the last year more than 5 million developers have used Kotlin in mobile, backend, frontend and scientific development. The number of Kotlin developers grows rapidly every year. 
>  * [According to redmonk|https://redmonk.com/sogrady/2020/02/28/language-rankings-1-20/]: "Kotlin, the second fastest growing language we’ve seen outside of Swift, made a big splash a year ago at this time when it vaulted eight full spots up the list."
>  * [According to snyk.io|https://snyk.io/wp-content/uploads/jvm_2020.pdf], Kotlin is the second most popular language on the JVM
>  * [According to StackOverflow|https://insights.stackoverflow.com/survey/2020] Kotlin’s share increased by 7.8% in 2020.
> We notice the increasing usage of Kotlin in data analysis ([6% of users in 2020|https://www.jetbrains.com/lp/devecosystem-2020/kotlin/], as opposed to 2% in 2019) and machine learning (3% of users in 2020, as opposed to 0% in 2019), and we expect these numbers to continue to grow. 
> We, authors of this SPIP, strongly believe that making Kotlin API officially available to developers can bring new users to Apache Spark and help some of the existing users.
> h2. Goals
> The goal of this project is to bring first-class support for Kotlin language into the Apache Spark project. We’re going to achieve this by adding one more module to the current Apache Spark distribution.
> h2. Non-goals
> There is no goal to replace any existing language support or to change any existing Apache Spark API.
> At this time, there is no goal to support non-core APIs of Apache Spark like Spark ML and Spark structured streaming. This may change in the future based on community feedback.
> There is no goal to provide CLI for Kotlin for Apache Spark, this will be a separate SPIP.
> There is no goal to provide support for Apache Spark < 3.0.0.
> h2. Current implementation
> A working prototype is available at [https://github.com/JetBrains/kotlin-spark-api]. It has been tested inside JetBrains and by early adopters.
> h2. What are the risks?
> There is always a risk that this product won’t get enough popularity and will bring more costs than benefits. It can be mitigated by the fact that we don't need to change any existing API and support can be potentially dropped at any time.
> We also believe that existing API is rather low maintenance. It does not bring anything more complex than already exists in the Spark codebase. Furthermore, the implementation is compact - less than 2000 lines of code.
> We are committed to maintaining, improving and evolving the API based on feedback from both Spark and Kotlin communities. As the Kotlin data community continues to grow, we see Kotlin API for Apache Spark as an important part in the evolving Kotlin ecosystem, and intend to fully support it. 
> h2. How long will it take?
> A working implementation is already available, and if the community proposes any changes to improve this implementation, these can be implemented quickly, in weeks if not days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
