Posted to dev@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2018/07/31 17:42:18 UTC

DISCUSS: SPARK-24882 data source v2 API improvement

Hi all,

Data source v2 has been out for a while. During this release, we

- migrated most of the streaming sources to the v2 API (SPARK-22911
  <https://issues.apache.org/jira/browse/SPARK-22911>)
- started to migrate file sources (SPARK-23817
  <https://issues.apache.org/jira/browse/SPARK-23817>)
- started to design new features (SPARK-24252
  <https://issues.apache.org/jira/browse/SPARK-24252>, SPARK-23521
  <https://issues.apache.org/jira/browse/SPARK-23521>)
- did some refactoring and cleanup (SPARK-23203
  <https://issues.apache.org/jira/browse/SPARK-23203>, SPARK-23323
  <https://issues.apache.org/jira/browse/SPARK-23323>, SPARK-23325
  <https://issues.apache.org/jira/browse/SPARK-23325>, ...)

From this work, we learned a lot and got a clearer picture of how the
data source API should work with Spark and with developers.

As a result, we are proposing SPARK-24882
<https://issues.apache.org/jira/browse/SPARK-24882>, to revisit the entire
API and improve it. Please read the design doc attached to the JIRA ticket
if you are interested. Any comments are appreciated!

Thanks
Wenchen