Posted to issues@spark.apache.org by "Jan Berkel (Jira)" <ji...@apache.org> on 2020/10/12 21:19:00 UTC

[jira] [Commented] (SPARK-25390) Data source V2 API refactoring

    [ https://issues.apache.org/jira/browse/SPARK-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212675#comment-17212675 ] 

Jan Berkel commented on SPARK-25390:
------------------------------------

I'm in a similar situation. [~Kyrdan] asked on the mailing list as directed, but nobody replied. It's strange that such a central API is completely undocumented. The new iteration of the datasource API doesn't look remotely like v2; it might as well have been called v3.

If it's not possible to provide the documentation, at least put some notes/warnings in the migration guide or changelog indicating that Spark 3's datasource API has changed completely.

And, as far as I can tell at the moment, it doesn't seem to be possible to implement the new Datasource V2 using plain Java classes.

> Data source V2 API refactoring
> ------------------------------
>
>                 Key: SPARK-25390
>                 URL: https://issues.apache.org/jira/browse/SPARK-25390
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently it's not very clear how we should abstract the data source v2 API. The abstraction should either be unified between batch and streaming, or be similar with a well-defined difference between the two. The abstraction should also include catalog/table.
> An example of the abstraction:
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}
> We should refactor the data source v2 API according to the abstraction.
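
To make the "catalog -> table -> scan" and "catalog -> table -> stream -> scan" chains in the description concrete, here is a minimal sketch in plain Java. Every type and method name below (DataSourceSketch, Catalog, Table, Stream, Scan, InMemoryTable, catalogOf, readBatch, readStreamBatch) is hypothetical and chosen only for illustration; this is not Spark's actual connector API, and it only mirrors the shape of the abstraction as described in the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical types that only mirror the shape of the abstraction in the
// issue description; they are NOT Spark's actual connector interfaces.
final class DataSourceSketch {

    /** A scan yields the rows of one bounded read. */
    interface Scan {
        List<String> readRows();
    }

    /** A stream hands out one scan per micro-batch (streaming path only). */
    interface Stream {
        Scan scanForBatch(int batchId);
    }

    /** A table is the shared entry point for batch and streaming reads. */
    interface Table {
        Scan newScan();      // batch: table -> scan
        Stream newStream();  // streaming: table -> stream -> scan
    }

    /** A catalog resolves table names to tables. */
    interface Catalog {
        Table loadTable(String name);
    }

    /** In-memory table whose rows are split into fixed-size micro-batches. */
    static final class InMemoryTable implements Table {
        private final List<String> rows;
        private final int batchSize;

        InMemoryTable(List<String> rows, int batchSize) {
            this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
            this.batchSize = batchSize;
        }

        @Override
        public Scan newScan() {
            // The batch scan returns every row at once.
            return () -> rows;
        }

        @Override
        public Stream newStream() {
            // Each micro-batch scan returns one slice of the rows;
            // batch ids are assumed to be non-negative.
            return batchId -> () -> {
                int from = Math.min(batchId * batchSize, rows.size());
                int to = Math.min(from + batchSize, rows.size());
                return rows.subList(from, to);
            };
        }
    }

    /** Builds a catalog backed by a plain map of table names to tables. */
    static Catalog catalogOf(Map<String, Table> tables) {
        return name -> {
            Table table = tables.get(name);
            if (table == null) {
                throw new IllegalArgumentException("No such table: " + name);
            }
            return table;
        };
    }

    /** Batch path: catalog -> table -> scan. */
    static List<String> readBatch(Catalog catalog, String name) {
        return catalog.loadTable(name).newScan().readRows();
    }

    /** Streaming path: catalog -> table -> stream -> scan. */
    static List<String> readStreamBatch(Catalog catalog, String name, int batchId) {
        return catalog.loadTable(name).newStream().scanForBatch(batchId).readRows();
    }
}
```

In this sketch the two paths differ only by the extra stream step between table and scan, which is one possible reading of the "similar but with a well-defined difference" option in the description.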



--
This message was sent by Atlassian Jira
(v8.3.4#803005)