Posted to issues@spark.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2018/08/17 20:22:00 UTC
[jira] [Comment Edited] (SPARK-24882) data source v2 API improvement

    [ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584343#comment-16584343 ] 

Ryan Blue edited comment on SPARK-24882 at 8/17/18 8:21 PM:
------------------------------------------------------------

One more thing: I think we should split {{BatchOverwriteSupport}} into two interfaces. A source won't necessarily support both overwrite by filter and dynamic partition overwrite, because not all sources are partitioned. There should probably be {{BatchWriteSupport}}, {{BatchOverwriteSupport}}, and {{BatchPartitionOverwriteSupport}}.

See https://github.com/apache/spark/pull/21308 for example docs for overwrite by filter.
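
A rough sketch of how the three-interface split might look from a source's point of view. Only the three interface names come from the comment above; the method names, their signatures, the inheritance between the interfaces, and the two example sources are hypothetical and are not taken from Spark or from the proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class OverwriteSupportSketch {

    /** Plain batch writes (append). The method is a hypothetical placeholder. */
    interface BatchWriteSupport {
        String append();
    }

    /** Overwrite by filter. Assumed here to extend the base write interface. */
    interface BatchOverwriteSupport extends BatchWriteSupport {
        // Hypothetical signature: the filter is a plain string for illustration only.
        String overwrite(String filterExpression);
    }

    /** Dynamic partition overwrite, which only makes sense for partitioned sources. */
    interface BatchPartitionOverwriteSupport extends BatchWriteSupport {
        String overwriteDynamicPartitions();
    }

    /** Hypothetical unpartitioned source: it can overwrite by filter but has no partitions. */
    static class KeyValueSource implements BatchOverwriteSupport {
        public String append() { return "appended"; }
        public String overwrite(String filterExpression) {
            return "overwrote rows matching " + filterExpression;
        }
    }

    /** Hypothetical partitioned source: it opts in to both kinds of overwrite. */
    static class PartitionedFileSource
            implements BatchOverwriteSupport, BatchPartitionOverwriteSupport {
        public String append() { return "appended"; }
        public String overwrite(String filterExpression) {
            return "overwrote rows matching " + filterExpression;
        }
        public String overwriteDynamicPartitions() {
            return "overwrote the partitions present in the incoming data";
        }
    }

    /** Lists the write modes a source supports, based on the interfaces it implements. */
    static List<String> supportedModes(BatchWriteSupport source) {
        List<String> modes = new ArrayList<>();
        modes.add("append");
        if (source instanceof BatchOverwriteSupport) {
            modes.add("overwrite-by-filter");
        }
        if (source instanceof BatchPartitionOverwriteSupport) {
            modes.add("dynamic-partition-overwrite");
        }
        return modes;
    }

    public static void main(String[] args) {
        System.out.println(supportedModes(new KeyValueSource()));
        System.out.println(supportedModes(new PartitionedFileSource()));
    }
}
```

With a single combined interface, the unpartitioned source in this sketch would have to implement a partition overwrite it cannot meaningfully support; with the split, it simply does not implement that interface.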



> data source v2 API improvement
> ------------------------------
>
>                 Key: SPARK-24882
>                 URL: https://issues.apache.org/jira/browse/SPARK-24882
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>
> Data source V2 has been out for a while; see the SPIP [here|https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit?usp=sharing]. We have already migrated most of the built-in streaming data sources to the V2 API, and the file source migration is in progress. During the migration, we found several problems that we want to address before we stabilize the V2 API.
> To solve these problems, we need to separate responsibilities in the data source V2 API, isolate the stateful part of the API, and choose better names for some interfaces. For details, please see the attached Google doc: https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?usp=sharing


