Posted to issues@spark.apache.org by "PEIYUAN SUN (Jira)" <ji...@apache.org> on 2023/04/16 13:54:00 UTC

[jira] [Created] (SPARK-43155) DataSourceV2 is hard to be implemented without following V1

PEIYUAN SUN created SPARK-43155:
-----------------------------------

             Summary: DataSourceV2 is hard to be implemented without following V1
                 Key: SPARK-43155
                 URL: https://issues.apache.org/jira/browse/SPARK-43155
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: PEIYUAN SUN
             Fix For: 3.5.0


### Description

The current DataSourceV2 interface has become considerably more complicated than it was in the Spark 2.x versions. To implement a data source under DataSourceV2, a user needs to learn not only the V2 APIs and interfaces but also DataSourceV1 (as it is the fallback version).

#### Interface Gaps
There is no easy way, and there are no clear examples, showing how to implement both V1 and V2 for a new data source. For example, the built-in sources in the standard Spark repo, such as orc, parquet, and json, provide a FileFormat implementation for V1, but they cannot feasibly be followed as examples, because the SPI is hard-coded as `DefaultSource` rather than being loaded dynamically from a user-provided class outside the Spark repo.
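
To make the `DefaultSource` convention concrete, here is a minimal, self-contained sketch of the kind of name resolution being described. It is not Spark's code. It only mimics the naming rule, which (to the best of my understanding) is that the provider string is tried as a class name first and then with `.DefaultSource` appended. The class `ProviderLookupSketch`, the methods `candidates` and `resolve`, and every `com.example...` name are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class ProviderLookupSketch {

    /** Candidate class names tried for a provider string, in order (assumed convention). */
    public static List<String> candidates(String provider) {
        return List.of(provider, provider + ".DefaultSource");
    }

    /**
     * Returns the first candidate that the given predicate reports as loadable.
     * The predicate stands in for a real class loader, so the sketch runs without Spark.
     */
    public static Optional<String> resolve(String provider, Predicate<String> classExists) {
        return candidates(provider).stream().filter(classExists).findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical classpath contents.
        Set<String> classpath = Set.of(
            "com.example.mysource.DefaultSource",
            "com.example.other.MyProvider");

        // A package name resolves only because it contains a class named DefaultSource.
        System.out.println(resolve("com.example.mysource", classpath::contains));
        // A fully qualified class name resolves directly.
        System.out.println(resolve("com.example.other.MyProvider", classpath::contains));
        // Neither candidate exists, so nothing is resolved.
        System.out.println(resolve("com.example.missing", classpath::contains));
    }
}
```

Under this assumed rule, a source that a user refers to by package name has to expose a class called exactly `DefaultSource`, which is the fixed naming that the paragraph above objects to.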

#### Loss of a simple layer over different kinds of data sources
With the original V1 API, a user could easily implement a new wrapper on top of orc/parquet using the Relation interface. DataSourceV2, by contrast, is too low-level and hard to use for this case.

#### No explicit guidance
The interfaces for individual pieces of functionality are not well organized, which forces the reader to spend a lot of time studying the commit history and existing patterns in order to understand them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
