You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2018/12/10 21:50:52 UTC

[GitHub] rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)

rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240394058
 
 

 ##########
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java
 ##########
 @@ -25,7 +25,10 @@
  * The base interface for v2 data sources which don't have a real catalog. Implementations must
  * have a public, 0-arg constructor.
  * <p>
- * The major responsibility of this interface is to return a {@link Table} for read/write.
+ * The major responsibility of this interface is to return a {@link Table} for read/write. If you
+ * want to allow end-users to write data to non-existing tables via write APIs in `DataFrameWriter`
+ * with `SaveMode`, you must return a {@link Table} instance even if the table doesn't exist. The
+ * table schema can be empty in this case.
 
 Review comment:
   `SaveMode` is incompatible with the SPIP to standarize behavior that was voted on and accepted. The save mode in `DataFrameWriter` must be used to create v2 plans that have well-defined behavior and cannot be passed to implementations in the final version of the v2 read/write API.
   
   I see no reason to put off removing `SaveMode` from the API. If we remove it now, we will avoid having more versions of this API that are **fundamentally broken**. We will avoid more implementations that rely on it, not aware that it will be removed.
   
   To your point about whether it is safe: the only case where this is actually used is `SaveMode.Overwrite` and `SaveMode.Append`. To replace those, all that needs to happen is to define what kind of overwrite should happen here (dynamic or truncate).
   
   I can supply the logical plan and physical implementation in a follow-up PR because I already have all this written and waiting to go in. Or, I can add a PR to merge first if you'd like to have these changes depend on that implementation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org