Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/19 18:20:23 UTC

[GitHub] rdblue commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation

rdblue commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation
URL: https://github.com/apache/spark/pull/23836#issuecomment-465249681
 
 
   @gengliangwang: The community agreed to remove the v2 write paths using SaveMode before the next release. The problem is that SaveMode is ambiguous and doesn't have reliable behavior. That's why we are introducing the [new logical plans](https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5a987801#heading=h.m45webtwxf2d): to set expectations for behavior.
   
   Part of the challenge while standardizing behavior across sources is to support what Spark already does in v2. In this case, we need to define how a table can opt out of schema validation, or at least have relaxed validation rules that allow things like adding new columns by writing a DF with a new column.
   
   I don't think that the right way to do that is to use a write path that has no validation (the WriteToDataSourceV2 plan), especially when that write path is set to be removed.
   
   As I've said before, the right way to do this is to:
   1. Define what the behavior should be in v2 for these tables
   2. Propose an API that allows sources to request that behavior
   
   I think number 1 is the most important. What you have here removes schema validation for all v2 writes, but I think the behavior you are trying to mimic from v1 is applied when writing to path-based tables. That's a big unintended consequence, and it is why it is important to **state what you're trying to accomplish and have a design for how you're going to do it**.
   
   Please consider this a -1.
