Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/20 05:17:41 UTC

[GitHub] gengliangwang commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation

gengliangwang commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation
URL: https://github.com/apache/spark/pull/23836#issuecomment-465428121
 
 
   @rdblue Sorry if I caused any misunderstanding.
   Let's focus on the API `DataFrameWriter.save` here:
   1. The expressions `AppendData` and `OverwriteByExpression` always fetch the table schema and validate the output schema against it. This is a serious behavior change. Until we reach a final agreement on file source V2 behavior (most likely we will keep the V1 behavior in this API), I suggest reverting them for now.
   
   2. Keeping ORC V2 working helps us find problems. This PR does not propose keeping `SupportsSaveMode`, nor does it claim to be the final solution. I am suggesting that we keep the file source behavior in this API, unless there is a clear solution under which the subsequent partial development will eventually work.
   
   @cloud-fan's solution above is a good direction to go. We can discuss it at tomorrow's meetup.
   For the code changes in `DataFrameWriter.save`, I think we should remove the new expressions for now.
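
To make the behavior change in point 1 concrete, here is a minimal toy sketch in plain Python. It is not Spark code: `write_with_validation`, `write_without_validation`, `SchemaMismatchError`, and the tuple-based schema are all hypothetical stand-ins, and exact schema equality is a deliberate simplification of whatever checks the validating path really performs. It only illustrates why a write that succeeds under the V1-style path can start failing once the output schema is checked against the table schema.

```python
from typing import List, Tuple

# Toy schema: (column name, type name) pairs. A stand-in, not Spark's StructType.
Schema = List[Tuple[str, str]]


class SchemaMismatchError(Exception):
    """Raised by the hypothetical validating write path when schemas differ."""


def write_with_validation(table_schema: Schema, output_schema: Schema) -> str:
    # Hypothetical model of the validated path: the output schema must
    # match the existing table schema (simplified here to exact equality).
    if table_schema != output_schema:
        raise SchemaMismatchError(
            f"cannot write {output_schema} into table with schema {table_schema}"
        )
    return "written"


def write_without_validation(table_schema: Schema, output_schema: Schema) -> str:
    # Hypothetical model of the V1-style path: the existing schema is not consulted.
    return "written"


existing = [("id", "bigint"), ("name", "string")]
incoming = [("id", "bigint"), ("score", "double")]

# V1-style behavior: the write goes through even though the schemas differ.
v1_result = write_without_validation(existing, incoming)

# Validating behavior: the same write is rejected.
try:
    write_with_validation(existing, incoming)
    v2_result = "written"
except SchemaMismatchError:
    v2_result = "rejected"
```

In this model the same append yields `v1_result == "written"` but `v2_result == "rejected"`, which is the kind of user-visible difference the comment describes as a serious behavior change.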
