Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/19 02:31:54 UTC

[GitHub] cloud-fan commented on issue #23829: [SPARK-26915][SQL]File source should write without schema validation in DataFrameWriter.save()

cloud-fan commented on issue #23829: [SPARK-26915][SQL]File source should write without schema validation in DataFrameWriter.save()
URL: https://github.com/apache/spark/pull/23829#issuecomment-464954983
 
 
   @rdblue there are two problems here:
   1. The file source should not run schema validation during write.
   2. The file source can't report a schema during write if the output path doesn't exist.
   
   For 1, I think we can introduce a new trait (or capability API) to indicate that a data source doesn't need schema validation during write.
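   For illustration only, here is a minimal Scala sketch of such an opt-out; the trait name `AcceptsAnySchemaOnWrite` and the `needsSchemaValidation` helper are hypothetical, not existing Spark APIs:
   
   ```scala
   // Hypothetical sketch only; not the actual Spark data source API.
   object SchemaValidationSketch {
   
     // Marker trait a source could mix in to opt out of write-time schema validation.
     trait AcceptsAnySchemaOnWrite
   
     // The write path would consult this before comparing the input schema
     // against the schema reported by the target table/files.
     def needsSchemaValidation(source: AnyRef): Boolean = source match {
       case _: AcceptsAnySchemaOnWrite => false // e.g. file sources opt out
       case _                          => true  // default: keep validating
     }
   }
   ```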
   
   For 2, I think we need the CTAS (and RTAS) operators.
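   As a rough illustration of why dedicated operators help (hypothetical plan nodes, not the real Spark logical plans): with an explicit CTAS/RTAS node the query's output schema defines the table, so there is no existing schema to ask the source for or to validate against.
   
   ```scala
   // Hypothetical plan nodes only; the real Spark logical plans look different.
   object CtasSketch {
     case class TableIdent(name: String)
   
     // CTAS: the target does not exist yet, so the query's output schema defines
     // the table and there is no existing schema to validate the write against.
     case class CreateTableAsSelect(ident: TableIdent, query: AnyRef)
   
     // RTAS: the existing table is dropped and re-created, so the query's schema
     // wins here as well.
     case class ReplaceTableAsSelect(ident: TableIdent, query: AnyRef)
   }
   ```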
   
   One thing we need to note is that the `DataFrameWriter` API can mix data and metadata operations, e.g. `df.write.mode("append")` can append data to a non-existing table, with CTAS semantics.
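   For example (a self-contained sketch assuming a local SparkSession; the output path is made up), the following append succeeds even though the target path does not exist, effectively creating it with CTAS-like semantics:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   object AppendCreatesTarget {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[*]").appName("append-vs-ctas").getOrCreate()
       val df = spark.range(10).toDF("id")
   
       // The target path does not exist yet, but mode("append") creates it on the fly:
       // a metadata operation (create) piggybacking on a data operation (append).
       df.write.mode("append").parquet("/tmp/append_creates_target_demo")
   
       spark.stop()
     }
   }
   ```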
   
   How would the ongoing catalog API proposal solve this issue?
