Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/20 03:35:42 UTC

[GitHub] cloud-fan commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation

cloud-fan commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation
URL: https://github.com/apache/spark/pull/23836#issuecomment-465408204
 
 
   I definitely agree with the direction: translate `SaveMode` to operators with clear semantics, and remove `SaveMode` from DS v2 while keeping it in the public API for a while.
   
   However, I think the current translation is not precise: append mode doesn't simply mean append; it actually means "create the table if it does not exist, then append to it". At least this is the case for the file sources and the JDBC source.
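   
   As a rough illustration of that distinction, here is a minimal sketch of what a more precise translation could look like. The operator names (`CreateTableIfNotExistsThenAppend`, etc.) are purely hypothetical and are not the actual Spark logical plans; the mapping for the non-append modes is only an approximation for context.
   
   ```scala
   import org.apache.spark.sql.SaveMode
   
   // Hypothetical logical operators, just to make the intended semantics explicit.
   sealed trait WriteOperation
   case class CreateTableIfNotExistsThenAppend(table: String) extends WriteOperation
   case class ReplaceTableContents(table: String) extends WriteOperation
   case class CreateTableAsSelect(table: String, ignoreIfExists: Boolean) extends WriteOperation
   
   // For the file and JDBC sources, SaveMode.Append today behaves like
   // "create the table if it does not exist, then append", not a plain append.
   def translate(mode: SaveMode, table: String): WriteOperation = mode match {
     case SaveMode.Append        => CreateTableIfNotExistsThenAppend(table)
     case SaveMode.Overwrite     => ReplaceTableContents(table)
     case SaveMode.ErrorIfExists => CreateTableAsSelect(table, ignoreIfExists = false)
     case SaveMode.Ignore        => CreateTableAsSelect(table, ignoreIfExists = true)
   }
   ```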
   
   The next problem is how to implement "create the table if it does not exist, then append" with the DS v2 APIs. I have two proposals (both sketched below):
   1. Keep the "catalog -> table -> write builder -> write" abstraction; the implementation then has two steps: a) create the table if it does not exist, and b) do a normal append.
   2. Slightly change the abstraction to "catalog -> staged table -> write builder -> write", so that we can write data to a table that does not exist yet and make the entire process atomic.
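   
   To make the two flows concrete, here is a minimal sketch under simplified, hypothetical interfaces (`TableCatalog`, `StagedTable`, `stageCreateOrAppend`, etc. are illustrative stand-ins, not the final DS v2 API):
   
   ```scala
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.types.StructType
   
   // Simplified, hypothetical interfaces used only for illustration.
   trait WriteBuilder { def write(data: DataFrame): Unit }
   trait Table { def newWriteBuilder(): WriteBuilder }
   trait TableCatalog {
     def tableExists(name: String): Boolean
     def createTable(name: String, schema: StructType): Unit
     def loadTable(name: String): Table
   }
   trait StagedTable extends Table {
     def commitStagedChanges(): Unit  // makes the table and its data visible atomically
     def abortStagedChanges(): Unit
   }
   trait StagingTableCatalog {
     def stageCreateOrAppend(name: String, schema: StructType): StagedTable
   }
   
   // Proposal 1: catalog -> table -> write builder -> write, in two non-atomic steps.
   def appendProposal1(catalog: TableCatalog, name: String, data: DataFrame): Unit = {
     if (!catalog.tableExists(name)) {
       catalog.createTable(name, data.schema)               // step a: create if not exists
     }
     catalog.loadTable(name).newWriteBuilder().write(data)  // step b: normal append
   }
   
   // Proposal 2: catalog -> staged table -> write builder -> write, atomic end to end.
   def appendProposal2(catalog: StagingTableCatalog, name: String, data: DataFrame): Unit = {
     val staged = catalog.stageCreateOrAppend(name, data.schema)
     try {
       staged.newWriteBuilder().write(data)
       staged.commitStagedChanges()   // nothing becomes visible until this succeeds
     } catch {
       case e: Throwable =>
         staged.abortStagedChanges()
         throw e
     }
   }
   ```
   
   The key difference is that in the second flow a failure before `commitStagedChanges()` leaves no partially created table or partially written data behind.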
   
   For proposal 1, the file sources don't work because they can't create an empty table (they have no metastore). I guess other data sources will face the same issue. It also requires the catalog API, which is not done yet.
   
   I think proposal 2 is better. It remains useful even after we have the catalog API, e.g. to implement atomic CTAS.
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org