Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/19 02:34:57 UTC

[GitHub] cloud-fan edited a comment on issue #23829: [SPARK-26915][SQL] File source should write without schema validation in DataFrameWriter.save()

cloud-fan edited a comment on issue #23829: [SPARK-26915][SQL] File source should write without schema validation in DataFrameWriter.save()
URL: https://github.com/apache/spark/pull/23829#issuecomment-464954983
 
 
   @rdblue there are two problems here:
   1. the file source should not perform schema validation during write
   2. the file source can't report a schema during write if the output path doesn't exist
   
   For 1, I think we can introduce a new trait (or capability API) to indicate that a data source doesn't need schema validation during write.
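
   A minimal sketch of what such a capability trait could look like. The names `SupportsUnvalidatedWrite`, `ParquetLikeSource`, and `needsSchemaValidation` are hypothetical for illustration, not actual Spark APIs:

   ```scala
   // Hypothetical marker trait: a source that mixes this in declares that
   // the write path should not validate the incoming schema against it.
   trait SupportsUnvalidatedWrite

   // A hypothetical file-based source opting out of write-time validation.
   class ParquetLikeSource extends SupportsUnvalidatedWrite

   object ValidationCheck {
     // The write planner could consult the trait to decide whether to
     // run schema validation for a given source.
     def needsSchemaValidation(source: Any): Boolean =
       !source.isInstanceOf[SupportsUnvalidatedWrite]
   }
   ```

   A marker trait keeps the opt-out a static property of the source, so the planner can check it before building the write plan.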
   
   For 2, I think we need the CTAS (and RTAS) operators.
   
   One thing to note: the `DataFrameWriter` API can mix data and metadata operations. For example, `df.write.mode("append")` can append data to a non-existing table, with CTAS semantics. I can't find a corresponding SQL operator, so maybe we need to create a new one like `CREATE OR INSERT TABLE ... AS SELECT`.
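
   The mixed semantics can be sketched as follows. This is a toy model of the behavior, not Spark code; `Catalog`, `createOrInsert`, and the row type are invented for illustration:

   ```scala
   import scala.collection.mutable

   // Toy catalog modeling the "create or insert" semantics: an append to a
   // missing table acts as CTAS, an append to an existing table as INSERT.
   class Catalog {
     private val tables = mutable.Map.empty[String, mutable.Buffer[Int]]

     def createOrInsert(name: String, rows: Seq[Int]): Unit =
       // getOrElseUpdate is the metadata step (create if absent),
       // ++= is the data step (append rows).
       tables.getOrElseUpdate(name, mutable.Buffer.empty) ++= rows

     def rows(name: String): Seq[Int] = tables(name).toSeq
   }
   ```

   The point is that a single writer call performs both a metadata operation and a data operation, which is why no single existing SQL operator matches it.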
   
   How would the ongoing catalog API proposal solve this issue?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org