Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/07/24 08:26:20 UTC
[jira] [Resolved] (SPARK-16410) DataFrameWriter's jdbc method drops table in overwrite mode
[ https://issues.apache.org/jira/browse/SPARK-16410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-16410.
-------------------------------
Resolution: Duplicate
> DataFrameWriter's jdbc method drops table in overwrite mode
> -----------------------------------------------------------
>
> Key: SPARK-16410
> URL: https://issues.apache.org/jira/browse/SPARK-16410
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1, 1.6.2
> Reporter: Ian Hellstrom
>
> According to the [API documentation|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter], the write mode {{overwrite}} should _overwrite the existing data_, which suggests that the data is removed, i.e. the table is truncated.
> However, that is not what happens in the [source code|https://github.com/apache/spark/blob/0ad6ce7e54b1d8f5946dde652fa5341d15059158/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L421]:
> {code}
> if (mode == SaveMode.Overwrite && tableExists) {
>   JdbcUtils.dropTable(conn, table)
>   tableExists = false
> }
> {code}
> This clearly shows that the table is first dropped and then recreated. This causes two major issues:
> * Existing indexes, partitioning schemes, etc. are completely lost.
> * The case of identifiers may be changed (e.g. by the database's default identifier folding) without the user understanding why.
> In my opinion, the table should be truncated, not dropped. Overwriting data is a DML operation and should not cause DDL.
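The consequence described above can be illustrated outside of Spark. The following is a hedged, standalone sketch using SQLite (not Spark code; SQLite has no TRUNCATE, so DELETE stands in for a truncate-style DML overwrite): dropping and recreating a table discards its indexes, while deleting its rows preserves them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table with an index, mimicking a pre-existing JDBC target table.
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.execute("CREATE INDEX idx_t_id ON t (id)")

def index_names(cur):
    # List the indexes currently attached to table t.
    cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 't'"
    )
    return [row[0] for row in cur.fetchall()]

# Overwrite via DROP + CREATE (what the cited Spark code does): the index is lost.
cur.execute("DROP TABLE t")
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
after_drop = index_names(cur)

# Re-add the index, then overwrite via DELETE (DML): the index survives.
cur.execute("CREATE INDEX idx_t_id ON t (id)")
cur.execute("DELETE FROM t")
after_delete = index_names(cur)

print(after_drop)    # []
print(after_delete)  # ['idx_t_id']
```

The same reasoning applies to partitioning schemes, constraints, and grants: they live in the DDL, so a DROP-based overwrite silently discards them.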
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org