Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/07/24 08:26:20 UTC
[jira] [Resolved] (SPARK-16410) DataFrameWriter's jdbc method drops table in overwrite mode
[ https://issues.apache.org/jira/browse/SPARK-16410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-16410.
-------------------------------
Resolution: Duplicate
> DataFrameWriter's jdbc method drops table in overwrite mode
> -----------------------------------------------------------
>
> Key: SPARK-16410
> URL: https://issues.apache.org/jira/browse/SPARK-16410
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1, 1.6.2
> Reporter: Ian Hellstrom
>
> According to the [API documentation|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter], the write mode {{overwrite}} should _overwrite the existing data_, which suggests that the data is removed, i.e. the table is truncated.
> However, that is not what happens in the [source code|https://github.com/apache/spark/blob/0ad6ce7e54b1d8f5946dde652fa5341d15059158/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L421]:
> {code}
> if (mode == SaveMode.Overwrite && tableExists) {
>   JdbcUtils.dropTable(conn, table)
>   tableExists = false
> }
> {code}
> This clearly shows that the table is first dropped and then recreated. This causes two major issues:
> * Existing indexes, partitioning schemes, etc. are completely lost.
> * The case of identifiers may be changed (e.g. by the database's default identifier folding) without the user understanding why.
> In my opinion, the table should be truncated, not dropped. Overwriting data is a DML operation and should not cause DDL.
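The consequence described above can be illustrated outside of Spark. The following is a hedged, standalone sketch using SQLite (not Spark code; SQLite has no TRUNCATE, so DELETE stands in for a truncate-style DML overwrite): dropping and recreating a table discards its indexes, while deleting its rows preserves them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table with an index, mimicking a pre-existing JDBC target table.
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.execute("CREATE INDEX idx_t_id ON t (id)")

def index_names(cur):
    # List the indexes currently attached to table t.
    cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 't'"
    )
    return [row[0] for row in cur.fetchall()]

# Overwrite via DROP + CREATE (what the cited Spark code does): the index is lost.
cur.execute("DROP TABLE t")
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
after_drop = index_names(cur)

# Re-add the index, then overwrite via DELETE (DML): the index survives.
cur.execute("CREATE INDEX idx_t_id ON t (id)")
cur.execute("DELETE FROM t")
after_delete = index_names(cur)

print(after_drop)    # []
print(after_delete)  # ['idx_t_id']
```

The same reasoning applies to partitioning schemes, constraints, and grants: they live in the DDL, so a DROP-based overwrite silently discards them.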
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org