Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/11/29 05:57:58 UTC

[jira] [Resolved] (SPARK-18544) Append with df.saveAsTable writes data to wrong location

     [ https://issues.apache.org/jira/browse/SPARK-18544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-18544.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

> Append with df.saveAsTable writes data to wrong location
> --------------------------------------------------------
>
>                 Key: SPARK-18544
>                 URL: https://issues.apache.org/jira/browse/SPARK-18544
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Eric Liang
>            Priority: Blocker
>             Fix For: 2.1.0
>
>
> When using saveAsTable in append mode, data is written to the wrong location for non-managed (external) Datasource tables. The following example illustrates this.
> DataFrameWriter seems to somehow pass the wrong table path to InsertIntoHadoopFsRelation. We should also probably remove the repair table call at the end of saveAsTable in DataFrameWriter; it shouldn't be needed in either the Hive or the Datasource case.
> {code}
> scala> spark.sqlContext.range(100).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test")
> scala> sql("create table test (id long, A int, B int) USING parquet OPTIONS (path '/tmp/test') PARTITIONED BY (A, B)")
> scala> sql("msck repair table test")
> scala> sql("select * from test where A = 1").count
> res6: Long = 1
> scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test")
> scala> sql("select * from test where A = 1").count
> res8: Long = 1
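> scala> // the append above should have made this count 2 (a second id = 1 row),
> scala> // but the bug leaves it at 1 because the appended data went to the wrong path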
> {code}
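> As a possible interim workaround on affected versions (a sketch, not part of the original report, assuming the external table's path is known), the append can bypass saveAsTable entirely: write the new partitions straight to the table's path and re-run the repair, so DataFrameWriter never has to resolve the table location.
> {code}
> scala> // workaround sketch (untested): append the new rows directly under the table's external path
> scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").parquet("/tmp/test")
> scala> sql("msck repair table test")  // register any newly created partition directories
> scala> sql("select * from test where A = 1").count  // expected: 2
> {code}
> This mirrors the setup at the top of the reproduction, where writing to the path directly and repairing the table produced the correct counts.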


