
Posted to issues@spark.apache.org by "Eric Liang (JIRA)" <ji...@apache.org> on 2016/11/22 21:02:58 UTC

[jira] [Created] (SPARK-18544) Append with df.saveAsTable writes data to wrong location

Eric Liang created SPARK-18544:
----------------------------------

             Summary: Append with df.saveAsTable writes data to wrong location
                 Key: SPARK-18544
                 URL: https://issues.apache.org/jira/browse/SPARK-18544
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Eric Liang


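When using saveAsTable in append mode, data is written to the wrong location for non-managed Datasource tables. The following example illustrates this. It assumes test_10k is a non-managed Datasource table located at /tmp/test_10k. After the append, the query for A = 1 should return 2 (one matching row from each write), but it still returns 1.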

It seems we somehow pass the wrong table path from DataFrameWriter to InsertIntoHadoopFsRelation. Also, we should probably remove the repair-table call at the end of saveAsTable in DataFrameWriter; it shouldn't be needed in either the Hive or the Datasource case.

{code}
scala> spark.sqlContext.range(10000).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("overwrite").parquet("/tmp/test_10k")

scala> sql("msck repair table test_10k")

scala> sql("select * from test_10k where A = 1").count
res6: Long = 1

scala> spark.sqlContext.range(10).selectExpr("id", "id as A", "id as B").write.partitionBy("A", "B").mode("append").saveAsTable("test_10k")

scala> sql("select * from test_10k where A = 1").count
res8: Long = 1
{code}
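To make the expected outcome concrete, here is a small plain-Scala sketch with no Spark dependency. Spark18544Sketch, rows, countWhereA and partitionDir are hypothetical names, not Spark APIs. The sketch models the rows produced by range(n).selectExpr("id", "id as A", "id as B"). Because id 1 appears once in range(10000) and once in the appended range(10), a correct append should make res8 equal 2. The sketch also shows the Hive-style directory A=1/B=1 where the appended A = 1 row is expected to land, assuming the table's location is /tmp/test_10k as in the example.

{code:scala}
// Hypothetical sketch (plain Scala, no Spark): these helpers are
// illustrative only and are not part of any Spark API.
object Spark18544Sketch {
  // Rows produced by range(n).selectExpr("id", "id as A", "id as B"),
  // modeled as (id, A, B) tuples.
  def rows(n: Long): Seq[(Long, Long, Long)] =
    (0L until n).map(i => (i, i, i))

  // Number of rows matching "where A = a".
  def countWhereA(data: Seq[(Long, Long, Long)], a: Long): Int =
    data.count { case (_, colA, _) => colA == a }

  // Hive-style directory that partitionBy("A", "B") uses for one row,
  // resolved against the table's own location.
  def partitionDir(tablePath: String, spec: Seq[(String, Long)]): String =
    (tablePath +: spec.map { case (k, v) => s"$k=$v" }).mkString("/")

  def main(args: Array[String]): Unit = {
    val afterAppend = rows(10000) ++ rows(10)
    // One row with A = 1 comes from each write, so this prints 2.
    println(countWhereA(afterAppend, 1))
    // Where the appended A = 1 row should be written:
    // prints /tmp/test_10k/A=1/B=1
    println(partitionDir("/tmp/test_10k", Seq("A" -> 1L, "B" -> 1L)))
  }
}
{code}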



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
