Posted to issues@spark.apache.org by "Maria Rebelka (JIRA)" <ji...@apache.org> on 2019/07/16 08:58:00 UTC
[jira] [Created] (SPARK-28411) insertInto with overwrite inconsistent behaviour Python/Scala
Maria Rebelka created SPARK-28411:
-------------------------------------
Summary: insertInto with overwrite inconsistent behaviour Python/Scala
Key: SPARK-28411
URL: https://issues.apache.org/jira/browse/SPARK-28411
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 2.4.0, 2.2.1
Reporter: Maria Rebelka
df.write.mode("overwrite").insertInto("table") behaves inconsistently between Scala and Python. In Python, insertInto ignores the "mode" parameter and appends by default. Only after changing the syntax to df.write.insertInto("table", overwrite=True) do we get the expected behaviour.
This is native Spark syntax and is expected to behave the same across languages. Also, other write methods, such as saveAsTable or write.parquet, seem to respect "mode".
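A minimal sketch of the workaround described above, wrapped in a small helper so the overwrite intent is explicit at the call site. The name insert_overwrite is hypothetical and not part of PySpark; the only Spark call used is DataFrameWriter.insertInto with its overwrite keyword, as quoted in this report.

```python
def insert_overwrite(df, table_name):
    """Replace the contents of an existing table with the rows of `df`.

    `insert_overwrite` is a hypothetical helper name, not a PySpark API.
    """
    # Pass overwrite=True explicitly instead of relying on
    # df.write.mode("overwrite"), which this report shows PySpark
    # ignoring for insertInto on 2.2.1 and 2.4.0.
    df.write.insertInto(table_name, overwrite=True)
```

Called as insert_overwrite(df, "spark_overwrite_issue"), this should leave the table with the rows of df only, assuming the table already exists and its column order matches the DataFrame.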
Reproduce, Python, ignores "overwrite":
{code:python}
df = spark.createDataFrame(sc.parallelize([(1, 2), (3, 4)]), ['i', 'j'])
# create the table and load data
df.write.saveAsTable("spark_overwrite_issue")
# insert overwrite, expected result - 2 rows
df.write.mode("overwrite").insertInto("spark_overwrite_issue")
spark.sql("select * from spark_overwrite_issue").count()
# result - 4 rows, insert appended data instead of overwrite
{code}
Reproduce, Scala, works as expected:
{code:scala}
val df = Seq((1, 2), (3, 4)).toDF("i", "j")
df.write.mode("overwrite").insertInto("spark_overwrite_issue")
spark.sql("select * from spark_overwrite_issue").count()
// result - 2 rows
{code}
Tested on Spark 2.2.1 (EMR) and 2.4.0 (Databricks).