Posted to issues@spark.apache.org by "Furcy Pin (Jira)" <ji...@apache.org> on 2020/05/07 10:38:00 UTC

[jira] [Created] (SPARK-31657) CSV Writer writes no header for empty DataFrames

Furcy Pin created SPARK-31657:
---------------------------------

             Summary: CSV Writer writes no header for empty DataFrames
                 Key: SPARK-31657
                 URL: https://issues.apache.org/jira/browse/SPARK-31657
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.4.1
         Environment: Local pyspark 2.4.1
            Reporter: Furcy Pin


When a DataFrame is written as CSV with the Header option set to true,
no header line is written if the DataFrame is empty.

This breaks processes that read the CSV back, because the DataFrame they
get has no columns at all.

Example (please notice the limit(0) in the second example):
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/
Using Python version 2.7.17 (default, Nov 7 2019 10:07:09)
SparkSession available as 'spark'.
>>> df1 = spark.sql("SELECT 1 as a")
>>> df1.limit(1).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
|  1|
+---+
>>> 
>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
++
||
++
++
{code}
 


Expected behavior:
{code:java}
>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
+---+
{code}
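
To make the expected round trip concrete, here is a minimal sketch outside Spark, using only Python's standard csv module. It is an illustration of the behavior requested above, not Spark's writer; the helper names write_csv_with_header and read_csv_columns are hypothetical and exist only for this example.

```python
import csv
import os
import tempfile


def write_csv_with_header(path, columns, rows):
    """Write rows to path, always emitting the header line first."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # written even when rows is empty
        writer.writerows(rows)


def read_csv_columns(path):
    """Return (column names, data rows) read back from path."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # [] when the file has no header line
        return header, list(reader)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "empty.csv")
        # Empty input, analogous to df1.limit(0) in the report
        write_csv_with_header(path, ["a"], [])
        print(read_csv_columns(path))
```

Because the header is written unconditionally, a reader can still recover the column name "a" from an empty result, which is the behavior asked for in this issue.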


