Posted to issues@spark.apache.org by "Robert V (JIRA)" <ji...@apache.org> on 2019/01/21 17:46:00 UTC
[jira] [Created] (SPARK-26678) Empty values end up as quoted empty strings in CSV files
Robert V created SPARK-26678:
--------------------------------
Summary: Empty values end up as quoted empty strings in CSV files
Key: SPARK-26678
URL: https://issues.apache.org/jira/browse/SPARK-26678
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Robert V
h1. Problem statement
Empty string values were written as unquoted strings prior to Spark version 2.4.0.
From version 2.4.0 onward, empty string values are written as "" in CSV files. This is a problem if a consuming application expects empty values not to be wrapped in quotes. That is certainly the case when the CSV is intended for Microsoft PowerBI, for example, as it doesn't handle CSV files with double quotes.
The same code produces different results in different versions of Spark, and setting the quote option to an empty string in 2.4.0 does not help:
||Spark version||Code||Result||
|2.3.0|{code:java}
val df = List("aa", "", "bb").toDF("name")
df.coalesce(1).write.option("header", "true").csv("/23.csv")
{code}|{noformat}
name
aa
bb
{noformat}|
|2.4.0|{code:java}
val df = List("aa", "", "bb").toDF("name")
df.coalesce(1).write.option("header", "true").csv("/24.csv")
{code}|{noformat}
name
aa
""
bb
{noformat}|
|2.4.0|{code:java}
val df = List("aa", "", "bb").toDF("name")
df.coalesce(1).write.option("header", "true").option("quote", "").csv("/24-2.csv")
{code}|{noformat}
name
aa
""
bb
{noformat}|
If the intention was to produce standard-looking CSV files (even though a CSV standard doesn't exist), we still need a way to disable automatic quoting.
h1. Proposed solution
Using the option
{code:java}
option("quote", "")
{code}
should disable quotes.
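As a possible workaround until the quote option behaves as proposed, the sketch below uses the {{emptyValue}} CSV write option. This option is not mentioned in the report above; it is an assumption that the Spark build in use is 2.4.0 or later and supports it. The object name, helper name, and temporary output path are hypothetical and chosen only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object EmptyValueSketch {
  // Writes three names (one of them empty) as CSV and returns the raw
  // text lines of the output, so the quoting can be inspected directly.
  def writeAndReadBack(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // Hypothetical output location; any writable, non-existing path works.
    val out = Files.createTempDirectory("spark-26678").resolve("out").toString

    List("aa", "", "bb").toDF("name")
      .coalesce(1)
      .write
      .option("header", "true")
      // Assumption: Spark 2.4.0+. Ask the writer to emit nothing for
      // empty strings instead of the default quoted empty string.
      .option("emptyValue", "")
      .csv(out)

    // Read the part file back as plain text, not as CSV.
    spark.read.textFile(out).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-value-sketch")
      .getOrCreate()
    try writeAndReadBack(spark).foreach(println)
    finally spark.stop()
  }
}
```

If this works as expected, the output should no longer contain a "" line. It does not change how non-empty values are quoted, so it is not a general way to disable quoting.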