Posted to issues@spark.apache.org by "Guo Wei (Jira)" <ji...@apache.org> on 2021/12/08 06:03:00 UTC

[jira] [Comment Edited] (SPARK-37575) Empty strings and null values are both saved as quoted empty Strings "" rather than "" (for empty strings) and nothing(for null values)

    [ https://issues.apache.org/jira/browse/SPARK-37575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454974#comment-17454974 ] 

Guo Wei edited comment on SPARK-37575 at 12/8/21, 6:02 AM:
-----------------------------------------------------------

By default, the writerSettings in CSVOptions use "" for nullValue and "\"\"" for emptyValueInWrite:
{code:java}
val nullValue = parameters.getOrElse("nullValue", "")
val emptyValue = parameters.get("emptyValue")
val emptyValueInWrite = emptyValue.getOrElse("\"\"")
writerSettings.setNullValue(nullValue)
writerSettings.setEmptyValue(emptyValueInWrite) {code}
but the final output does not match what these defaults imply?
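
To make the expectation concrete, here is a minimal Spark-free sketch of what those defaults would suggest. The resolution logic mirrors the CSVOptions snippet above; the names CsvWriteDefaults, WriteSettings, resolve and render are hypothetical and exist only for this illustration. It is not Spark's or univocity's actual writer code, just a model of the behavior the reporter expects.

```scala
// Hypothetical model of the default resolution quoted above; not Spark code.
object CsvWriteDefaults {
  final case class WriteSettings(nullValue: String, emptyValue: String)

  // Same defaults as the CSVOptions snippet: nullValue -> "", emptyValue -> "\"\"".
  def resolve(parameters: Map[String, String]): WriteSettings = {
    val nullValue = parameters.getOrElse("nullValue", "")
    val emptyValueInWrite = parameters.get("emptyValue").getOrElse("\"\"")
    WriteSettings(nullValue, emptyValueInWrite)
  }

  // Assumed (expected) rendering: null uses nullValue, "" uses emptyValue,
  // anything else is written unchanged.
  def render(value: String, settings: WriteSettings): String = value match {
    case null  => settings.nullValue
    case ""    => settings.emptyValue
    case other => other
  }
}
```

With no options set, mapping List("spark", null, "") through render under this model gives spark, an empty field, and "", which is the expected result described in the issue rather than the actual one.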



> Empty strings and null values are both saved as quoted empty Strings "" rather than "" (for empty strings) and nothing(for null values)
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37575
>                 URL: https://issues.apache.org/jira/browse/SPARK-37575
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Guo Wei
>            Priority: Major
>
> As mentioned in the SQL migration guide ([https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24]),
> {noformat}
> Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 2.3 and earlier, empty strings are equal to null values and do not reflect to any characters in saved CSV files. For example, the row of "a", null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore the previous behavior, set the CSV option emptyValue to empty (not quoted) string.{noformat}
>  
> But actually, both empty strings and null values are saved as quoted empty strings "" rather than "" (for empty strings) and nothing (for null values).
> code:
> {code:java}
> import spark.implicits._ // needed for toDF outside spark-shell
> val data = List("spark", null, "").toDF("name")
> data.coalesce(1).write.csv("spark_csv_test")
> {code}
>  actual result:
> {noformat}
> line1: spark
> line2: ""
> line3: ""{noformat}
> expected result:
> {noformat}
> line1: spark
> line2: 
> line3: ""
> {noformat}


