
Posted to issues@spark.apache.org by "eugen yushin (JIRA)" <ji...@apache.org> on 2017/07/17 14:38:00 UTC

[jira] [Created] (SPARK-21442) Spark CSV writer trims trailing spaces

eugen yushin created SPARK-21442:
------------------------------------

             Summary: Spark CSV writer trims trailing spaces
                 Key: SPARK-21442
                 URL: https://issues.apache.org/jira/browse/SPARK-21442
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.1.1
         Environment: Version 2.1.0-mapr-1703
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

            Reporter: eugen yushin


It looks like Spark trims trailing spaces when saving data in CSV format. See the following example for details (note the extra space at the end of the "Johny " field):
{code}
scala> case class SampleRow(field1: String, field2: String)
defined class SampleRow

scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]

scala> fooDS.collect.foreach(println)
SampleRow(Johny ,Doe)
SampleRow(Ivan,Susanin)

scala> fooDS.show()
+------+-------+
|field1| field2|
+------+-------+
|Johny |    Doe|
|  Ivan|Susanin|
+------+-------+

scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")

$ cat /tmp/spaces.txt/*
Johny|Doe
Ivan|Susanin
{code}

I expect a space before the pipe on the first line of the output file ("Johny |Doe").
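
A possible way to check the expected behaviour, offered as a sketch rather than a confirmed fix: to my understanding, Spark 2.2.0 and later let the CSV writer honour the "ignoreTrailingWhiteSpace" and "ignoreLeadingWhiteSpace" options, while on the 2.1.x line reported here they may have no effect on writes. The object name TrailingSpaceCheck, the helper readPartLines, the output directory /tmp/spaces_kept.txt and the local[1] master are hypothetical choices made for illustration only.

```scala
import java.nio.file.{Files, Paths}

import scala.collection.JavaConverters._

import org.apache.spark.sql.{SaveMode, SparkSession}

// Same shape as the case class in the report; kept at top level so that
// Spark can derive an encoder for it.
case class SampleRow(field1: String, field2: String)

object TrailingSpaceCheck {

  // Hypothetical helper: returns the raw lines of every part file in outDir.
  def readPartLines(outDir: String): List[String] =
    Files.list(Paths.get(outDir)).iterator().asScala
      .filter(_.getFileName.toString.startsWith("part-"))
      .flatMap(p => Files.readAllLines(p).asScala)
      .toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: local run, for illustration only
      .appName("trailing-space-check")
      .getOrCreate()
    import spark.implicits._

    val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
    val outDir = "/tmp/spaces_kept.txt" // hypothetical output directory

    fooDS.write
      .option("delimiter", "|")
      // Assumption: honoured on write only in Spark 2.2.0 and later.
      .option("ignoreTrailingWhiteSpace", "false")
      .option("ignoreLeadingWhiteSpace", "false")
      .mode(SaveMode.Overwrite)
      .csv(s"file://$outDir")

    // Print each raw line in brackets so trailing spaces are visible.
    readPartLines(outDir).foreach(line => println(s"[$line]"))

    spark.stop()
  }
}
```

If the options are honoured, the line for the first row should still contain "Johny " with its trailing space. Whether the writer also quotes such a value is not something this sketch assumes.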


