Posted to issues@spark.apache.org by "eugen yushin (JIRA)" <ji...@apache.org> on 2017/07/17 14:46:00 UTC

[jira] [Comment Edited] (SPARK-21442) Spark CSV writer trims trailing spaces

    [ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089927#comment-16089927 ] 

eugen yushin edited comment on SPARK-21442 at 7/17/17 2:45 PM:
---------------------------------------------------------------

Agreed, thanks for the quick response. Feel free to close this as a duplicate.


was (Author: eyushin):
Agreed, thanks for the quick response.

> Spark CSV writer trims trailing spaces
> --------------------------------------
>
>                 Key: SPARK-21442
>                 URL: https://issues.apache.org/jira/browse/SPARK-21442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: version 2.1.0-mapr-1703
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
> and 
> version 2.1.1
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
>            Reporter: eugen yushin
>
> It looks like Spark trims trailing spaces when saving data with the CSV writer. See the following example for details (note the extra space at the end of the "Johny " field):
> {code}
> scala> case class SampleRow(field1: String, field2: String)
> defined class SampleRow
> scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
> fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]
> scala> fooDS.collect.foreach(println)
> SampleRow(Johny ,Doe)
> SampleRow(Ivan,Susanin)
> scala> fooDS.show()
> +------+-------+
> |field1| field2|
> +------+-------+
> |Johny |    Doe|
> |  Ivan|Susanin|
> +------+-------+
> scala> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.SaveMode
> scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")
> $ cat /tmp/spaces.txt/*
> Johny|Doe
> Ivan|Susanin
> {code}
> I expect a space before the pipe on the first line of the output file (i.e. "Johny |Doe").
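The difference between the observed and the expected output can be modeled with a minimal plain-Scala sketch (no Spark involved). The object name `DelimitedRowSketch`, the function `formatRow` and its `trimValues` flag are hypothetical names used only for illustration. This is not how Spark's CSV writer is implemented, and it ignores CSV quoting and escaping entirely.

```scala
// Hypothetical model of a delimiter-separated row writer; NOT Spark's implementation.
// Quoting/escaping of values that contain the delimiter or quotes is deliberately ignored.
object DelimitedRowSketch {
  // Joins field values with the given delimiter. When trimValues is true, each value
  // is passed through String.trim first (stripping leading and trailing whitespace),
  // which reproduces the output reported in this issue.
  def formatRow(fields: Seq[String], delimiter: Char, trimValues: Boolean): String =
    fields.map(f => if (trimValues) f.trim else f).mkString(delimiter.toString)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("Johny ", "Doe"), Seq("Ivan", "Susanin"))

    println("Observed (values trimmed):")
    rows.foreach(r => println(formatRow(r, '|', trimValues = true)))

    println("Expected (values preserved):")
    rows.foreach(r => println(formatRow(r, '|', trimValues = false)))
  }
}
```

Running `DelimitedRowSketch.main` prints the rows once with trimming (matching the cat output in the report) and once without it (the expected "Johny |Doe"). For reference, the Spark 2.2.0+ documentation for DataFrameWriter.csv lists `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` writer options (both defaulting to true), so something like `fooDS.write.option("delimiter", "|").option("ignoreTrailingWhiteSpace", false).csv(...)` should preserve the trailing space on those versions. As far as I know these writer options are not available in the 2.1.x releases this report targets, so check the documentation for your Spark version.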



