Posted to issues@spark.apache.org by "eugen yushin (JIRA)" <ji...@apache.org> on 2017/07/17 14:38:00 UTC
[jira] [Created] (SPARK-21442) Spark CSV writer trims trailing spaces
eugen yushin created SPARK-21442:
------------------------------------
Summary: Spark CSV writer trims trailing spaces
Key: SPARK-21442
URL: https://issues.apache.org/jira/browse/SPARK-21442
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.1.1
Environment: Version 2.1.0-mapr-1703
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Reporter: eugen yushin
It looks like Spark trims trailing spaces when saving data with the CSV writer. See the following example (note the extra space at the end of the "Johny " field):
{code}
scala> case class SampleRow(field1: String, field2: String)
defined class SampleRow
scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]
scala> fooDS.collect.foreach(println)
SampleRow(Johny ,Doe)
SampleRow(Ivan,Susanin)
scala> fooDS.show()
+------+-------+
|field1| field2|
+------+-------+
|Johny |    Doe|
|  Ivan|Susanin|
+------+-------+
scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode
scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")
cat /tmp/spaces.txt/*
Johny|Doe
Ivan|Susanin
{code}
I expect a space before the pipe on the first line of the output file, i.e. "Johny |Doe" rather than "Johny|Doe".
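A possible way to check the behaviour on other releases is sketched below. It rests on an assumption that is not part of the original report: that the Spark build in use honours the `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` options on the CSV writer. As far as I know these are documented for `DataFrameWriter.csv` from Spark 2.2.0 onwards, so they would most likely have no effect on the 2.1.x build described above. The object name `TrailingSpaceCheck` is made up for illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Same shape as the case class in the report.
case class SampleRow(field1: String, field2: String)

// Hypothetical driver object, named only for this sketch.
object TrailingSpaceCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("csv-trailing-space-check")
      .getOrCreate()
    import spark.implicits._

    val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()

    // Assumption: this Spark version applies these options when writing
    // (believed to be the case for 2.2.0 and later, not for 2.1.x).
    fooDS.write
      .option("delimiter", "|")
      .option("ignoreLeadingWhiteSpace", "false")
      .option("ignoreTrailingWhiteSpace", "false")
      .mode(SaveMode.Overwrite)
      .csv("file:///tmp/spaces.txt")

    // Read the output back as raw text and bracket each line so that
    // any trailing or embedded spaces are visible.
    spark.read.textFile("file:///tmp/spaces.txt")
      .collect()
      .foreach(line => println(s"[$line]"))

    spark.stop()
  }
}
```

If the options are honoured, one of the printed lines should contain a space before the pipe; if the output still shows "Johny|Doe", the writer in that build is trimming regardless of the options.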