Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/07/10 01:42:00 UTC
[jira] [Resolved] (SPARK-21356) CSV datasource failed to parse a
value having newline in its value
[ https://issues.apache.org/jira/browse/SPARK-21356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21356.
----------------------------------
Resolution: Invalid
I am resolving this because the workaround is straightforward, and I am not sure it makes sense, for now, to allow a newline inside a value without quotes.
> CSV datasource failed to parse a value having newline in its value
> ------------------------------------------------------------------
>
> Key: SPARK-21356
> URL: https://issues.apache.org/jira/browse/SPARK-21356
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Hyukjin Kwon
> Priority: Trivial
>
> This is related to SPARK-21355. I guess this is also rather a corner case. I found it while testing SPARK-21289.
> It looks like a bug in Univocity.
> The code below fails to parse the newline in the value.
> {code}
> scala> spark.read.csv(Seq("a\nb", "abc").toDS).show()
> +---+
> |_c0|
> +---+
> |  a|
> |abc|
> +---+
> {code}
> But this can easily be worked around with quotes, as below:
> {code}
> scala> spark.read.csv(Seq("\"a\nb\"", "abc").toDS).show()
> +---+
> |_c0|
> +---+
> |a
> b|
> |abc|
> +---+
> {code}
> This means the following works with the multiLine option, given the file below:
> {code}
> "a
> b",abc
> {code}
> {code}
> scala> spark.read.option("multiLine", true).csv("tmp.csv").show()
> +---+---+
> |_c0|_c1|
> +---+---+
> |a
> b|abc|
> +---+---+
> {code}
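The quoting workaround described in the issue can be applied programmatically before the data reaches the CSV reader. The following is a minimal sketch, not part of Spark: `CsvQuoting` and `quoteIfNeeded` are hypothetical names, and the sketch assumes values contain no double quotes of their own (how embedded quotes must be escaped depends on the reader's `quote` and `escape` options, which is out of scope here).

```scala
// Hypothetical helper, not part of Spark: wraps a CSV field in double quotes
// when it contains a character that would otherwise split the record or field.
object CsvQuoting {
  // A field needs quoting if it contains a line break or the field delimiter.
  private def needsQuoting(value: String, delimiter: Char): Boolean =
    value.exists(c => c == '\n' || c == '\r' || c == delimiter)

  def quoteIfNeeded(value: String, delimiter: Char = ','): String = {
    // Assumption: embedded double quotes are not handled by this sketch.
    require(!value.contains("\""), "embedded quotes are not handled in this sketch")
    if (needsQuoting(value, delimiter)) "\"" + value + "\"" else value
  }
}
```

In a Spark shell, something like `Seq("a\nb", "abc").map(s => CsvQuoting.quoteIfNeeded(s)).toDS` would produce the quoted input used in the second example of the issue.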