Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/07/10 01:42:00 UTC

[jira] [Resolved] (SPARK-21356) CSV datasource fails to parse a value containing a newline

     [ https://issues.apache.org/jira/browse/SPARK-21356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21356.
----------------------------------
    Resolution: Invalid

I am resolving this because the workaround (quoting the value) is easy, and I am not sure it makes sense to support newlines in unquoted values for now.

> CSV datasource fails to parse a value containing a newline
> ----------------------------------------------------------
>
>                 Key: SPARK-21356
>                 URL: https://issues.apache.org/jira/browse/SPARK-21356
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> This is related to SPARK-21355. I guess this is also a rather obscure corner case; I found it while testing SPARK-21289.
> It looks like a bug in Univocity.
> The code below fails to parse the newline in the value:
> {code}
> scala> spark.read.csv(Seq("a\nb", "abc").toDS).show()
> +---+
> |_c0|
> +---+
> |  a|
> |abc|
> +---+
> {code}
> But it can easily be worked around by quoting the value, as below:
> {code}
> scala> spark.read.csv(Seq("\"a\nb\"", "abc").toDS).show()
> +---+
> |_c0|
> +---+
> |a
> b|
> |abc|
> +---+
> {code}
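> The quoting can also be produced by Spark itself. A minimal round-trip sketch (the output path is illustrative, and it assumes the {{quoteAll}} CSV write option is available):
> {code}
> // Write values containing newlines with explicit quoting, then read them back.
> import spark.implicits._
>
> val df = Seq("a\nb", "abc").toDF("value")
> df.write.option("quoteAll", true).csv("/tmp/newline_roundtrip")
>
> // multiLine tells the reader that quoted values may span multiple lines.
> spark.read.option("multiLine", true).csv("/tmp/newline_roundtrip").show()
> {code}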
> Put differently, parsing works when the newline sits inside a quoted field. With the file below as {{tmp.csv}}:
> {code}
> "a
> b",abc
> {code}
> {code}
> scala> spark.read.option("multiLine", true).csv("tmp.csv").show()
> +---+---+
> |_c0|_c1|
> +---+---+
> |a
> b|abc|
> +---+---+
> {code}
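> For completeness, a self-contained sketch of the file-based reproduction (the file path is illustrative):
> {code}
> import java.nio.charset.StandardCharsets
> import java.nio.file.{Files, Paths}
>
> // Create the file with a quoted value spanning two lines.
> Files.write(Paths.get("/tmp/tmp.csv"), "\"a\nb\",abc".getBytes(StandardCharsets.UTF_8))
>
> // Without multiLine the reader splits records on the newline first;
> // with multiLine the quoted "a\nb" stays a single value.
> spark.read.option("multiLine", true).csv("/tmp/tmp.csv").show()
> {code}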


