Posted to issues@spark.apache.org by "Florian FERREIRA (Jira)" <ji...@apache.org> on 2023/03/03 11:17:00 UTC

[jira] [Created] (SPARK-42661) CSV Reader - multiline without quoted fields

Florian FERREIRA created SPARK-42661:
----------------------------------------

             Summary: CSV Reader - multiline without quoted fields
                 Key: SPARK-42661
                 URL: https://issues.apache.org/jira/browse/SPARK-42661
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.1
         Environment: unquoted data:
{code}
NAME,Address,CITY
Atlassian,Level 6 341 George Street
Sydney NSW 2000 Australia,Sydney
Github,88 Colin P Kelly Junior Street
San Francisco CA 94107 USA,San Francisco
{code}

quoted data:
{code}
"NAME","Address","CITY"
"Atlassian","Level 6 341 George Street
Sydney NSW 2000 Australia","Sydney"
"Github","88 Colin P Kelly Junior Street
San Francisco CA 94107 USA","San Francisco"
{code}
            Reporter: Florian FERREIRA


Hello,

We are facing an issue with the CSV reader.
When we read a multiline file whose fields are not quoted, the result is not what we expect.

With quoted fields, everything works as expected (see the screenshot).

You can reproduce the issue with this code (just replace the file paths):
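{code:scala}
// Unquoted multiline file: quoting is disabled by setting "quote" to an empty string.
spark.read.options(Map(
  "multiline" -> "true",
  "quote" -> "",
  "header" -> "true"
)).csv("/Users/fferreira/correct_multiline.csv").show(false)

// Quoted multiline file: the default quote character (") is used.
spark.read.options(Map(
  "multiline" -> "true",
  "header" -> "true"
)).csv("/Users/fferreira/correct_multiline_with_quote.csv").show(false)
{code}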
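
To illustrate why the unquoted sample is harder to read than the quoted one, here is a minimal sketch in plain Scala (no Spark). It assumes a naive reader that treats every line break as the end of a record. The names UnquotedMultilineSketch and naiveRows are hypothetical and exist only for this illustration; this is not how Spark's CSV parser is implemented.

```scala
// Plain Scala, no Spark: a naive reader that treats every line break as a
// record boundary. Illustration only, not Spark's CSV parser.
object UnquotedMultilineSketch {
  // The unquoted sample from the Environment field of this issue.
  val unquoted: String =
    """NAME,Address,CITY
      |Atlassian,Level 6 341 George Street
      |Sydney NSW 2000 Australia,Sydney
      |Github,88 Colin P Kelly Junior Street
      |San Francisco CA 94107 USA,San Francisco""".stripMargin

  // Split the text into physical lines, then split each line on commas.
  // The -1 limit keeps trailing empty fields, if any.
  def naiveRows(text: String): Vector[Vector[String]] =
    text.split("\r?\n").toVector.map(_.split(",", -1).toVector)

  def main(args: Array[String]): Unit = {
    // Print how many fields were found on each physical line.
    naiveRows(unquoted).foreach { row =>
      println(s"${row.length} field(s): ${row.mkString(" | ")}")
    }
  }
}
```

With this naive split, the header line yields three fields while every data line yields only two. Nothing in the unquoted file marks which line breaks belong to the Address value, whereas in the quoted file the quote characters delimit it.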
We are continuing to investigate on our side.

Thank you.

!image-2023-03-03-12-11-21-258.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
