Posted to issues@spark.apache.org by "Florian FERREIRA (Jira)" <ji...@apache.org> on 2023/03/03 11:17:00 UTC
[jira] [Created] (SPARK-42661) CSV Reader - multiline without quoted fields
Florian FERREIRA created SPARK-42661:
----------------------------------------
Summary: CSV Reader - multiline without quoted fields
Key: SPARK-42661
URL: https://issues.apache.org/jira/browse/SPARK-42661
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.1
Environment: unquoted data
{code}
NAME,Address,CITY
Atlassian,Level 6 341 George Street
Sydney NSW 2000 Australia,Sydney
Github,88 Colin P Kelly Junior Street
San Francisco CA 94107 USA,San Francisco
{code}
quoted data:
{code}
"NAME","Address","CITY"
"Atlassian","Level 6 341 George Street
Sydney NSW 2000 Australia","Sydney"
"Github","88 Colin P Kelly Junior Street
San Francisco CA 94107 USA","San Francisco"
{code}
Reporter: Florian FERREIRA
Hello,
We are facing an issue with the CSV reader.
When we read a multiline file whose fields are not quoted, we do not get the expected result.
With quoted fields, everything works (see the screenshot).
You can reproduce it easily with this code (just replace the file path):
{code:scala}
// Unquoted file: quoting is turned off by setting "quote" to an empty string.
spark.read.options(Map(
  "multiLine" -> "true",
  "quote" -> "",
  "header" -> "true"
)).csv("/Users/fferreira/correct_multiline.csv").show(false)

// Quoted file: the default quote character (") is used.
spark.read.options(Map(
  "multiLine" -> "true",
  "header" -> "true"
)).csv("/Users/fferreira/correct_multiline_with_quote.csv").show(false)
{code}
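To illustrate why the unquoted case is hard, here is a minimal sketch of one possible pre-processing step. It is not a Spark API and not a proposed fix. It joins physical lines until a record has as many delimiters as the header. It assumes that no field contains the delimiter and that every record has the same number of fields as the header; the names UnquotedMultilineSketch and mergeLines are hypothetical.

```scala
object UnquotedMultilineSketch {
  // Hypothetical helper (not part of Spark): merges physical lines into
  // logical records by counting delimiters against the header line.
  // Assumption: no field contains the delimiter character itself.
  def mergeLines(lines: Seq[String], delimiter: Char = ','): Seq[String] =
    lines.headOption match {
      case None => Seq.empty
      case Some(header) =>
        // A complete record has as many delimiters as the header.
        val expectedDelims = header.count(_ == delimiter)
        val (done, pending) =
          lines.foldLeft((Vector.empty[String], Option.empty[String])) {
            case ((acc, pend), line) =>
              // Append the current line to any incomplete record.
              val candidate = pend.fold(line)(p => p + "\n" + line)
              if (candidate.count(_ == delimiter) >= expectedDelims)
                (acc :+ candidate, None)   // record is complete
              else
                (acc, Some(candidate))     // keep accumulating
          }
        // Keep a trailing incomplete record rather than dropping it silently.
        done ++ pending
    }
}
```

Applied to the five lines of the unquoted sample above, this yields three logical records (the header plus two rows), each data row keeping its embedded newline in the Address field. Without quotes, a parser has no other marker to tell a newline inside a field from a record separator, which is why this sketch has to rely on the field count.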
We are continuing to investigate on our side.
Thank you.
!image-2023-03-03-12-11-21-258.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)