
Posted to issues@spark.apache.org by "Heath Abelson (JIRA)" <ji...@apache.org> on 2018/11/13 16:16:00 UTC

[jira] [Created] (SPARK-26040) CSV Row delimiters not consistent between platforms

Heath Abelson created SPARK-26040:
-------------------------------------

             Summary: CSV Row delimiters not consistent between platforms
                 Key: SPARK-26040
                 URL: https://issues.apache.org/jira/browse/SPARK-26040
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.3.0
            Reporter: Heath Abelson


When a Spark job runs on *nix platforms, only Unix-style row delimiters (\n) are recognized. When the same job runs on Windows, only Windows-style row delimiters (\r\n) are recognized.

As a result, when Spark running on Linux reads a CSV file generated by MS Excel, extra characters are included in the field names and field values that come last on each line.
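
To illustrate the symptom outside of Spark, here is a minimal sketch in plain Java. It is not Spark's CSV parser; the class and method names are hypothetical, and the splitting logic is an assumption chosen only to show what happens when \r\n-terminated rows are split on \n alone: the carriage return stays attached to the last field of every row.

```java
public class TrailingCarriageReturnDemo {

    // Hypothetical helper (not a Spark API): treats "\n" as the only row
    // delimiter, then splits each row into fields on ','.
    static String[][] splitOnLineFeedOnly(String csv) {
        String[] rows = csv.split("\n");
        String[][] fields = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            // limit -1 keeps trailing empty fields
            fields[i] = rows[i].split(",", -1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // A small file with Windows-style (\r\n) row delimiters.
        String windowsCsv = "id,name\r\n1,alice\r\n";
        for (String[] row : splitOnLineFeedOnly(windowsCsv)) {
            String last = row[row.length - 1];
            // Make the stray carriage return visible when printing.
            System.out.println("[" + last.replace("\r", "\\r") + "]");
        }
    }
}
```

In this sketch the last header and the last value of each row both carry a trailing \r, which matches the extra characters described above.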

Ideally, the CSV parser would handle both flavors of line endings regardless of the platform the job runs on.
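
The requested behavior can be sketched in plain Java as follows. This is only an illustration of platform-independent row splitting, not a proposed patch and not Spark's implementation; the names are hypothetical, and the sketch assumes that fields contain no quoted or embedded line breaks.

```java
import java.util.Arrays;

public class LineEndingTolerantSplit {

    // Hypothetical helper (not a Spark API): accepts either "\n" or "\r\n"
    // as the row delimiter, independent of the platform it runs on.
    // Assumption: no quoted fields containing line breaks.
    static String[][] splitRows(String csv) {
        String[] rows = csv.split("\r?\n");
        String[][] fields = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            // limit -1 keeps trailing empty fields
            fields[i] = rows[i].split(",", -1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String unixCsv = "id,name\n1,alice\n";
        String windowsCsv = "id,name\r\n1,alice\r\n";
        // Both inputs should yield the same rows and fields.
        System.out.println(Arrays.deepToString(splitRows(unixCsv)));
        System.out.println(Arrays.deepToString(splitRows(windowsCsv)));
    }
}
```

A real CSV parser would also have to respect quoting rules, so a regular-expression split like this is only suitable as a demonstration of the expected result.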


