
Posted to issues@spark.apache.org by "Yaohua Zhao (Jira)" <ji...@apache.org> on 2022/07/06 03:30:00 UTC

[jira] [Created] (SPARK-39689) Support 2-chars lineSep in CSV datasource

Yaohua Zhao created SPARK-39689:
-----------------------------------

             Summary: Support 2-chars lineSep in CSV datasource
                 Key: SPARK-39689
                 URL: https://issues.apache.org/jira/browse/SPARK-39689
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Yaohua Zhao


The Univocity parser allows the line separator to be set to 1 or 2 characters ([code|https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/Format.java#L103]), so the CSV options should not block this usage ([code|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala#L218]).
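
As a rough illustration only (this is not the Spark or Univocity code), a relaxed separator check could look like the sketch below. The names `validate_line_sep` and `split_records` are hypothetical, the error messages are made up for the example, and the plain string split merely stands in for the real parser.

```python
def validate_line_sep(sep, max_chars=2):
    """Hypothetical helper: accept a line separator of 1 to max_chars characters."""
    if not sep:
        raise ValueError("'lineSep' cannot be an empty string.")
    if len(sep) > max_chars:
        raise ValueError(
            "'lineSep' can contain only 1 to %d characters." % max_chars
        )
    return sep


def split_records(text, sep):
    """Split raw text into records on a validated separator.

    A plain str.split is used here purely for illustration; it does not
    model quoting, escaping, or the parser's newline normalization.
    """
    return text.split(validate_line_sep(sep))


if __name__ == "__main__":
    # A 2-character separator passes the relaxed check.
    print(split_records("a,b\r\nc,d", "\r\n"))
    # A 3-character separator is still rejected.
    try:
        validate_line_sep("abc")
    except ValueError as err:
        print("rejected:", err)
```

The sketch only covers the length check on the option value; it says nothing about how the parser behaves once a 2-character separator is accepted.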

 

Due to a limitation around `normalizedNewLine` (https://github.com/uniVocity/univocity-parsers/issues/170), using 2 characters as the line separator can cause unexpected or incorrect behavior. We should therefore probably leave this proposed fix as an undocumented feature and warn users who do this.

 

A more complete fix could be investigated in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
