Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/07/06 21:40:00 UTC

[jira] [Assigned] (SPARK-39689) Support 2-chars lineSep in CSV datasource

     [ https://issues.apache.org/jira/browse/SPARK-39689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39689:
------------------------------------

    Assignee:     (was: Apache Spark)

> Support 2-chars lineSep in CSV datasource
> -----------------------------------------
>
>                 Key: SPARK-39689
>                 URL: https://issues.apache.org/jira/browse/SPARK-39689
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Yaohua Zhao
>            Priority: Major
>
> The Univocity parser allows the line separator to be set to 1 or 2 characters ([code|https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/Format.java#L103]), so the CSV options should not block this usage ([code|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala#L218]).
>  
> Because of a limitation around `normalizedNewLine` (https://github.com/uniVocity/univocity-parsers/issues/170), using 2 characters as a line separator can cause unexpected or incorrect behavior. We should therefore probably leave the proposed fix as an undocumented feature and warn users about doing this.
>  
> A more thorough fix can be investigated in the future.
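
As a rough illustration of the relaxed check described above, here is a minimal Python sketch. It is not Spark's or Univocity's code: `validate_line_sep`, `split_records`, the `max_len` parameter, and the error messages are all hypothetical names chosen for this example. It only shows the idea of accepting a 1- or 2-character separator while still rejecting empty or longer values, and it does not model the `normalizedNewLine` limitation.

```python
from typing import List


def validate_line_sep(sep: str, max_len: int = 2) -> str:
    """Hypothetical validation: accept a separator of 1 to max_len characters."""
    if not sep:
        raise ValueError("'lineSep' cannot be an empty string.")
    if len(sep) > max_len:
        raise ValueError(f"'lineSep' can contain at most {max_len} characters.")
    return sep


def split_records(text: str, sep: str) -> List[str]:
    """Split raw text into records using a validated line separator."""
    sep = validate_line_sep(sep)
    records = text.split(sep)
    # A terminating separator leaves one empty trailing element; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


if __name__ == "__main__":
    # A 2-character separator such as "\r\n" passes the relaxed check.
    print(split_records("a,b\r\nc,d\r\n", "\r\n"))
```

With a strict 1-character limit (`max_len=1` in this sketch), the same `"\r\n"` input would be rejected, which is the restriction this issue proposes to lift.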



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
