Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/07/06 21:40:00 UTC
[jira] [Assigned] (SPARK-39689) Support 2-chars lineSep in CSV datasource
[ https://issues.apache.org/jira/browse/SPARK-39689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39689:
------------------------------------
Assignee: (was: Apache Spark)
> Support 2-chars lineSep in CSV datasource
> -----------------------------------------
>
> Key: SPARK-39689
> URL: https://issues.apache.org/jira/browse/SPARK-39689
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Yaohua Zhao
> Priority: Major
>
> The Univocity parser allows the line separator to be set to either 1 or 2 characters ([code|https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/Format.java#L103]), so the CSV options should not block this usage ([code|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala#L218]).
>
> Due to the limitation around `normalizedNewLine` (https://github.com/uniVocity/univocity-parsers/issues/170), setting a 2-character line separator could cause unexpected or incorrect behavior. Thus, we should probably leave this proposed fix as an undocumented feature and warn users who rely on it.
>
> A more proper fix could be further investigated in the future.
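
The relaxed check proposed above can be illustrated with a minimal sketch. This is not Spark's code: the function name `validate_line_sep`, the error messages, and the warning text are all hypothetical, and it only assumes the behavior the description asks for (accept 1 or 2 characters, warn on 2).

```python
import warnings


def validate_line_sep(sep: str) -> str:
    """Hypothetical validator: accept a 1- or 2-character line separator.

    A 2-character separator is allowed but triggers a warning, mirroring the
    "undocumented feature plus warning" approach suggested in the issue.
    """
    if not sep:
        raise ValueError("'lineSep' cannot be an empty string.")
    if len(sep) > 2:
        raise ValueError("'lineSep' can contain at most 2 characters.")
    if len(sep) == 2:
        # Assumed wording; the issue only says users should be warned.
        warnings.warn(
            "A 2-character 'lineSep' is undocumented and may behave unexpectedly.",
            UserWarning,
        )
    return sep
```

Under these assumptions, `validate_line_sep("\n")` passes silently, `validate_line_sep("\r\n")` passes with a `UserWarning`, and a 3-character separator raises `ValueError`.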