Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/02/07 12:22:00 UTC

[jira] [Commented] (SPARK-42373) Remove unused blank line removal from CSVExprUtils

    [ https://issues.apache.org/jira/browse/SPARK-42373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685268#comment-17685268 ] 

Apache Spark commented on SPARK-42373:
--------------------------------------

User 'ted-jenks' has created a pull request for this issue:
https://github.com/apache/spark/pull/39927

> Remove unused blank line removal from CSVExprUtils
> --------------------------------------------------
>
>                 Key: SPARK-42373
>                 URL: https://issues.apache.org/jira/browse/SPARK-42373
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Willi Raschkowski
>            Priority: Minor
>
> The non-multiline CSV read code path refers to blank-line removal throughout. This is unnecessary, because the parser already removes blank lines. It is also misleading: it suggests that blank lines are removed at this point, when they have in fact already been omitted from the data. The multiline code path does not explicitly remove blank lines, which creates an apparent disparity in behavior between the two.
> The code path for {{DataFrameReader.csv(dataset: Dataset[String])}} does need to skip lines explicitly, and {{CSVUtils}} should preserve that behavior.
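
For illustration only, here is a minimal sketch of why a reader fed raw strings has to skip blank lines itself, as the description says of the Dataset[String] path. It is plain Python, not Spark code: skip_blank_lines, parse_line, and read_rows are hypothetical names, the parser is a toy with no quoting support, and treating whitespace-only lines as blank is an assumption.

```python
from typing import Iterable, Iterator, List


def skip_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: drop lines that are empty or whitespace-only."""
    return (line for line in lines if line.strip())


def parse_line(line: str, sep: str = ",") -> List[str]:
    """Toy per-line parser (no quoting); a stand-in for a real CSV parser."""
    return line.split(sep)


def read_rows(lines: Iterable[str]) -> List[List[str]]:
    # Raw strings have not passed through a file-level parser, so blank
    # lines are skipped explicitly before each line is parsed.
    return [parse_line(line) for line in skip_blank_lines(lines)]


raw = ["a,b", "", "   ", "1,2"]
print(read_rows(raw))  # [['a', 'b'], ['1', '2']]
```

If the input had already gone through a parser that drops blank lines, the skip_blank_lines step would be redundant, which is the situation the issue describes for the non-multiline file path.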


