Posted to issues@spark.apache.org by "Andrew Ash (JIRA)" <ji...@apache.org> on 2016/08/24 19:46:22 UTC

[jira] [Commented] (SPARK-17227) Allow configuring record delimiter in csv

    [ https://issues.apache.org/jira/browse/SPARK-17227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435539#comment-15435539 ] 

Andrew Ash commented on SPARK-17227:
------------------------------------

Rob and I work together, and we've seen datasets in mostly-CSV format that use non-standard record delimiters (the '\0' character, for instance).
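To make the ask concrete, here is a minimal sketch of what a configurable record delimiter means. This is illustrative only and is not Spark's API (the function name and defaults are invented for the example); Spark's CSV datasource at this point splits records on a hard-coded "\n":

```python
def split_records(data: bytes, record_delim: bytes = b"\n"):
    """Split a raw byte buffer into records on a configurable delimiter.

    Illustrative sketch only -- not Spark's API. It shows the behavior a
    'record delimiter' option would enable instead of a hard-coded "\n".
    """
    records = data.split(record_delim)
    # Drop the trailing empty record left by a terminating delimiter.
    if records and records[-1] == b"":
        records.pop()
    return records

# A "mostly-CSV" buffer that uses '\0' as the record delimiter:
raw = b"a,1\x00b,2\x00c,3\x00"
print(split_records(raw, b"\x00"))  # [b'a,1', b'b,2', b'c,3']
```

A real implementation would of course need to respect quoting and escaping when scanning for the delimiter, which is exactly why this belongs in the CSV parser itself rather than in pre-processing.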

For some broader context: we've built our own CSV text parser and use it across our internal Spark-based products, but we'd like to contribute this additional flexibility back to the Spark community at large and, in the process, eliminate the need for our internal CSV datasource.

Here are the tickets Rob just opened that we would require to eliminate our internal CSV datasource:

SPARK-17222
SPARK-17224
SPARK-17225
SPARK-17226
SPARK-17227

The basic question, then, is: would the Spark community accept patches that extend Spark's CSV parser to cover these features?  We're willing to write the code and shepherd the patches through code review, but we'd rather know up front if these changes would never be accepted into mainline Spark due to philosophical disagreements about what Spark's CSV datasource should be.

> Allow configuring record delimiter in csv
> -----------------------------------------
>
>                 Key: SPARK-17227
>                 URL: https://issues.apache.org/jira/browse/SPARK-17227
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Robert Kruszewski
>            Priority: Minor
>
> Instead of the hard-coded "\n"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org