Posted to issues@spark.apache.org by "koert kuipers (JIRA)" <ji...@apache.org> on 2016/04/27 20:22:12 UTC

[jira] [Commented] (SPARK-12420) Have a built-in CSV data source implementation

    [ https://issues.apache.org/jira/browse/SPARK-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260656#comment-15260656 ] 

koert kuipers commented on SPARK-12420:
---------------------------------------

hello, i see that the (admittedly somewhat crazy sounding) option "treatEmptyValuesAsNulls" is missing. this used to be in spark-csv. can someone point me to where it is or what the alternative is?

we use this setting together with the setting for "nullValue" so that empty values come in as nulls, and nulls get written back out as empty values. this is very typical behavior that is the default for many other frameworks such as scalding when reading csv files.

so for example a line in a file like this:
{noformat}
a,,5
{noformat}
should become Row("a", null, 5) (this is where the "treatEmptyValuesAsNulls" kicks in)

and going in the other direction Row("a", null, 5) should be written out again as:
{noformat}
a,,5
{noformat}
(this is where "nullValue" kicks in)
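
to make the round trip concrete, here is a minimal sketch in plain python using only the standard csv module. it is not spark or spark-csv code: the function names read_row and write_row and their parameters are hypothetical and only mirror the two option names mentioned above. it also skips type inference, so the 5 stays a string here.

```python
import csv
import io


def read_row(line, treat_empty_values_as_nulls=True):
    """Parse one CSV line; optionally map empty fields to None.

    Hypothetical helper that mimics the effect of "treatEmptyValuesAsNulls".
    """
    fields = next(csv.reader([line]))
    if treat_empty_values_as_nulls:
        return [None if field == "" else field for field in fields]
    return fields


def write_row(row, null_value=""):
    """Serialize one row to a CSV line, writing None as null_value.

    Hypothetical helper that mimics the effect of "nullValue" on write.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="")
    writer.writerow([null_value if value is None else value for value in row])
    return buf.getvalue()


# read: the empty middle field becomes None
row = read_row("a,,5")

# write: None goes back out as an empty field
line = write_row(row)
```

with both settings in place the line survives a read followed by a write unchanged. with treat_empty_values_as_nulls=False the middle field would stay an empty string instead of becoming a null.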


> Have a built-in CSV data source implementation
> ----------------------------------------------
>
>                 Key: SPARK-12420
>                 URL: https://issues.apache.org/jira/browse/SPARK-12420
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>         Attachments: Built-in CSV datasource in Spark.pdf
>
>
> CSV is the most common data format in the "small data" world. It is often the first format people want to try when they see Spark on a single node. Making this built-in for the most common source can provide a better experience for first-time users.
> We should consider inlining https://github.com/databricks/spark-csv



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
