Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/03/26 14:09:41 UTC

[jira] [Comment Edited] (SPARK-14726) Support for sampling when inferring schema in CSV data source

    [ https://issues.apache.org/jira/browse/SPARK-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247378#comment-15247378 ] 

Hyukjin Kwon edited comment on SPARK-14726 at 3/26/17 2:09 PM:
---------------------------------------------------------------

This is currently not supported. I will work on this if it is decided to be supported. [~rxin]


was (Author: hyukjin.kwon):
This is currently not supported. I can work on this, but I am a bit hesitant because I believe the CSV data source was ported mainly for the "small data world". Still, I believe there are many users dealing with large CSV files.
I will work on this if it is decided to be supported. [~rxin]

> Support for sampling when inferring schema in CSV data source
> -------------------------------------------------------------
>
>                 Key: SPARK-14726
>                 URL: https://issues.apache.org/jira/browse/SPARK-14726
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Bomi Kim
>
> I am currently using the CSV data source and getting used to Spark 2.0, which has a built-in CSV data source.
> I noticed that the CSV data source infers the schema from all of the data, whereas the JSON data source supports a sampling ratio option.
> It would be great if the CSV data source had this option too (or is it already supported?).
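
For illustration only, here is a minimal sketch of the idea behind the request: inferring column types from a random sample of rows instead of scanning every row. It is plain Python using the standard library, not Spark code and not how Spark implements inference. The names `infer_type`, `infer_schema`, `sampling_ratio`, and `seed` are hypothetical, and the three-type lattice (int, double, string) is a simplifying assumption.

```python
import csv
import io
import random


def infer_type(value):
    """Return the narrowest type name that can hold a single CSV field."""
    for name, parse in (("int", int), ("double", float)):
        try:
            parse(value)
            return name
        except ValueError:
            pass
    return "string"


# Widening order used when two rows disagree about a column's type
# (an assumption made for this sketch).
_RANK = {"int": 0, "double": 1, "string": 2}


def infer_schema(text, sampling_ratio=1.0, seed=0):
    """Infer column types from a random sample of the data rows.

    Each data row is kept with probability `sampling_ratio`; columns
    that see no sampled value fall back to "string".
    """
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    rng = random.Random(seed)
    types = [None] * len(header)
    for row in rows:
        if rng.random() >= sampling_ratio:
            continue  # row is not part of the sample
        for i, value in enumerate(row[:len(header)]):
            t = infer_type(value)
            # Keep the wider of the current and newly observed types.
            if types[i] is None or _RANK[t] > _RANK[types[i]]:
                types[i] = t
    return {name: (t or "string") for name, t in zip(header, types)}
```

With `sampling_ratio=1.0` every row is inspected, which corresponds to the behavior described in the issue. A lower ratio reads fewer rows but may miss a value that would have widened a column's type, so the inferred schema can be too narrow for the full data.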


