Posted to dev@spark.apache.org by Hyukjin Kwon <gu...@gmail.com> on 2016/02/19 03:25:55 UTC

Ability to auto-detect input data for datasources (by file extension).

Hi all,

I am planning to submit a PR for
https://issues.apache.org/jira/browse/SPARK-8000.

Currently, the file format is not detected from the file extension, unlike
compression codecs, which are detected automatically.

I am thinking of introducing another interface (a function) in
DataSourceRegister, just like shortName(), in order to specify the possible
file extensions, so that we can detect datasources by file extension just
like Hadoop does for compression codecs.
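
To make the idea concrete, here is a rough sketch of what the addition could
look like. This is only an illustration under my assumptions: the real trait
currently exposes only shortName(), and the fileExtensions() method and the
CsvSource example below are hypothetical names, not an existing Spark API.

    // Sketch only: DataSourceRegister as it could look with the new method.
    trait DataSourceRegister {
      /** Short alias used with format(), e.g. "parquet". */
      def shortName(): String

      /** Hypothetical addition: extensions this source can read, e.g. Seq(".csv").
        * An empty default keeps existing implementations source-compatible. */
      def fileExtensions(): Seq[String] = Seq.empty
    }

    // Example implementation advertising its extensions (illustrative only).
    class CsvSource extends DataSourceRegister {
      override def shortName(): String = "csv"
      override def fileExtensions(): Seq[String] = Seq(".csv", ".tsv")
    }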

Since adding an interface should be done carefully, I want to first ask
whether this approach looks appropriate.

Could you please give me some feedback on this?


Thanks!

Re: Ability to auto-detect input data for datasources (by file extension).

Posted by Reynold Xin <rx...@databricks.com>.
Thanks for the email.

Don't make it that complicated. We just want to simplify the common cases
(e.g. csv/parquet), and don't need this to work for everything out there.
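
A minimal sketch of that simpler approach, assuming a fixed lookup covering
only the common formats; the object name, helper name, and map contents are
assumptions for illustration, not Spark code:

    object SimpleFormatDetection {
      // Only the common, well-known formats; everything else still requires
      // an explicit format() call.
      private val knownExtensions = Map(
        ".csv"     -> "csv",
        ".json"    -> "json",
        ".parquet" -> "parquet",
        ".orc"     -> "orc"
      )

      /** Returns the data source short name for a path, if the extension is known. */
      def detect(path: String): Option[String] =
        knownExtensions.collectFirst {
          case (ext, format) if path.toLowerCase.endsWith(ext) => format
        }
    }

    // Usage: SimpleFormatDetection.detect("hdfs:///data/events.parquet")
    // returns Some("parquet"); an unknown extension returns None.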


On Thu, Feb 18, 2016 at 9:25 PM, Hyukjin Kwon <gu...@gmail.com> wrote:

> Hi all,
>
> I am planning to submit a PR for
> https://issues.apache.org/jira/browse/SPARK-8000.
>
> Currently, the file format is not detected from the file extension, unlike
> compression codecs, which are detected automatically.
>
> I am thinking of introducing another interface (a function) in
> DataSourceRegister, just like shortName(), in order to specify the possible
> file extensions, so that we can detect datasources by file extension just
> like Hadoop does for compression codecs.
>
> Since adding an interface should be done carefully, I want to first ask
> whether this approach looks appropriate.
>
> Could you please give me some feedback on this?
>
>
> Thanks!
>