Posted to user@spark.apache.org by Manoj Samel <ma...@gmail.com> on 2014/01/19 06:47:19 UTC

Which of the hadoop file formats are supported by Spark ?

The Hadoop ecosystem has various file formats besides text, e.g. SequenceFile, RCFile, and
others like Parquet for efficient columnar storage.

Which of these are supported by Spark?

Thanks,

Re: Which of the hadoop file formats are supported by Spark ?

Posted by Tathagata Das <ta...@gmail.com>.
Spark was built on the standard Hadoop InputFormat and OutputFormat interfaces, so any
InputFormat and OutputFormat should ideally be supported. Besides the simplified
interfaces for text files (sparkContext.textFile(...)) and sequence files
(sparkContext.sequenceFile(...)), you can specify your own InputFormat and
OutputFormat in sparkContext.hadoopFile(...). As suggested in the first response,
check out the API.
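For example, those three entry points look roughly like this in Scala (a minimal sketch;
the paths, the key/value types, and the choice of TextInputFormat are illustrative
assumptions, not from the thread):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HadoopFormatsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("formats-sketch").setMaster("local[*]"))

    // Simplified interface for plain text: an RDD[String], one element per line.
    val lines = sc.textFile("hdfs:///data/input.txt")

    // Simplified interface for SequenceFiles: the key and value Writable types are given.
    val seq = sc.sequenceFile[Text, Text]("hdfs:///data/input.seq")

    // General form: plug in any Hadoop InputFormat (TextInputFormat here, as an example).
    val viaHadoop = sc.hadoopFile[LongWritable, Text, TextInputFormat](
      "hdfs:///data/input.txt")

    sc.stop()
  }
}
```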

TD


On Sat, Jan 18, 2014 at 10:16 PM, Ankur Chauhan <ac...@brightcove.com> wrote:

> You may also want to consider Parquet (http://parquet.io). It is pretty
> efficient: http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/
>
> -- Ankur Chauhan

Re: Which of the hadoop file formats are supported by Spark ?

Posted by Ankur Chauhan <ac...@brightcove.com>.
You may also want to consider Parquet (http://parquet.io). It is pretty efficient: http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/

-- Ankur Chauhan 

Re: Which of the hadoop file formats are supported by Spark ?

Posted by Nan Zhu <zh...@gmail.com>.
Hi,

text and seq are definitely supported

you can check
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.SparkContext

I don't think other types have been considered... can anyone correct me?
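As a quick illustration of the two formats mentioned above (the local paths and toy data
are made up; reading String/Int keys and values relies on Spark's implicit Writable
converters):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD functions, incl. saveAsSequenceFile

object TextAndSeqSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("text-seq-sketch").setMaster("local[*]"))

    // Text: write lines out, then read them back as an RDD[String].
    sc.parallelize(Seq("one", "two")).saveAsTextFile("/tmp/demo-text")
    val lines = sc.textFile("/tmp/demo-text")

    // Seq: write (key, value) pairs as a SequenceFile and read them back;
    // String and Int are converted to Text and IntWritable under the hood.
    sc.parallelize(Seq(("a", 1), ("b", 2))).saveAsSequenceFile("/tmp/demo-seq")
    val pairs = sc.sequenceFile[String, Int]("/tmp/demo-seq")

    sc.stop()
  }
}
```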

Best,

Nan



On Sun, Jan 19, 2014 at 12:47 AM, Manoj Samel <ma...@gmail.com> wrote:

> The Hadoop ecosystem has various file formats besides text, e.g. SequenceFile, RCFile,
> and others like Parquet for efficient columnar storage.
>
> Which of these are supported by Spark?
>
> Thanks,
>