Posted to user@spark.apache.org by Harsha HN <99...@gmail.com> on 2014/09/25 18:44:58 UTC

Working on LZOP Files

Hi,

Is anybody processing LZOP files in Spark?

We have a huge volume of LZOP files in HDFS to process with Spark. The
MapReduce framework automatically detects the compression format and passes
the decompressed data to the mappers.
Is there any such support in Spark?
For now I am manually downloading and decompressing the files before
processing them.

Thanks,
Harsha

Re: Working on LZOP Files

Posted by Andrew Ash <an...@andrewash.com>.
Hi Harsha,

I use LZOP files extensively on my Spark cluster -- see my writeup in this
mailing list post for how to do it:
http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCAOoZ679ehwvT1g8=qHd2n11Z4EXOBJkP+q=Aj0qE_=sHHYLBaA@mail.gmail.com%3E

Maybe we should better document how to use LZO with Spark, because it can be
tricky to get the LZO jars, the native libraries, and the hadoopFile() calls
all set up correctly.
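For reference, a minimal sketch of what such a hadoopFile()-style call can
look like. This assumes the hadoop-lzo jar and its native libraries are
available to both the driver and the executors, and uses the
com.hadoop.mapreduce.LzoTextInputFormat class from the hadoop-lzo project;
the HDFS path is a made-up placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: reading LZO-compressed text files in Spark.
// Assumes hadoop-lzo is on the classpath and its native library is on
// java.library.path for every node.
object LzoRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lzo-read"))

    // LzoTextInputFormat (from hadoop-lzo) decompresses .lzo files and,
    // if matching .lzo.index files exist, also splits them across tasks.
    val lines = sc.newAPIHadoopFile(
      "hdfs:///data/*.lzo",  // placeholder path
      classOf[com.hadoop.mapreduce.LzoTextInputFormat],
      classOf[org.apache.hadoop.io.LongWritable],
      classOf[org.apache.hadoop.io.Text]
    ).map(_._2.toString)  // keep the text, drop the byte-offset key

    println(lines.count())
    sc.stop()
  }
}
```

Without the .lzo.index files the input is still readable, but each .lzo file
becomes a single non-splittable task, so indexing the files first is worth it
for large inputs.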

Andrew

On Thu, Sep 25, 2014 at 9:44 AM, Harsha HN <99...@gmail.com> wrote:

> Hi,
>
> Is anybody processing LZOP files in Spark?
>
> We have a huge volume of LZOP files in HDFS to process with Spark. The
> MapReduce framework automatically detects the compression format and passes
> the decompressed data to the mappers.
> Is there any such support in Spark?
> For now I am manually downloading and decompressing the files before
> processing them.
>
> Thanks,
> Harsha
>