Posted to user@spark.apache.org by David Thomas <dt...@gmail.com> on 2014/02/04 21:54:50 UTC

RDD of binary files

I have a set of binary files and I would like to create an RDD out of them
and pipe them through an external process. So how do I create an RDD of
such objects? For quick prototyping, can I do it without using HDFS?

Re: RDD of binary files

Posted by Nick Pentreath <ni...@gmail.com>.
You should be able to use a custom Hadoop InputFormat:


sc.newAPIHadoopFile(...)

Use a FileInputFormat subclass with LongWritable as the key class and BytesWritable as the value class.
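
Hadoop doesn't ship a whole-file binary format, so you have to write one yourself. A minimal sketch (WholeFileInputFormat and WholeFileRecordReader are illustrative names, not Hadoop built-ins) could look like this:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{BytesWritable, IOUtils, LongWritable}
    import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
    import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}

    // Emits one record per file: key = 0, value = the file's raw bytes.
    class WholeFileInputFormat extends FileInputFormat[LongWritable, BytesWritable] {
      // Never split a file; each record must hold the whole file.
      override def isSplitable(context: JobContext, file: Path): Boolean = false

      override def createRecordReader(split: InputSplit, context: TaskAttemptContext)
          : RecordReader[LongWritable, BytesWritable] = new WholeFileRecordReader
    }

    class WholeFileRecordReader extends RecordReader[LongWritable, BytesWritable] {
      private var split: FileSplit = _
      private var conf: Configuration = _
      private val key = new LongWritable(0L)
      private val value = new BytesWritable()
      private var processed = false

      override def initialize(inputSplit: InputSplit, context: TaskAttemptContext): Unit = {
        split = inputSplit.asInstanceOf[FileSplit]
        conf = context.getConfiguration
      }

      // Reads the entire file into the value on the first call, then reports EOF.
      // Assumes each file fits in an Int-sized byte array (< 2 GB).
      override def nextKeyValue(): Boolean = {
        if (processed) return false
        val path = split.getPath
        val in = path.getFileSystem(conf).open(path)
        try {
          val bytes = new Array[Byte](split.getLength.toInt)
          IOUtils.readFully(in, bytes, 0, bytes.length)
          value.set(bytes, 0, bytes.length)
        } finally {
          in.close()
        }
        processed = true
        true
      }

      override def getCurrentKey: LongWritable = key
      override def getCurrentValue: BytesWritable = value
      override def getProgress: Float = if (processed) 1.0f else 0.0f
      override def close(): Unit = {}
    }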

This will read the files from an input directory which can be a local file system for testing.
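
For example, to prototype against a local directory (the path here and the WholeFileInputFormat class from the sketch above are assumptions):

    import org.apache.hadoop.io.{BytesWritable, LongWritable}

    // "file://" URIs read from the local file system; no HDFS required.
    val files = sc.newAPIHadoopFile(
      "file:///tmp/binary-input",
      classOf[WholeFileInputFormat],
      classOf[LongWritable],
      classOf[BytesWritable])

    // BytesWritable's backing array is padded, so trim it to the real length.
    val bytes = files.map { case (_, v) => java.util.Arrays.copyOf(v.getBytes, v.getLength) }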

Take a look at the code for sc.textFile to see how it gets set up with the inputFormat and writable classes if you get stuck.
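
Roughly, paraphrasing the Spark source of that era, it is just the following (textFile uses the older hadoopFile API rather than newAPIHadoopFile, but the wiring of input format, key class, and value class is the same idea):

    // Input format and key/value writable classes are passed explicitly;
    // the value (the line of text) is extracted from each pair afterwards.
    def textFile(path: String, minSplits: Int): RDD[String] =
      hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], minSplits)
        .map(pair => pair._2.toString)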

—
Sent from Mailbox for iPhone

On Tue, Feb 4, 2014 at 10:55 PM, David Thomas <dt...@gmail.com> wrote:

> I have a set of binary files and I would like to create an RDD out of them
> and pipe them through an external process. So how do I create an RDD of
> such objects? For quick prototyping, can I do it without using HDFS?