Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2011/03/02 06:45:27 UTC

How to make zip files as Hadoop input

Hi,

I have a bunch of zip files that I want to serve as input to a MapReduce
job. My initial design was to list them in a text file and then give this
list file as input. The list file would be read, and each line would be
handed off to a node to process, which would pick up the corresponding zip
file and work on it.

But I feel that a better design is possible, and that my way is redundant.
Can I just give the input directory as input? How do I make sure each node
gets a file to process?

Thank you,
Mark

Re: How to make zip files as Hadoop input

Posted by Nitin Khandelwal <ni...@germinait.com>.
Hi,
You can actually write your own input format and record reader that read one
file at a time from a directory and hand it to a node. If you are using
Hadoop 0.19, extending MultiFileInputFormat can do this task for you. If you
are using Hadoop 0.20 or greater, your input format can extend
FileInputFormat and your reader can extend RecordReader.
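
Wiring up the FileInputFormat/RecordReader pair needs the Hadoop jars on the
classpath, but the work each map task would then do -- receive the whole
archive as bytes and walk its entries -- can be sketched with just the JDK's
java.util.zip. This is only an illustration; the class and method names
(ZipEntryLister, listEntries) are made up for the example, and main() builds
a small zip in memory so the sketch is self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLister {

    // Decode a zip archive held in memory -- as a mapper would receive it
    // from a whole-file record reader -- and return its entry names.
    // A real mapper would also read each entry's bytes and emit records.
    public static List<String> listEntries(byte[] zipBytes) throws Exception {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny zip in memory so the example runs without any files.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes("UTF-8"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.txt"));
            zout.write("world".getBytes("UTF-8"));
            zout.closeEntry();
        }
        System.out.println(listEntries(buf.toByteArray()));
    }
}
```

In the Hadoop wiring, the input format would override isSplitable() to
return false so each zip stays whole, and the record reader would emit the
file's full contents as a single value.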
Thanks and Regards,
Nitin




--
Nitin Khandelwal