Posted to mapreduce-user@hadoop.apache.org by Manoj Babu <ma...@gmail.com> on 2012/07/12 17:03:36 UTC

How to use CombineFileInputFormat in Hadoop?

Gentles,

I want to use the CombineFileInputFormat of Hadoop 0.20.0 / 0.20.2 such
that it processes one file per record and doesn't compromise on data
locality (which it normally takes care of).

It is mentioned in Tom White's Hadoop: The Definitive Guide, but he does not
show how to do it; instead, he moves on to Sequence Files.

I am pretty confused about the meaning of the 'processed' variable in a
record reader. Any code example would be of tremendous help.

Thanks in advance..
Cheers!
Manoj.

Re: How to use CombineFileInputFormat in Hadoop?

Posted by Harsh J <ha...@cloudera.com>.
Hey Manoj,

I find the asker's name here quite strange, although it is the same
question, ha: http://stackoverflow.com/questions/10380200/how-to-use-combinefileinputformat-in-hadoop

Anyhow, here's one example:
http://blog.yetitrails.com/2011/04/dealing-with-lots-of-small-files-in.html
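
In case that post disappears, here is the general shape of it as a minimal,
untested sketch. The class names WholeFileCombineInputFormat and
WholeFileRecordReader are made up for illustration, and it uses the newer
org.apache.hadoop.mapreduce.lib.input classes, which may not ship with a
vanilla 0.20.0 / 0.20.2 (on those releases you'd use the
org.apache.hadoop.mapred.lib equivalents). It also shows what the
'processed' variable is for: each per-file reader emits exactly one record
(the whole file), and the flag simply records whether that record has been
handed out yet.

// --- WholeFileCombineInputFormat.java ---
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

// Packs many small files into a few splits built from (mostly) co-located
// blocks; each file inside a split becomes exactly one map record.
public class WholeFileCombineInputFormat
    extends CombineFileInputFormat<Text, BytesWritable> {

  public WholeFileCombineInputFormat() {
    setMaxSplitSize(128 * 1024 * 1024); // cap a combined split at ~128 MB
  }

  @Override
  public RecordReader<Text, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    // CombineFileRecordReader delegates to one WholeFileRecordReader per file.
    return new CombineFileRecordReader<Text, BytesWritable>(
        (CombineFileSplit) split, context, WholeFileRecordReader.class);
  }
}

// --- WholeFileRecordReader.java ---
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

// Reads one file of a CombineFileSplit as a single (filename, contents) record.
public class WholeFileRecordReader extends RecordReader<Text, BytesWritable> {

  private final Path path;
  private final long length;
  private final TaskAttemptContext context;
  private final Text key = new Text();
  private final BytesWritable value = new BytesWritable();
  // The "processed" flag: false until the one-and-only record has been
  // emitted by nextKeyValue(), true afterwards so the framework stops asking.
  private boolean processed = false;

  // CombineFileRecordReader creates this via reflection and requires exactly
  // this (CombineFileSplit, TaskAttemptContext, Integer) constructor.
  public WholeFileRecordReader(CombineFileSplit split,
      TaskAttemptContext context, Integer index) {
    this.path = split.getPath(index);
    this.length = split.getLength(index);
    this.context = context;
  }

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) {
    // Everything needed was captured in the constructor.
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (processed) {
      return false; // the single record has already been returned
    }
    byte[] contents = new byte[(int) length];
    FileSystem fs = path.getFileSystem(context.getConfiguration());
    FSDataInputStream in = fs.open(path);
    try {
      in.readFully(0, contents);
    } finally {
      in.close();
    }
    key.set(path.toString());
    value.set(contents, 0, contents.length);
    processed = true;
    return true;
  }

  @Override
  public Text getCurrentKey() { return key; }

  @Override
  public BytesWritable getCurrentValue() { return value; }

  @Override
  public float getProgress() { return processed ? 1.0f : 0.0f; }

  @Override
  public void close() { }
}

Wire it up in the driver with job.setInputFormatClass(WholeFileCombineInputFormat.class).
CombineFileInputFormat builds its splits from node-local and rack-local
blocks, so locality is preserved as far as possible, and setMaxSplitSize
keeps any single mapper from being handed too many small files.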

On Thu, Jul 12, 2012 at 8:33 PM, Manoj Babu <ma...@gmail.com> wrote:
> Gentles,
>
> I want to use the CombineFileInputFormat of Hadoop 0.20.0 / 0.20.2 such that
> it processes one file per record and doesn't compromise on data locality
> (which it normally takes care of).
>
> It is mentioned in Tom White's Hadoop: The Definitive Guide, but he does not
> show how to do it; instead, he moves on to Sequence Files.
>
> I am pretty confused about the meaning of the 'processed' variable in a
> record reader. Any code example would be of tremendous help.
>
> Thanks in advance..
>
> Cheers!
> Manoj.
>



-- 
Harsh J