Posted to common-user@hadoop.apache.org by "Zak, Richard [USA]" <za...@bah.com> on 2009/01/22 15:28:05 UTC

Hadoop with many input/output files?

I see that MultiFileInputFormat and MultipleOutputFormat are available as
input/output formats for the Job configuration.  How do I use them
properly?  Until now I have used the default input and output formats,
which, for my PDF concatenation project, reduced Hadoop to little more
than a scheduler.
 
The idea is to concatenate all the PDFs in each directory into a single
PDF for that directory, and for this I'm using iText.
 
How can I use these format types?  What would the input to my mapper be,
and what should my input and output key/value classes be?
Thank you!  I can't find any documentation on these other than the
Javadoc, which doesn't help much.
 
Richard J. Zak
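
As a rough illustration of the key/value question above, one possible arrangement (an assumption, not something confirmed anywhere in this thread) is to key each PDF by its parent directory, so that everything sharing a key can be handed to a single concatenation step. In a Hadoop job that would presumably mean a `Text` directory key and a `Text` file-path value. The sketch below uses plain Java collections only; `PdfGrouping`, `parentOf` and `groupByDirectory` are hypothetical names, and neither Hadoop nor iText is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop or iText: it only illustrates
// the key/value shape "directory -> PDFs in that directory".
class PdfGrouping {

    // Returns the parent directory of a slash-separated path, or "" if
    // the path contains no slash.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }

    // "Map" side: every PDF path is paired with its parent directory.
    // "Reduce" side: all paths sharing a directory key land in one list,
    // which is what a single per-directory concatenation would consume.
    static Map<String, List<String>> groupByDirectory(List<String> paths) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String path : paths) {
            if (!path.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                continue; // skip anything that is not a PDF
            }
            groups.computeIfAbsent(parentOf(path), k -> new ArrayList<>()).add(path);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample paths, for illustration only.
        List<String> input = Arrays.asList(
                "/in/a/1.pdf", "/in/a/2.pdf", "/in/b/3.pdf", "/in/b/notes.txt");
        groupByDirectory(input).forEach((dir, files) ->
                System.out.println(dir + " -> " + files));
    }
}
```

With this shape, the output key would be the directory and the output value the location of the merged PDF. Whether MultiFileInputFormat or a plain list of paths is the better way to feed the job is left open here.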

Re: Hadoop with many input/output files?

Posted by Mark Kerzner <ma...@gmail.com>.
I have a very similar question: how do I recursively list all files in a
given directory, so that all of them are processed by MapReduce? If I
just copy them to the output, say, is there any problem with dropping them
all into the same output directory in HDFS? To use a bad example, Windows
chokes on many files in one directory.
Thank you,
Mark
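
To make the recursive-listing part of the question concrete, here is a minimal sketch that walks a directory tree on the local filesystem with `java.nio.file`. It is only a stand-in: on HDFS the same walk would presumably go through `FileSystem.listStatus` and recurse into each directory, which is not shown or tested here. `RecursiveLister` and `listFiles` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; works on the local filesystem only.
class RecursiveLister {

    // Returns every regular file under root, at any depth, as a
    // slash-separated path relative to root, in sorted order.
    static List<String> listFiles(Path root) {
        // Files.walk visits the whole tree; the stream must be closed,
        // hence the try-with-resources.
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString()
                            .replace(File.separatorChar, '/'))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Lists the directory given as the first argument, or the
        // current directory if none is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (String file : listFiles(root)) {
            System.out.println(file);
        }
    }
}
```

The resulting list could be written to a text file and used as job input, one path per line, so that each file is handled by some map task.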
