Posted to common-user@hadoop.apache.org by Dan Tamowski <ta...@gmail.com> on 2008/03/07 16:14:15 UTC

Custom Input Formats

Hello,

First, I am currently subscribed to the digest, could you please cc me at
tamowski.d@gmail.com with any replies. I really appreciate it.

I have a few questions regarding input formats. Specifically, I want to use
one complete text file per input format. I understand that I must implement
both FileInputFormat and RecordReader. From there, however, I am not
sure what to do. Can I include these in my MR project or do I need to keep
them in a separate jar and reference that in HADOOP_CLASSPATH? Also, should
HADOOP_CLASSPATH point to a directory of jars or does it mimic the
space-delimited manifest.mf? Finally, are there any examples of user-defined
input formats available anywhere?

Thanks,

Dan

Re: Custom Input Formats

Posted by Amar Kamat <am...@yahoo-inc.com>.
On Fri, 7 Mar 2008, Dan Tamowski wrote:

> Hello,
>
> First, I am currently subscribed to the digest, could you please cc me at
> tamowski.d@gmail.com with any replies. I really appreciate it.
>
> I have a few questions regarding input formats. Specifically, I want to use
> one complete text file per input format. I understand that I must implement
> both FileInputFormat and RecordReader. From there, however, I am not
> sure what to do.
In your JobConf, set the input format to the class you wrote (see
JobConf#setInputFormat) before submitting the job to the job client.
> Can I include these in my MR project or do I need to keep
> them in a separate jar and reference that in HADOOP-CLASSPATH?
You can keep it separate. The InputFormat has to be available to the class
where you configure your job and run it using the job client. For an example,
see how the bundled examples are run and how RandomWriter is implemented (see
src/examples/org/apache/hadoop/examples).
> Also should HADOOP-CLASSPATH point to a directory of jars or does it mimic the
> space-delimited manifest.mf?
Job jars are not part of the Hadoop system. The job jar is passed to
Hadoop on the command line (bin/hadoop jar <jarfile>).
> Finally, are there any examples of user-defined input formats available 
> anywhere?
See RandomWriter#RandomInputFormat.
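
To make the "one complete file per record" idea concrete, here is a minimal
sketch in plain Java. It deliberately does not use any Hadoop classes:
WholeFileReaderSketch and its methods are hypothetical names that only mimic
the "call next() until it returns false" contract of a record reader. The
file name is used as the key and the full file contents as the value, which
is an assumption about what the job wants, not something the thread states.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in, NOT a Hadoop class: it only illustrates a reader
// that hands back an entire file as a single record.
class WholeFileReaderSketch {
    private final Path file;
    private boolean consumed = false; // true once the single record was returned
    private String key;               // assumed key: the file name
    private String value;             // assumed value: the whole file as UTF-8 text

    WholeFileReaderSketch(Path file) {
        this.file = file;
    }

    // Loads the whole file on the first call and returns true;
    // every later call returns false because there is only one record.
    boolean next() throws IOException {
        if (consumed) {
            return false;
        }
        key = file.getFileName().toString();
        value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        consumed = true;
        return true;
    }

    String getKey() {
        return key;
    }

    String getValue() {
        return value;
    }

    // Progress is all-or-nothing since the file is read in one step.
    float getProgress() {
        return consumed ? 1.0f : 0.0f;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file and read it back as one record.
        Path tmp = Files.createTempFile("whole-file", ".txt");
        Files.write(tmp, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        WholeFileReaderSketch reader = new WholeFileReaderSketch(tmp);
        while (reader.next()) {
            System.out.println(reader.getKey() + " -> "
                    + reader.getValue().length() + " chars");
        }
        Files.delete(tmp);
    }
}
```

In an actual job the same logic would sit inside a RecordReader returned by
your FileInputFormat subclass, and that subclass would most likely also need
to report its files as non-splittable so that each file reaches exactly one
reader.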
Amar
>
> Thanks,
>
> Dan
>