Posted to general@hadoop.apache.org by Something Something <ma...@gmail.com> on 2009/12/21 18:42:49 UTC

InputFormat related question...

In my application I have a file in this format:

The first line of the file contains the data to be processed, and *each* of
the remaining lines contains parameters that will be used to slice & dice the
data in various ways.  In other words, each mapper needs two lines: the first
line of the file, which contains the data, and one other line that contains
parameters.

I looked at NLineInputFormat, which can be used for "parameter sweeps", but
it's not quite what I want.  I believe this format passes N consecutive
lines to each mapper, correct?

What's the best way to handle this case?  Do I have to write a special
InputFormat class?  Please help.  Thanks.

Re: InputFormat related question...

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Hi,

If you want a map task to process two lines at a time, you need to write a RecordReader that constructs each record from two lines. LineRecordReader treats each line as one record.
You can extend NLineInputFormat to generate the splits, and have it return your new RecordReader to read the records from each split.
Hope this helps you.

Thanks
Amareshwari
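
A minimal sketch of the record-pairing idea discussed above, not taken from the thread. It assumes the reader keeps the file's first line as the data and emits one record per remaining line, pairing each parameter line with that data line. The class is plain Java with no Hadoop dependency: DataParamPairReader and readAll are hypothetical names, and the method names only mirror the nextKeyValue / getCurrentKey / getCurrentValue shape of Hadoop's RecordReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader (not a Hadoop class): pairs the first line of the
// input with each of the remaining lines, one pair per record.
class DataParamPairReader {
    private final BufferedReader in;
    private final String dataLine;  // first line, shared by every record
    private String paramLine;       // parameter line of the current record

    DataParamPairReader(BufferedReader in) throws IOException {
        this.in = in;
        this.dataLine = in.readLine();  // null when the input is empty
    }

    // Advances to the next parameter line; false when no record is left.
    public boolean nextKeyValue() throws IOException {
        if (dataLine == null) {
            return false;
        }
        paramLine = in.readLine();
        return paramLine != null;
    }

    public String getCurrentKey() {
        return paramLine;
    }

    public String getCurrentValue() {
        return dataLine;
    }

    // Convenience helper: returns every {parameterLine, dataLine} pair.
    public static List<String[]> readAll(String contents) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(contents))) {
            DataParamPairReader reader = new DataParamPairReader(br);
            while (reader.nextKeyValue()) {
                records.add(new String[] {reader.getCurrentKey(), reader.getCurrentValue()});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] r : readAll("1 2 3 4\nparams-a\nparams-b\n")) {
            System.out.println("params=" + r[0] + " data=" + r[1]);
        }
    }
}
```

In a real job, only the split that starts at the beginning of the file contains the data line, so a RecordReader working on any other split would presumably have to read the file's first line separately when it is initialized.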
