Posted to common-user@hadoop.apache.org by Rahul Tenany <tr...@gmail.com> on 2008/11/15 14:06:48 UTC

NLine Input Format

Hi, I am writing a binary search tree on Hadoop, and for this I need to
use NLineInputFormat. I'll read N lines at a time, convert the numbers in
each line from string to int, and then insert them into the binary tree.
Once the tree is built, I'll search for elements in it. But even though I
set the input format to NLineInputFormat and set
mapred.line.input.format.linespermap to 10, I am able to read only one
line at a time. Any idea where I am going wrong? How can I find out
whether NLineInputFormat is working or not?

I want my program to work for any object that is Comparable, not just
integers, so is there any way I can read N objects at a time?

I am completely stuck. Any help will be appreciated.

Thanks
Rahul
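The "any Comparable object" requirement maps onto a Java generic bounded by Comparable. Below is a minimal sketch, not code from this thread: ComparableBst, insert(), contains() and fromLines() are hypothetical names, the tree is unbalanced, and a caller-supplied parser turns each input line into a T, so the same code handles Integer, String, or any other Comparable type.

```java
import java.util.List;
import java.util.function.Function;

/** Minimal unbalanced binary search tree for any Comparable type (hypothetical sketch). */
public class ComparableBst<T extends Comparable<? super T>> {

    private static final class Node<T> {
        final T item;
        Node<T> left, right;
        Node(T item) { this.item = item; }
    }

    private Node<T> root;
    private int size;

    /** Inserts item; duplicates are ignored. Iterative, so deep trees do not overflow the stack. */
    public void insert(T item) {
        if (root == null) { root = new Node<>(item); size++; return; }
        Node<T> cur = root;
        while (true) {
            int cmp = item.compareTo(cur.item);
            if (cmp == 0) return; // already present
            if (cmp < 0) {
                if (cur.left == null) { cur.left = new Node<>(item); size++; return; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node<>(item); size++; return; }
                cur = cur.right;
            }
        }
    }

    /** Standard BST search: go left for smaller keys, right for larger ones. */
    public boolean contains(T item) {
        Node<T> cur = root;
        while (cur != null) {
            int cmp = item.compareTo(cur.item);
            if (cmp == 0) return true;
            cur = cmp < 0 ? cur.left : cur.right;
        }
        return false;
    }

    public int size() { return size; }

    /** Parses each non-blank line with the supplied parser and inserts the result. */
    public static <T extends Comparable<? super T>> ComparableBst<T> fromLines(
            List<String> lines, Function<String, T> parser) {
        ComparableBst<T> tree = new ComparableBst<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) tree.insert(parser.apply(trimmed));
        }
        return tree;
    }

    public static void main(String[] args) {
        ComparableBst<Integer> ints = fromLines(List.of("42", "7", "19", "7"), Integer::valueOf);
        System.out.println(ints.size() + " " + ints.contains(19) + " " + ints.contains(5));
        ComparableBst<String> words = fromLines(List.of("pear", "apple"), s -> s);
        System.out.println(words.contains("apple"));
    }
}
```

Because the tree is unbalanced, already-sorted input degrades it into a linked list; the loops are iterative so that case costs time but not stack depth.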

Re: NLine Input Format

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Rahul Tenany wrote:
> Hi Amareshwari,
>     It is in the ToolRunner.run() method that I am setting the
> input format to NLineInputFormat, and in the same function I am
> setting the mapred.line.input.format.linespermap property. Will that
> not work? How can I override LineRecordReader so that it returns
> N lines as the value?
>
Setting the configuration in the run() method will also work. You have
to extend LineRecordReader and override its next() method to return N
lines as the value instead of one line.

Thanks
Amareshwari
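Amareshwari's suggestion (subclass the line reader and override next()) can be sketched in plain Java so that it runs without Hadoop. SimpleLineReader is a hypothetical stand-in for LineRecordReader (the key is the byte offset, the value is one line), and the long[]/StringBuilder holders stand in for LongWritable/Text; the offset arithmetic assumes '\n' line endings and single-byte characters. In a real job the same loop would live in a subclass of org.apache.hadoop.mapred.LineRecordReader, which this sketch does not show.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class NLineReaderDemo {

    /** Hypothetical stand-in for LineRecordReader: one line per next() call. */
    static class SimpleLineReader {
        private final BufferedReader in;
        private long pos = 0; // byte offset of the next unread line

        SimpleLineReader(BufferedReader in) { this.in = in; }

        /** Sets key[0] to the line's byte offset and value to the line; false at EOF. */
        boolean next(long[] key, StringBuilder value) throws IOException {
            String line = in.readLine();
            if (line == null) return false;
            key[0] = pos;
            value.setLength(0);
            value.append(line);
            pos += line.length() + 1; // assumes '\n' endings and 1-byte characters
            return true;
        }
    }

    /** Overrides next() so that one record's value holds up to n lines. */
    static class NLineReader extends SimpleLineReader {
        private final int n;

        NLineReader(BufferedReader in, int n) {
            super(in);
            if (n < 1) throw new IllegalArgumentException("n must be >= 1");
            this.n = n;
        }

        @Override
        boolean next(long[] key, StringBuilder value) throws IOException {
            long[] lineKey = new long[1];
            StringBuilder line = new StringBuilder();
            value.setLength(0);
            int read = 0;
            // Check the count first so no extra line is consumed from the parent.
            while (read < n && super.next(lineKey, line)) {
                if (read == 0) key[0] = lineKey[0]; // key = offset of the first line
                else value.append('\n');
                value.append(line);
                read++;
            }
            return read > 0; // a short final record is still returned
        }
    }

    /** Reads every record and renders it as "offset:value". */
    static List<String> readAll(String data, int n) {
        NLineReader reader = new NLineReader(new BufferedReader(new StringReader(data)), n);
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        List<String> records = new ArrayList<>();
        try {
            while (reader.next(key, value)) records.add(key[0] + ":" + value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        for (String r : readAll("5\n3\n8\n1\n9\n", 2)) System.out.println(r.replace('\n', ','));
    }
}
```

With input 5, 3, 8, 1, 9 and N = 2, readAll returns three records keyed 0, 4 and 8; the last one holds a single line because the input ran out.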

> Thanks
> Rahul
>
> On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu
> <amarsri@yahoo-inc.com> wrote:
>
>     Hi Rahul,
>
>     How did you set the configuration
>     "mapred.line.input.format.linespermap" and your input format? You
>     have to set them in hadoop-site.xml or pass them through the -D
>     option to the job.
>     NLineInputFormat puts N lines of input into each split, so each
>     map gets N lines.
>     But the RecordReader is still LineRecordReader, which reads one
>     line at a time, so the key is the offset in the file and the
>     value is the line.
>     If you want N lines as the value, you may have to override
>     LineRecordReader.
>
>     Thanks
>     Amareshwari


Re: NLine Input Format

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Rahul Tenany wrote:
> Hi Amareshwari,
>     It is in the ToolRunner.run() method that I am setting the
> input format to NLineInputFormat, and in the same function I am
> setting the mapred.line.input.format.linespermap property. Will that
> not work? How can I override LineRecordReader so that it returns
> N lines as the value?
>
> Thanks
> Rahul
>
One more thing: I don't think you need NLineInputFormat for your
requirement. NLineInputFormat puts N lines into each split, so each map
processes N lines. In your application, you don't want each map to
process just N lines; you want each value to be N lines, right? So you
should write a new input format that extends FileInputFormat and whose
getRecordReader() returns your new RecordReader implementation. Does
this make sense?

Thanks
Amareshwari
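The distinction in the message above can be illustrated with a toy model (plain Java, not Hadoop code; all names are hypothetical). It lays out the same lines two ways: NLineInputFormat-style, where each split (one map task) holds N lines but each map() call receives one line, and the custom-format style, assuming the whole input fits in a single split, where each map() call receives N lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: a layout is List<split>, a split is List<record>, a record is List<line>. */
public class SplitVsRecordDemo {

    /** Hypothetical helper: the lines "1" .. "count". */
    static List<String> sampleLines(int count) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= count; i++) lines.add(Integer.toString(i));
        return lines;
    }

    /** Cuts lines into consecutive chunks of at most n lines. */
    static List<List<String>> chunk(List<String> lines, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            chunks.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return chunks;
    }

    /** NLineInputFormat-style: n lines per split, but one line per record. */
    static List<List<List<String>>> nLinesPerSplit(List<String> lines, int n) {
        List<List<List<String>>> splits = new ArrayList<>();
        for (List<String> split : chunk(lines, n)) {
            List<List<String>> records = new ArrayList<>();
            for (String line : split) records.add(List.of(line));
            splits.add(records);
        }
        return splits;
    }

    /** Custom-format style: one split whose records each hold up to n lines. */
    static List<List<List<String>>> nLinesPerRecord(List<String> lines, int n) {
        List<List<List<String>>> splits = new ArrayList<>();
        splits.add(chunk(lines, n));
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(25);
        for (List<List<List<String>>> layout
                : List.of(nLinesPerSplit(lines, 10), nLinesPerRecord(lines, 10))) {
            System.out.println("map tasks: " + layout.size());
            for (List<List<String>> split : layout) {
                int firstValueLines = split.isEmpty() ? 0 : split.get(0).size();
                System.out.println("  map() calls: " + split.size()
                        + ", lines in first value: " + firstValueLines);
            }
        }
    }
}
```

For 25 lines and N = 10, the first layout gives three map tasks with 10, 10 and 5 single-line calls; the second gives one map task with three calls holding 10, 10 and 5 lines.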


Re: NLine Input Format

Posted by Rahul Tenany <tr...@gmail.com>.
Hi Amareshwari,
It is in the ToolRunner.run() method that I am setting the input format
to NLineInputFormat, and in the same function I am setting the
mapred.line.input.format.linespermap property. Will that not work? How can
I override LineRecordReader so that it returns N lines as the value?

Thanks
Rahul

On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu <
amarsri@yahoo-inc.com> wrote:

> Hi Rahul,
>
> How did you set the configuration "mapred.line.input.format.linespermap"
> and your input format? You have to set them in hadoop-site.xml or pass them
> through the -D option to the job.
> NLineInputFormat puts N lines of input into each split, so each map
> gets N lines.
> But the RecordReader is still LineRecordReader, which reads one line at
> a time, so the key is the offset in the file and the value is the line.
> If you want N lines as the value, you may have to override LineRecordReader.
>
> Thanks
> Amareshwari

Re: NLine Input Format

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Hi Rahul,

How did you set the configuration "mapred.line.input.format.linespermap"
and your input format? You have to set them in hadoop-site.xml or pass
them through the -D option to the job.
NLineInputFormat puts N lines of input into each split, so each map
gets N lines.
But the RecordReader is still LineRecordReader, which reads one line at
a time, so the key is the offset in the file and the value is the line.
If you want N lines as the value, you may have to override LineRecordReader.

Thanks
Amareshwari
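To see why the mapper still receives one line per call, here is a toy model of NLineInputFormat combined with LineRecordReader (plain Java, hypothetical names, not Hadoop's implementation). Every N consecutive lines form one split, and each record's key is the line's byte offset from the start of the file, assuming '\n' line endings and single-byte characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: n lines per split, one (offset, line) record per map() call. */
public class NLineSplitDemo {

    /** One (key, value) pair as a map() call would receive it. */
    static final class Rec {
        final long offset; // byte offset from the start of the file, not the split
        final String line;

        Rec(long offset, String line) { this.offset = offset; this.line = line; }

        @Override
        public String toString() { return "(" + offset + ", \"" + line + "\")"; }
    }

    static List<List<Rec>> splitAndRead(List<String> lines, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        List<List<Rec>> splits = new ArrayList<>();
        long pos = 0;
        for (int i = 0; i < lines.size(); i++) {
            if (i % n == 0) splits.add(new ArrayList<>()); // start a new split every n lines
            splits.get(splits.size() - 1).add(new Rec(pos, lines.get(i)));
            pos += lines.get(i).length() + 1;              // +1 for the '\n' terminator
        }
        return splits;
    }

    public static void main(String[] args) {
        List<List<Rec>> splits = splitAndRead(List.of("10", "4", "17", "2", "9"), 2);
        for (int s = 0; s < splits.size(); s++) {
            System.out.println("split " + s + ": " + splits.get(s).size()
                    + " map() calls " + splits.get(s));
        }
    }
}
```

With N = 2, the input 10, 4, 17, 2, 9 yields three splits whose keys are 0 and 3, 5 and 8, and 10: each full split holds N records, but each map() call still gets only one line.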
