Posted to common-user@hadoop.apache.org by Udaya Lakshmi <ud...@gmail.com> on 2010/01/28 10:01:30 UTC

Fileformat query

Hi all,
   I have searched the documentation but could not find an input file
format that gives the line number as the key and the line as the value.
Did I miss something? Can someone give me a clue about how to implement
such an input file format?

Thanks,
Udaya.

Re: Fileformat query

Posted by Udaya Lakshmi <ud...@gmail.com>.
Thank you, Jeff.

On 1/29/10, Jeff Zhang <zj...@gmail.com> wrote:
> Sorry, my mistake: writing your own InputFormat does not seem like a good
> idea after all. The cost of getting the starting line number of each split
> is fairly high.

Re: Fileformat query

Posted by Jeff Zhang <zj...@gmail.com>.
Sorry, my mistake: writing your own InputFormat does not seem like a good
idea after all. The cost of getting the starting line number of each split
is fairly high.



On Fri, Jan 29, 2010 at 8:40 AM, Jeff Zhang <zj...@gmail.com> wrote:

> I'm afraid you will have to write your own InputFormat if you really want
> the line number as the key.
> And I believe you can reuse most of the TextInputFormat code, since your
> InputFormat differs from TextInputFormat only in the key.

-- 
Best Regards

Jeff Zhang
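
Jeff's point about cost can be made concrete. The line number of the line that starts at a given byte offset equals the number of newline bytes before that offset, so finding the first line number of a split means scanning the file from byte 0 up to the split's start. The sketch below is plain Java over an in-memory byte array, not Hadoop code; the class and method names (SplitLineNumber, lineNumberAt) are hypothetical, and it assumes '\n' line endings.

```java
// Hypothetical helper, not part of Hadoop: shows why a split's starting line
// number is expensive to compute. Assumes '\n' line terminators.
final class SplitLineNumber {
    private SplitLineNumber() {}

    /**
     * Returns the 0-based line number of the line that starts at byteOffset.
     * Every '\n' before that offset ends exactly one earlier line, so the
     * answer is the newline count in [0, byteOffset). The loop touches every
     * preceding byte, so a split near the end of a large file would have to
     * read almost the whole file before emitting its first record.
     */
    static long lineNumberAt(byte[] file, int byteOffset) {
        long newlines = 0;
        for (int i = 0; i < byteOffset; i++) {
            if (file[i] == '\n') {
                newlines++;
            }
        }
        return newlines;
    }
}
```

For the bytes of "a\nbb\nccc\n", lineNumberAt(data, 5) returns 2: the line "ccc" starts at offset 5 and two newlines precede it.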

Re: Fileformat query

Posted by Jeff Zhang <zj...@gmail.com>.
I'm afraid you will have to write your own InputFormat if you really want
the line number as the key.
And I believe you can reuse most of the TextInputFormat code, since your
InputFormat differs from TextInputFormat only in the key.



On Thu, Jan 28, 2010 at 7:35 AM, Edward Capriolo <ed...@gmail.com>wrote:

> The key represents the byte offset of the line (the value) in the input
> file. There is no easy way to translate the byte offset into a logical
> line number unless all lines have a fixed width (not usually the case).



-- 
Best Regards

Jeff Zhang
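
The only part of TextInputFormat's behaviour that has to change is the key: instead of the byte offset, emit a running line counter. What follows is a minimal plain-Java sketch of that idea; NumberedLines and read are hypothetical names, and it uses java.io rather than the Hadoop API. In a real InputFormat this counter would live in the RecordReader, and it is only correct when that reader starts at the beginning of the file, for example by making the format non-splittable (overriding FileInputFormat's isSplitable to return false), which gives up parallelism within a single file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Hadoop code): pairs each line with its 0-based
// line number, the way a line-numbering RecordReader would if it read a
// whole, unsplit file from the first byte.
final class NumberedLines {
    private NumberedLines() {}

    static List<Map.Entry<Long, String>> read(Reader in) throws IOException {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            long lineNumber = 0; // takes the place of TextInputFormat's byte-offset key
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(new AbstractMap.SimpleImmutableEntry<>(lineNumber, line));
                lineNumber++;
            }
        }
        return records;
    }
}
```

Reading "a\nbb\nccc" this way yields the pairs (0, a), (1, bb), (2, ccc).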

Re: Fileformat query

Posted by Edward Capriolo <ed...@gmail.com>.
Udaya,

When using the standard TextInputFormat, the map method looks like this:

public void map(LongWritable key, Text value,
    OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
  // key: byte offset of this line in the file; value: the text of the line
}

key represents the byte offset of the line (the value) in the input file.
There is no easy way to translate the byte offset into a logical line
number unless all lines have a fixed width (not usually the case).

Edward
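
Edward's exception for fixed-width lines is easy to express: if every line has exactly width bytes of content plus a one-byte '\n' terminator, the line number is the byte offset divided by width + 1. A minimal sketch, assuming single-byte '\n' endings (no "\r\n"); the class and method names are hypothetical.

```java
// Hypothetical helper: converts the byte-offset key that TextInputFormat
// passes to map() into a 0-based line number. Only valid when every line
// has exactly `width` content bytes followed by a single '\n'.
final class FixedWidthLines {
    private FixedWidthLines() {}

    static long lineNumber(long byteOffset, int width) {
        long recordLength = width + 1L; // content bytes plus the '\n' terminator
        if (byteOffset % recordLength != 0) {
            // A real line start is always a multiple of recordLength.
            throw new IllegalArgumentException(
                "offset " + byteOffset + " is not the start of a " + width + "-byte line");
        }
        return byteOffset / recordLength;
    }
}
```

With 3-byte lines such as "abc\ndef\nghi\n", the keys 0, 4 and 8 map to line numbers 0, 1 and 2.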