Posted to common-user@hadoop.apache.org by ishwar ramani <rv...@gmail.com> on 2009/10/09 01:23:35 UTC

Retrieving SequenceFile position of key in mapper task

Hi,

I need to get the position of the key being processed in a mapper task.
My input file is a sequence file.

I tried the Context, but the best I could get was the InputSplit
position and the file name.

My other option is to start recording the position in the key/value while
generating the sequence file.
But that would mean rewriting all the files I already have :(

Any thoughts?

ishwar

Re: Retrieving SequenceFile position of key in mapper task

Posted by ishwar ramani <rv...@gmail.com>.
Thanks, that worked fine.


On Thu, Oct 8, 2009 at 10:45 PM, Ahad Rana <ah...@commoncrawl.org> wrote:
> Oops, memory fails me. To correct my previous statement, for block
> compressed files, getPosition reflects the position in the input stream of
> the NEXT compressed block of data, so you have to watch for the change in
> position after reading the key/value to capture a block transition.
> Ahad.

Re: Retrieving SequenceFile position of key in mapper task

Posted by Ahad Rana <ah...@commoncrawl.org>.
Oops, memory fails me. To correct my previous statement: for block-compressed
files, getPosition() reflects the position in the input stream of the NEXT
compressed block of data, so you have to watch for a change in position after
reading each key/value to catch a block transition.
Ahad.
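
The block-transition check described above can be sketched in plain Java. This is only an illustration under stated assumptions: blockStarts is a hypothetical helper, and the offsets (100, 500, 900) are made up. It assumes, per the description above, that the reported position jumps to the start of the next block when the first record of a block is read and then stays put until the following block is loaded. Whether the recovered offset is suitable for seeking is not established in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockTransitionSketch {

    /**
     * Given the reader position before the first read and the position
     * observed after each read, returns for every record the position at
     * which its block was first entered.
     *
     * Hypothetical helper: a real job would feed it values taken from the
     * reader's position before and after each read.
     */
    static List<Long> blockStarts(long initialPos, long[] posAfterEachRead) {
        List<Long> starts = new ArrayList<>();
        long previous = initialPos;    // position seen before the current read
        long blockStart = initialPos;  // start of the block currently being drained
        for (long after : posAfterEachRead) {
            if (after != previous) {
                // The position moved, so this read pulled in a new block;
                // that block began where the reader stood before the read.
                blockStart = previous;
            }
            starts.add(blockStart);
            previous = after;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Two blocks: three records in the first, two in the second.
        long[] observed = {500L, 500L, 500L, 900L, 900L};
        System.out.println(blockStarts(100L, observed));
        // prints [100, 100, 100, 500, 500]
    }
}
```

Records that share a block get the same value, so the result identifies a block rather than an individual record.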

On Thu, Oct 8, 2009 at 10:22 PM, Ahad Rana <ah...@commoncrawl.org> wrote:

> Hi Ishwar,
> You can implement a custom MapRunner and retrieve the position from the
> reader before calling your map function. Be aware though, that for block
> compressed files, the position returned represents block start position, not
> the individual record position.
>
> Ahad.

Re: Retrieving SequenceFile position of key in mapper task

Posted by Ahad Rana <ah...@commoncrawl.org>.
Hi Ishwar,
You can implement a custom MapRunner and retrieve the position from the
reader before calling your map function. Be aware, though, that for
block-compressed files the position returned is the block start position, not
the individual record position.

Ahad.
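
A minimal sketch of the runner loop suggested above, written in plain Java so it runs without Hadoop on the classpath. PositionedReader and PositionAwareMapper are hypothetical stand-ins, not Hadoop types. In a real job the loop would live in the run(RecordReader, OutputCollector, Reporter) method of a class implementing org.apache.hadoop.mapred.MapRunnable, the position would come from RecordReader.getPos(), and the class would be registered with JobConf.setMapRunnerClass(). How the position is then handed to the map function is a design choice the thread leaves open; the callback here is one assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionAwareRunnerSketch {

    /** Hypothetical stand-in for the part of a record reader used here. */
    interface PositionedReader {
        long getPos();   // current offset in the input
        String next();   // next key, or null at end of input
    }

    /** Hypothetical map callback that also receives the record position. */
    interface PositionAwareMapper {
        void map(long position, String key);
    }

    /** Runner loop: capture the position before each read, then call map. */
    static void run(PositionedReader reader, PositionAwareMapper mapper) {
        while (true) {
            long position = reader.getPos(); // offset where the next record starts
            String key = reader.next();
            if (key == null) {
                break;                       // end of input
            }
            mapper.map(position, key);
        }
    }

    /** Toy in-memory reader in which each record occupies key.length() bytes. */
    static PositionedReader readerOf(final List<String> keys) {
        return new PositionedReader() {
            private int index = 0;
            private long offset = 0;

            public long getPos() {
                return offset;
            }

            public String next() {
                if (index >= keys.size()) {
                    return null;
                }
                String key = keys.get(index++);
                offset += key.length();
                return key;
            }
        };
    }

    public static void main(String[] args) {
        final List<String> seen = new ArrayList<>();
        run(readerOf(Arrays.asList("a", "bcd", "ef")),
            (position, key) -> seen.add(position + ":" + key));
        System.out.println(seen); // prints [0:a, 1:bcd, 4:ef]
    }
}
```

With an uncompressed or record-compressed file, the value captured before the read is the record's own offset; the block-compression caveat above still applies.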
