Posted to mapreduce-user@hadoop.apache.org by Rob Podolski <ro...@yahoo.co.uk> on 2011/11/30 07:28:53 UTC

Can I set mappers to use overlapping record ranges from a sequence file?

Hi

I have a computation to run over a large input: a single large sequence file.  Ideally I would like to set a specific number of mappers and assign each one a specific range of records in the input sequence file.  For various reasons, the record ranges that I want to pass to the mappers would overlap (e.g. mapper 1 gets records 1-1000, mapper 2 gets records 700-2000, and so on).

Is it possible to do this? If so, how would I go about it?  InputFormat does not seem to cater for this.  Perhaps Hadoop is not the right 'parallel' framework for this.

Thanks.

Re: Can I set mappers to use overlapping record ranges from a sequence file?

Posted by Robert Evans <ev...@yahoo-inc.com>.
I have never done this, but I think it should be possible.  You will likely not get much data locality doing this, but you should be able to create your own input format and have it write out entries in the split file that indicate the ranges you want.  I may be wrong, but I thought that the InputFormat is both the producer and the only consumer of the split data.
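
As a rough illustration of the range bookkeeping such a custom input format would need, here is a minimal sketch in plain Java with no Hadoop dependency. The class OverlappingRanges, the Range type, and computeRanges are hypothetical names, and the scheme (equal-sized chunks, each extended backwards by a fixed overlap) is an assumption, not something from the thread. In a real job, each range would presumably be wrapped in a custom InputSplit returned from InputFormat.getSplits(), and the record reader would still have to map record indices to positions in the sequence file, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlappingRanges {

    /** Half-open record range [start, end), counted from record 0. Hypothetical helper type. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Splits totalRecords into at most numMappers chunks of equal size,
     * then extends each chunk backwards by `overlap` records (clamped at 0).
     */
    static List<Range> computeRanges(long totalRecords, int numMappers, long overlap) {
        if (totalRecords < 0 || numMappers <= 0 || overlap < 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        List<Range> ranges = new ArrayList<>();
        long chunk = (totalRecords + numMappers - 1) / numMappers; // ceiling division
        for (int i = 0; i < numMappers; i++) {
            long baseStart = i * chunk;
            if (baseStart >= totalRecords) {
                break; // more mappers requested than there are records to hand out
            }
            long start = Math.max(0, baseStart - overlap);
            long end = Math.min(totalRecords, baseStart + chunk);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 2000 records, 2 mappers, 300 records of overlap
        for (Range r : computeRanges(2000, 2, 300)) {
            System.out.println(r);
        }
    }
}
```

With 2000 records, 2 mappers and an overlap of 300, this prints [0, 1000) and [700, 2000), which is close to the ranges in the question.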

--Bobby Evans
