Posted to common-user@hadoop.apache.org by Chris Dyer <re...@umd.edu> on 2008/03/20 21:13:50 UTC

using a set of MapFiles - getting the right partition

Hi all--
I would like to have a reducer generate a MapFile so that in later
processes I can look up the values associated with a few keys without
processing an entire sequence file.  However, if I have N reducers, I
will generate N different map files, so to pick the right map file I
will need to use the same partitioner as was used when partitioning
the keys to reducers (the reducer I have running emits one value for
each key it receives and no others).  Should this be done manually, i.e.
something like readers[partitioner.getPartition(...)], or is there
another recommended method?

Eventually, I'm going to migrate to using HBase to store the key/value
pairs (since I'd like to take advantage of HBase's ability to cache common
pairs in memory for faster retrieval), but I'm interested in seeing
what the performance is like just using MapFiles.

Thanks,
Chris

Re: using a set of MapFiles - getting the right partition

Posted by Doug Cutting <cu...@apache.org>.

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)

MapFileOutputFormat#getEntry() does this.

Use MapFileOutputFormat#getReaders() to create the readers parameter.
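
For illustration, the lookup idea can be sketched without Hadoop on the
classpath. This is a stand-in, not the library's code: the class
PartitionedLookup and its methods are hypothetical, TreeMaps play the role
of the N MapFiles, and the partition formula (non-negative hash modulo the
partition count) is assumed to match a default hash partitioner. The key
point is that the reader side applies the same partition function, with
the same partition count, that the job used when writing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for the MapFile lookup; no Hadoop classes are used.
class PartitionedLookup {

    // Assumed partition function: non-negative hash modulo partition count.
    // Writers and readers must both use this exact function.
    static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Simulates N reducers, each writing one sorted map (one "MapFile").
    static List<TreeMap<String, String>> writePartitions(String[][] pairs,
                                                         int numPartitions) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new TreeMap<>());
        }
        for (String[] pair : pairs) {
            parts.get(getPartition(pair[0], numPartitions)).put(pair[0], pair[1]);
        }
        return parts;
    }

    // Picks the single partition that can hold the key, then looks it up
    // there. Returns null when the key is absent.
    static String getEntry(List<TreeMap<String, String>> readers, String key) {
        int part = getPartition(key, readers.size());
        return readers.get(part).get(key);
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"apple", "fruit"}, {"kale", "vegetable"}, {"rye", "grain"}
        };
        List<TreeMap<String, String>> readers = writePartitions(pairs, 4);
        System.out.println(getEntry(readers, "kale"));
        System.out.println(getEntry(readers, "missing"));
    }
}
```

In the real API the partition count comes from the length of the readers
array, so the output directory must hold exactly the files the job wrote,
and the partitioner passed in must be the one the job was configured with.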

Doug
