Posted to common-user@hadoop.apache.org by Sandhya E <sa...@gmail.com> on 2007/07/18 07:05:53 UTC

(Unknown)

Hi

I have two MapReduce jobs that run sequentially to accomplish a task. I first
ran the jobs locally on a single machine.
The first MapReduce produces a set of keys, which I stored in an in-memory Set
in the reduce instead of calling output.collect. The second MapReduce, working
on different input files, looked up the keys in that Set to decide how to act
on each input line. Now I want to run the jobs on a small cluster, and the
in-memory storage will not work there. How can the second map, running on
various machines, load all the keys from the first MapReduce before it starts
working on its input files? Any ideas?

Many Thanks
Sandhya

Re:

Posted by Ted Dunning <td...@veoh.com>.

Would the MapFile class help?
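Very roughly, the idea would be: write the keys from the first job into a
MapFile on HDFS, then have each mapper of the second job open a MapFile.Reader
and probe it per record instead of holding everything in memory. An untested
sketch, just to illustrate (the path and the Text/NullWritable types are
arbitrary choices, not from the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;

    public class MapFileSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "/tmp/job1-keys.map";   // made-up location on HDFS

        // Write the keys produced by job 1 (MapFile requires sorted order).
        MapFile.Writer writer =
            new MapFile.Writer(conf, fs, dir, Text.class, NullWritable.class);
        writer.append(new Text("key1"), NullWritable.get());
        writer.append(new Text("key2"), NullWritable.get());
        writer.close();

        // Probe for a key later, e.g. from the second job's map():
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        boolean present = reader.get(new Text("key2"), NullWritable.get()) != null;
        System.out.println("key2 present? " + present);
        reader.close();
      }
    }

The reader keeps only the index in memory and seeks into the data file for
each lookup, so the full key set does not have to fit in RAM on every node.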

