Posted to mapreduce-user@hadoop.apache.org by Raghava Mutharaju <m....@gmail.com> on 2010/02/13 13:07:42 UTC

is this architecture possible?

Hello all,

      Is the following architecture possible?

A distributed key-value store (HBase) is used, so each value has a timestamp
associated with it. Map and Reduce tasks are executed iteratively. In each
iteration, Map should take as input the values that were added to the store
in the previous iteration (perhaps the ones with the latest timestamp?).
Reduce should take in Map's output as well as the <key,value> pairs from the
store whose keys match the keys that Reduce has to process in the current
iteration. The output of Reduce goes back to the store.
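
To make the intended data flow concrete, here is a rough sketch of it as a
single-process simulation in plain Python. It does not use the HBase or
Hadoop APIs. Every name in it (TimestampedStore, run_iteration,
run_until_fixpoint, the toy map_fn and reduce_fn) is made up for
illustration, and it assumes the iteration number can serve as the timestamp.

```python
from collections import defaultdict


class TimestampedStore:
    """In-memory stand-in for a key-value store with a timestamp per value."""

    def __init__(self):
        self._cells = defaultdict(list)  # key -> list of (timestamp, value)

    def put(self, key, value, ts):
        self._cells[key].append((ts, value))

    def added_at(self, ts):
        """All (key, value) pairs written with exactly this timestamp."""
        return [(k, v) for k, cells in self._cells.items()
                for t, v in cells if t == ts]

    def values_for(self, key):
        """All values stored under a key, in insertion order."""
        return [v for _, v in self._cells.get(key, [])]


def run_iteration(store, iteration, map_fn, reduce_fn):
    """Run one Map/Reduce round; return how many values were written."""
    # Map phase: the input is only what the previous iteration wrote.
    grouped = defaultdict(list)
    for key, value in store.added_at(iteration - 1):
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)

    # Reduce phase: each key sees the map output plus the values already
    # in the store under that key. Output goes back to the store, stamped
    # with the current iteration number (an assumption of this sketch).
    written = 0
    for key, mapped in grouped.items():
        existing = store.values_for(key)
        for out_value in reduce_fn(key, mapped, existing):
            store.put(key, out_value, iteration)
            written += 1
    return written


def run_until_fixpoint(store, map_fn, reduce_fn, max_iterations=100):
    """Iterate until a round writes nothing new; return the last round run."""
    for iteration in range(1, max_iterations + 1):
        if run_iteration(store, iteration, map_fn, reduce_fn) == 0:
            return iteration
    return max_iterations


# Toy job: keep incrementing a value until it reaches 3.
def map_fn(key, value):
    if value < 3:
        yield key, value + 1


def reduce_fn(key, mapped, existing):
    # Emit only values the store does not already hold for this key.
    return sorted(set(mapped) - set(existing))


store = TimestampedStore()
store.put("a", 0, ts=0)  # seed data, treated as "iteration 0"
iterations = run_until_fixpoint(store, map_fn, reduce_fn)
final_values = store.values_for("a")
```

In a real job the store lookups would be remote reads and the rounds would
be separate MapReduce jobs chained by a driver; the sketch only shows which
data each phase is meant to see.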

If this is possible, which classes or methods (e.g., InputFormat, run() of
Reduce) should be extended so that the above operation takes place instead of
the regular one? If this is not possible, are there any alternatives that
achieve the same result?

Thank you.

Regards,
Raghava.