Posted to common-user@hadoop.apache.org by rab ra <ra...@gmail.com> on 2013/11/07 16:17:29 UTC

Sending map process to multiple nodes, special use case

Hello

In one of my use cases, I am sending map processes to a large number of
Hadoop nodes. Assume that the nodes are obtained from a public cloud. I
would like to ensure that the security of the nodes is not compromised. For
this, I am planning to implement a voting mechanism wherein multiple
copies, let's say 3, of the same map process are sent to 3 different nodes.
In this regard, I have the following questions.

1. I am using NLineInputFormat, wherein each line is sent to one map
process. Is there any mechanism in Hadoop to create 3 identical map
processes for a single line? I can mimic this by writing each line three
times in the input file that NLineInputFormat reads. Is there a more
elegant way to do this?

2. Is there any mechanism with which I can ensure that the identical map
processes are sent to three different nodes? Is there any way to control
the scheduling of map processes onto specific nodes? For example, map1
should go to node 1, and so on.

3. Or is there any scheduler that implements a voting mechanism that I can
use in conjunction with Hadoop?
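
To make the workaround in question 1 concrete, here is a minimal sketch in
plain Java of writing each input line three times before the file is handed
to NLineInputFormat. The class name, the helper replicate() and the
"<replica><TAB>" prefix are assumptions made for illustration, not Hadoop
APIs; the mapper would have to strip the prefix before processing the line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReplicateLines {
    // Hypothetical helper (not a Hadoop API): repeats every input line
    // `copies` times and prefixes each copy with "<replica>\t" so that the
    // copies of one line can be told apart later.
    public static List<String> replicate(List<String> lines, int copies) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            for (int replica = 0; replica < copies; replica++) {
                out.add(replica + "\t" + line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints three tagged copies of each of the two sample lines.
        for (String s : replicate(Arrays.asList("task-A", "task-B"), 3)) {
            System.out.println(s);
        }
    }
}
```

This only multiplies the map tasks. It does nothing to place the three
copies on three different nodes, which is what question 2 asks about.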

I am not sure about my above approach. Basically, I would like to ensure
that the results generated by the nodes are correct and can be trusted. For
instance, I send one map process to three nodes and compare the results
from these three nodes; if one node gives a different result, it is assumed
that the node needs to be verified.
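
The comparison step described above can be sketched as a simple majority
vote. This is only an illustration in plain Java under some assumptions:
each replica's output is available as a non-null string, results are
compared by exact equality, and the names MajorityVote, majority() and
dissenters() are hypothetical rather than part of Hadoop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MajorityVote {
    // Returns the result produced by more than half of the replicas,
    // or Optional.empty() if no result has a strict majority.
    public static Optional<String> majority(List<String> results) {
        Map<String, Integer> counts = new HashMap<>();
        for (String r : results) {
            counts.merge(r, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > results.size()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Returns the positions of the replicas whose result differs from the
    // majority result; these are the nodes that would need to be verified.
    // If there is no majority, no replica can be singled out and the list
    // is empty.
    public static List<Integer> dissenters(List<String> results) {
        List<Integer> suspects = new ArrayList<>();
        Optional<String> winner = majority(results);
        if (!winner.isPresent()) {
            return suspects;
        }
        for (int i = 0; i < results.size(); i++) {
            if (!results.get(i).equals(winner.get())) {
                suspects.add(i);
            }
        }
        return suspects;
    }
}
```

With three replicas, a single deviating node is outvoted by the other two.
If all three results differ, there is no majority and the line would have
to be recomputed or checked some other way.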

If there is any other possible approach, please share it with me.

regards
rab