Posted to common-user@hadoop.apache.org by Brock Palen <br...@mlds-networks.com> on 2009/05/29 15:56:10 UTC

Hadoop on demand intelligence with persistent HDFS.

We have an existing HPC cluster which runs Torque+Moab.

There have been requests for a persistent HDFS on a subset of the
nodes (~20 of the 800).

We can use Moab to force HOD jobs to go only to those 20 nodes. Thus
there should be Hadoop MapReduce workers on nodes that also hold HDFS
data locally.

The question then is: for work given to an HOD instance, will Hadoop
move the computation to the nodes closest to the data on HDFS?

I know Hadoop does this when it runs the whole cluster, but what
does HOD do with an external HDFS, when the HOD nodes overlap some
(maybe not all) of the HDFS nodes?
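To make the overlap question concrete, here is a toy model (not Hadoop's actual scheduler or HDFS placement policy). It assumes every block has `replication` replicas spread uniformly over distinct HDFS nodes, and that a map task counts as node-local if any replica sits on a node HOD allocated. The function name `local_fraction` and the node names are hypothetical.

```python
from itertools import combinations


def local_fraction(hdfs_nodes, hod_nodes, replication=3):
    """Fraction of replica placements with at least one replica on an HOD node.

    Toy assumption: every set of `replication` distinct HDFS nodes is equally
    likely to hold a given block. Real HDFS placement is rack-aware, not uniform.
    """
    hod = set(hod_nodes)
    placements = list(combinations(hdfs_nodes, replication))
    # A placement is "local" if any of its replica hosts also runs an HOD worker.
    local = sum(1 for p in placements if hod.intersection(p))
    return local / len(placements)


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(20)]  # hypothetical 20 HDFS nodes
    # HOD allocated only half of the HDFS nodes for this job.
    print(round(local_fraction(nodes, nodes[:10]), 3))  # 0.895
```

Under this model, with 20 HDFS nodes, replication 3, and HOD handing out 10 of them, about 89% of blocks would have a replica on an allocated node (1020 of the 1140 possible placements).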

Rack locality won't matter for now; the 20 nodes will all be in the
same blade enclosure.

Our goal is to keep running our normal HPC workload, but also provide
an HDFS that persists and gives decent performance relative to a
normal Hadoop cluster.



Brock Palen
brockp@mlds-networks.com
www.mlds-networks.com
MLDS Owner Senior Tech.