Posted to mapreduce-user@hadoop.apache.org by Eric Caspole <er...@amd.com> on 2011/07/13 16:14:16 UTC

Using ram disk for cluster.local.dir

I have a one-node pseudo-distributed cluster with plenty of RAM and 5
hard disks. As an experiment, I set mapreduce.cluster.local.dir to
point to a RAM disk. For this experiment I am running an 8GB terasort,
so I made a 9GB RAM disk. This change sped up the run time of the job
by ~16% versus pointing mapreduce.cluster.local.dir at a
comma-separated list of the 5 disks.
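
Concretely, the setup was along these lines (a minimal sketch; the
tmpfs mount point and paths here are just illustrative):

    # create a 9GB RAM disk backed by tmpfs
    sudo mkdir -p /mnt/ramdisk
    sudo mount -t tmpfs -o size=9g tmpfs /mnt/ramdisk

Then in mapred-site.xml:

    <property>
      <name>mapreduce.cluster.local.dir</name>
      <value>/mnt/ramdisk/mapred/local</value>
    </property>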

I have two questions about this -

- Will this work in a cluster situation where, say, I have a 12GB RAM
disk per cluster node and I am working on a 128GB terasort, or does
the cluster.local.dir free space on each node have to be big enough
to hold that node's share of the intermediate results? My hunch is yes
but I am not sure (rough numbers after these questions).

- From Googling I found very little information about people using RAM
disks with Hadoop in this way, so it seems like there may be a
technical reason people do not do it, perhaps the size-related issue I
mentioned. Are there other gotchas in trying to use a RAM disk like
this? It seems like a quick and dirty way to get some performance.
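
For the first question, my rough back-of-envelope goes like this
(assuming terasort's map is essentially an identity, so the
intermediate map output is about the size of the input, and assuming
the data spreads evenly over N nodes, where N is hypothetical):

    map output written per node    ~= 128 GB / N
    shuffle data fetched per node  ~= 128 GB / N   (for local reducers)
    peak local dir usage per node  ~= 2 x 128/N GB, plus transient
                                      spill/merge overhead

    to fit in a 12 GB RAM disk:  2 x 128/N <= 12  =>  N >= ~22 nodes,
    before counting any merge overhead

If that is roughly right, the RAM disk approach only fits at large
enough node counts, which matches my hunch above.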

Thanks,
Eric