Posted to mapreduce-user@hadoop.apache.org by Eric Caspole <er...@amd.com> on 2011/07/13 16:14:16 UTC
Using ram disk for cluster.local.dir
I have a 1-node pseudo-distributed cluster with plenty of RAM and 5 HDs. As an
experiment, I set mapreduce.cluster.local.dir to point to a ram disk.
For this experiment I am running an 8GB terasort, so I made a 9GB ram
disk. This change sped up the run time of the job by ~16% versus
pointing mapreduce.cluster.local.dir to a comma-separated list of the 5 HDs.
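In case it helps, this is roughly the setup I used (the mount point, size, and
path below are illustrative, not exactly what I have) - a tmpfs mount backing
the local dir, then the property pointed at it:

```shell
# Create a 9 GB tmpfs (ram disk) to back the MapReduce local dir.
# Size and path are examples; tmpfs only consumes RAM as files are written.
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=9g tmpfs /mnt/ramdisk
sudo chown "$USER" /mnt/ramdisk

# Then in mapred-site.xml, point the property at it instead of the HDs:
#   <property>
#     <name>mapreduce.cluster.local.dir</name>
#     <value>/mnt/ramdisk/mapred/local</value>
#   </property>
```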
I have two questions about this -
- Will this work in a real cluster situation where, say, I have a 12GB
ram disk per cluster node and I am running a 128GB terasort, or does
the cluster.local.dir free space per node have to be big enough to
hold all of that node's intermediate results? My hunch is yes, but I
am not sure.
- By googling I found very little info about people trying to use ram
disks with Hadoop in this way, so it seems like there is a technical
reason people do not do it, perhaps the size-related issue I mentioned.
Are there other gotchas about trying to use a ram disk like this? It
seems like a quick and dirty way to get some extra performance.
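For the first question, here is the back-of-envelope arithmetic behind my
hunch. The node count and the 2x spill factor are my own assumptions (terasort's
map output is about the same size as its input, spread across the nodes, and
merge spills can temporarily need up to roughly double that locally):

```python
# Rough per-node intermediate-data estimate for a distributed terasort.
# All numbers here are assumptions for illustration, not measured values.
input_gb = 128     # total terasort input
nodes = 12         # hypothetical cluster size
spill_factor = 2   # assumed worst case: map output plus merge spill files

per_node_gb = input_gb / nodes * spill_factor
print(f"~{per_node_gb:.1f} GB of local dir space per node")
```

With those assumptions each node needs on the order of 21 GB of local
space, which would overflow a 12GB ram disk - which is why I suspect the
answer to my question is yes.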
Thanks,
Eric