Posted to common-user@hadoop.apache.org by maneesh varshney <mv...@gmail.com> on 2011/07/19 03:08:58 UTC

Hadoop Discrete Event Simulator

Hello,

Perhaps somebody can point out if there have been efforts to "simulate"
Hadoop clusters.

What I mean is a discrete event simulator that models the hosts and the
network and runs Hadoop algorithms over a synthetic workload, similar to
network simulators (for example, ns2).

If such a tool is available, I was hoping to use it to:
a. Get a general sense of how the HDFS and MapReduce algorithms work.
For example, if I were to store 1TB of data over 100 nodes, how would the
blocks get distributed?
b. Use the simulation to optimize my configuration parameters. For example,
the relationship between performance and the number of cluster nodes, or the
number of replicas, and so on.

The need for point b. above is to be able to study and analyze performance
without (or before) actually running the algorithms on an actual cluster.
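For a rough feel of the block-distribution question in point a., a few lines of Python
will do even without a full simulator. This is only a sketch under assumed defaults
(128MB block size, replication factor 3) and it places replicas uniformly at random;
real HDFS placement is rack-aware, so treat the numbers as a first approximation:

```python
import random
from collections import Counter

random.seed(42)  # deterministic run for repeatability

DATA_TB = 1
BLOCK_MB = 128       # assumed HDFS block size
REPLICATION = 3      # assumed dfs.replication default
NUM_NODES = 100

num_blocks = (DATA_TB * 1024 * 1024) // BLOCK_MB  # 8192 blocks
placements = Counter()
for _ in range(num_blocks):
    # pick 3 distinct nodes per block, ignoring rack topology
    for node in random.sample(range(NUM_NODES), REPLICATION):
        placements[node] += 1

total_replicas = num_blocks * REPLICATION
print("blocks:", num_blocks)
print("mean replicas per node:", total_replicas / NUM_NODES)
print("min/max per node:", min(placements.values()), max(placements.values()))
```

Under these assumptions each node ends up with about 246 replicas on average,
and the min/max spread gives a sense of how uneven purely random placement is.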

Thanks in advance,
Maneesh

PS: I apologize if this question has been asked earlier. I could not seem to
locate the search feature in the mailing list archive.

RE: Hadoop Discrete Event Simulator

Posted by Je...@shell.com.
Maneesh, 

You may want to check this out

https://issues.apache.org/jira/browse/HADOOP-5005
