Posted to common-dev@hadoop.apache.org by "Robert Chansler (JIRA)" <ji...@apache.org> on 2008/09/10 18:39:44 UTC
[jira] Updated: (HADOOP-4142) Synthetic Load Generator for NameNode testing -- Next Generation

     [ https://issues.apache.org/jira/browse/HADOOP-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4142:
------------------------------------

    Description: 
A review of the Synthetic Load Generator identified several candidates for improvement. None are so urgent as to stand in the way of the present facility, but all are of sufficient interest to merit inclusion in this note.


  was:
A review of the Synthetic Load Generator identified several candidates for improvement. None are so urgent as to stand in the way of the present facility, but all are of sufficient interest to merit inclusion in this note.

* The SLG should model the proper number of connections to the NameNode. This might be accomplished by using different user identities for each connection.

* More appropriate statistical distributions could have been chosen. Should requests have Poisson statistics? Is a log-normal distribution more important for file sizes, with zero-length files as a special case?

* "Intensity" (events/second) might be a more convenient way to express rate parameters.

* Does a static initial name space topology bias results?

* Does the pairing of create with delete bias results?

* Should there be a correlation between listing a directory and reading its files?

* To the extent possible, tests should be repeatable. This can't be accomplished in an absolute sense, but at least the behavior of the SLG should be as reproducible as possible. A reasonable default for simulation applications is that each sequence of events be generated independently, and that the seeds for each generator come from another reproducible sequence.

* Many of the questions of event distribution could be evaded if the SLG could replay a sequence of events from a live log. Perhaps a log should be published that represents one standard unit of Hadoop load.
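
The statistical and repeatability suggestions above can be illustrated with a small sketch. This is not code from the SLG; it is a minimal Python example written under stated assumptions. The function names (make_generators, arrival_times, file_size), the parameters (master_seed, intensity, p_zero, mu, sigma), and every numeric value are hypothetical. It shows one way to read three of the bullets: request arrivals as a Poisson process parameterized by an intensity in events/second, file sizes drawn from a log-normal distribution with zero-length files handled as a special case, and one independent generator per event stream whose seeds come from another reproducible sequence.

```python
import random


def make_generators(master_seed, names):
    """Return one seeded generator per event stream (hypothetical helper).

    The per-stream seeds are drawn from a generator that is itself seeded,
    so the whole run is determined by master_seed alone.
    """
    seeder = random.Random(master_seed)
    return {name: random.Random(seeder.getrandbits(64)) for name in names}


def arrival_times(rng, intensity, count):
    """Event times (seconds) of a Poisson process with the given intensity.

    intensity is a rate in events/second; gaps between events are
    exponentially distributed with mean 1 / intensity.
    """
    t = 0.0
    times = []
    for _ in range(count):
        t += rng.expovariate(intensity)
        times.append(t)
    return times


def file_size(rng, p_zero, mu, sigma):
    """File size in bytes: zero with probability p_zero, else log-normal.

    mu and sigma are the mean and standard deviation of the underlying
    normal distribution; the values used below are illustrative only.
    """
    if rng.random() < p_zero:
        return 0
    # Clamp to at least 1 byte so that zero-length files arise only
    # from the explicit special case above.
    return max(1, round(rng.lognormvariate(mu, sigma)))


if __name__ == "__main__":
    # Assumed stream names and parameters, chosen only for the example.
    gens = make_generators(master_seed=4142, names=["create", "read", "list"])
    create_times = arrival_times(gens["create"], intensity=50.0, count=5)
    sizes = [file_size(gens["create"], p_zero=0.1, mu=13.0, sigma=2.0)
             for _ in range(5)]
    print(create_times)
    print(sizes)
```

Running the sketch twice with the same master_seed yields the same event times and sizes, while each stream draws from its own generator, so changing how many events one stream consumes does not perturb the others.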


> Synthetic Load Generator for NameNode testing -- Next Generation
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4142
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4142
>             Project: Hadoop Core
>          Issue Type: Test
>          Components: dfs
>            Reporter: Robert Chansler
>
> A review of the Synthetic Load Generator identified several candidates for improvement. None are so urgent as to stand in the way of the present facility, but all are of sufficient interest to merit inclusion in this note.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.