Posted to common-dev@hadoop.apache.org by "Sanjay Radia (JIRA)" <ji...@apache.org> on 2007/10/03 19:55:50 UTC

[jira] Created: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster
----------------------------------------------------------------------------------------------------------------------------------------

Key: HADOOP-1989
URL: https://issues.apache.org/jira/browse/HADOOP-1989
Project: Hadoop
Issue Type: Improvement
Components: dfs
Reporter: Sanjay Radia
Priority: Minor

The proposal is to add an implementation of a Simulated Data Node.
This will
- allow one to test certain parts of the system (especially the Name Node, protocols) much more easily and efficiently.
- allow one to run performance benchmarks on the Name Node without having a large cluster.
- allow one to inject faults for testing (e.g. one can add random faults based on probability parameters).

The idea is that the Simulated Data Node will
- discard any data written to blocks (but remember the blocks and their sizes)
- generate fixed data on the fly when blocks are read (e.g. a block is a fixed set of bytes or a repeated sequence of strings).

The Simulated Data Node can also be used for fault injection.
The data node can be parameterized with probabilities that allow one to control:
- Delays on reads, writes, creates, etc.
- IO Exceptions
- Loss of blocks
- Failures
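
As a rough illustration of the idea above, here is a minimal sketch of a store that discards written data, remembers block sizes, regenerates fixed bytes on read, and injects IO exceptions with a configurable probability. The class name SimulatedBlockStoreSketch, the fill byte, and the single ioErrorProbability parameter are assumptions made for this example; they are not taken from the patch attached to this issue.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch; not the class from the HADOOP-1989 patch. */
class SimulatedBlockStoreSketch {
  /** Every simulated byte has this fixed value (an arbitrary choice). */
  static final byte FILL_BYTE = 9;

  /** Block id -> remembered length; the payload itself is never kept. */
  private final Map<Long, Long> blockLengths = new HashMap<>();
  private final double ioErrorProbability;
  private final Random random;

  SimulatedBlockStoreSketch(double ioErrorProbability, long seed) {
    this.ioErrorProbability = ioErrorProbability;
    this.random = new Random(seed);
  }

  /** Discards the payload but remembers the block and its size. */
  void write(long blockId, byte[] data) throws IOException {
    maybeFail("write");
    blockLengths.put(blockId, (long) data.length);
  }

  /** Regenerates the block contents on the fly from the remembered size. */
  byte[] read(long blockId) throws IOException {
    maybeFail("read");
    Long length = blockLengths.get(blockId);
    if (length == null) {
      throw new IOException("Unknown block " + blockId);
    }
    byte[] data = new byte[length.intValue()];
    Arrays.fill(data, FILL_BYTE);
    return data;
  }

  /** Throws an IOException with the configured probability. */
  private void maybeFail(String op) throws IOException {
    if (random.nextDouble() < ioErrorProbability) {
      throw new IOException("Injected fault during " + op);
    }
  }

  public static void main(String[] args) throws IOException {
    SimulatedBlockStoreSketch store = new SimulatedBlockStoreSketch(0.0, 1L);
    store.write(7L, new byte[] {1, 2, 3});
    System.out.println("bytes read back: " + store.read(7L).length);
  }
}
```

Delays and block loss could presumably be modelled the same way, with one probability parameter per fault type.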

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542648 ] 

dhruba borthakur commented on HADOOP-1989:
------------------------------------------

I still get a merge failure on DataChecksum.java. I managed to merge it by hand. I ran the unit tests and saw a failure in TestCrcCorruption:

Testcase: testCrcCorruption took 12.692 sec
    Caused an ERROR
Java heap space
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.fs.FSInputChecker.set(FSInputChecker.java:396)
    at org.apache.hadoop.fs.FSInputChecker.<init>(FSInputChecker.java:71)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.<init>(DFSClient.java:697)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:755)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1144)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1211)
    at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:66)
    at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:56)
    at org.apache.hadoop.dfs.DFSTestUtil.checkFiles(DFSTestUtil.java:150)
    at org.apache.hadoop.dfs.TestCrcCorruption.thistest(TestCrcCorruption.java:181)
    at org.apache.hadoop.dfs.TestCrcCorruption.testCrcCorruption(TestCrcCorruption.java:223)


> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1989:
----------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thank you Sanjay!

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt, SimulatedStoragePatchSubmit9.patch

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukund Madhugiri updated HADOOP-1989:
-------------------------------------

    Status: Open  (was: Patch Available)

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1989:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Sanjay!


[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Status: Patch Available  (was: Reopened)


[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukund Madhugiri updated HADOOP-1989:
-------------------------------------

    Status: Patch Available  (was: Open)


[jira] Reopened: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting reopened HADOOP-1989:
----------------------------------


Sorry, I'm coming in late here, but the 'datanodecluster' command should not be added to bin/hadoop, since it's not a command for end-users, but only for testing.  It can easily be run by developers using the slightly more verbose syntax:
{noformat}
bin/hadoop org.apache.hadoop.dfs.DataNodeCluster
{noformat}
The shortcut commands in bin/hadoop should be for end users, not developers.



[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542823 ] 

Sanjay Radia commented on HADOOP-1989:
--------------------------------------

I am seeing no merge conflicts with DataChecksum.java. (I just did an update from SVN.)
(I also ran the above test with no problems.)
Email me your merged DataChecksum.java file and I will compare it with my merged file.

sanjay


[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540526 ] 

Doug Cutting commented on HADOOP-1989:
--------------------------------------

> For example startDataNode refers to the simulated impl if it is starting the simulated fsdataset [ ... ]

This could be handled in other ways that do not require test code in trunk, no?  For example, you could replace 'new SimulatedFoo' with (FooInterface)ReflectionUtils.newInstance("org.apache.hadoop.dfs.SimulatedFoo"), so that there's no compile-time dependency.  This could be inside a factory method, so that it's only done once per simulated class.
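
A minimal sketch of such a factory method follows. It uses plain JDK reflection (Class.forName) rather than Hadoop's ReflectionUtils, and FooInterface, RealFoo, SimulatedFoo and FooFactory are hypothetical placeholder names in the spirit of the comment above, not classes from the patch.

```java
/** Hypothetical stand-in for the interface shared by the real and simulated classes. */
interface FooInterface {
  String describe();
}

/** Hypothetical production implementation. */
class RealFoo implements FooInterface {
  public String describe() {
    return "real";
  }
}

/** Hypothetical test-only implementation; the factory never names it in code. */
class SimulatedFoo implements FooInterface {
  public String describe() {
    return "simulated";
  }
}

class FooFactory {
  /**
   * Returns the real implementation, or loads the simulated one by name so
   * that this class has no compile-time dependency on it.
   */
  static FooInterface create(boolean simulated, String simulatedClassName) {
    if (!simulated) {
      return new RealFoo();
    }
    try {
      Class<? extends FooInterface> cls =
          Class.forName(simulatedClassName).asSubclass(FooInterface.class);
      return cls.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot load " + simulatedClassName, e);
    }
  }

  public static void main(String[] args) {
    // The class name would normally come from configuration.
    String name = args.length > 0 ? args[0] : "SimulatedFoo";
    System.out.println(create(true, name).describe());
  }
}
```

Because the simulated class is referenced only by its name, it could live in the test tree and still be selected at run time.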

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541131 ] 

Konstantin Shvachko commented on HADOOP-1989:
---------------------------------------------

- DataNode.java
-# warning: import java.lang.reflect.Constructor; 
-# throw new IOException(e.toString()); should use StringUtils.stringifyException(e), otherwise the stack information will be lost.
-# scheduleBlockReport() should replace lines 690 - 691.
- FSDataset.java
-# private getMetaDataInStream(Block b) should be removed.

Everything else looks good.
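
To illustrate the point about e.toString() losing the stack information, here is a small sketch using only the JDK. The stringify() helper is a hypothetical stand-in written for this example; it is assumed to behave roughly like the StringUtils.stringifyException(e) call mentioned above, which is not reproduced here.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceSketch {
  /** Renders the full stack trace as a string (hypothetical JDK-only helper). */
  static String stringify(Throwable t) {
    StringWriter buffer = new StringWriter();
    PrintWriter writer = new PrintWriter(buffer);
    t.printStackTrace(writer);
    writer.flush();
    return buffer.toString();
  }

  /** Keeps only the exception class and message; the frames are lost. */
  static IOException wrapLossy(Exception e) {
    return new IOException(e.toString());
  }

  /** Carries the original frames along inside the new message. */
  static IOException wrapWithTrace(Exception e) {
    return new IOException(stringify(e));
  }

  public static void main(String[] args) {
    Exception cause = new IllegalStateException("boom");
    System.out.println("lossy:      " + wrapLossy(cause).getMessage());
    System.out.println("with trace: " + wrapWithTrace(cause).getMessage());
  }
}
```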

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538148 ] 

Konstantin Shvachko commented on HADOOP-1989:
---------------------------------------------

AbstractFSDataset.java
- AbstractFSDataset should be an interface rather than an abstract class, since it does not implement any methods, 
and all methods are declared abstract public. Therefore, it should be called FSDatasetInterface.
I also think it should not be dependent on FSConstants.
- @author field should be removed. See HADOOP-1147.
- Javadoc: missing descriptions of public methods.
- Long lines.
- Too many line breaks between methods.

SimulatedFSDataset.java
- redundant import java.io.OutputStream;
- Needs javadoc description of SimulatedFSDataset class.
- needs line break between class and method declarations.
- subclasses BInfo and Storage should be private.
- Storage is not a good name for the class, since it is already used. Maybe SimulatedStorage.
Is it possible to reuse classes like DF or DatanodeInfo here?

DataNode.java
- Javadoc description for method startDataNode().
- You do not need extra variable sendBlockReportatNextHeartbeat. 
Instead the new method sendBlockReport() should set 
      lastHeartbeat=0;
      lastBlockReport=0;
as it is done in case DatanodeProtocol.DNA_REGISTER:
I'd then call this method scheduleBlockReport(), and it should not be public.
- In readMetadata() both methods 
{code}
        checksumIn = data.getMetaDataInStream(block);
        long fileSize = data.getMetaDataLength(block);
{code}
perform access to the data-node block map, which is not efficient.
Can it be optimized?
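
The scheduleBlockReport() suggestion above can be sketched as follows. This is a heavily simplified, hypothetical model: only the names lastHeartbeat, lastBlockReport and scheduleBlockReport() come from the comment, while the class, the interval fields and the "due" checks are assumptions made for illustration.

```java
/** Hypothetical, simplified model of the data-node timing fields. */
class BlockReportSchedulerSketch {
  private final long heartbeatIntervalMs;
  private final long blockReportIntervalMs;
  private long lastHeartbeat;
  private long lastBlockReport;

  BlockReportSchedulerSketch(long heartbeatIntervalMs, long blockReportIntervalMs, long now) {
    this.heartbeatIntervalMs = heartbeatIntervalMs;
    this.blockReportIntervalMs = blockReportIntervalMs;
    this.lastHeartbeat = now;
    this.lastBlockReport = now;
  }

  /** Resets both timestamps so the next loop iteration sends a heartbeat and a report. */
  void scheduleBlockReport() {
    lastHeartbeat = 0;
    lastBlockReport = 0;
  }

  boolean heartbeatDue(long now) {
    return now - lastHeartbeat >= heartbeatIntervalMs;
  }

  boolean blockReportDue(long now) {
    return now - lastBlockReport >= blockReportIntervalMs;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    BlockReportSchedulerSketch s = new BlockReportSchedulerSketch(3000L, 3600000L, now);
    System.out.println("report due before scheduling: " + s.blockReportDue(now));
    s.scheduleBlockReport();
    System.out.println("report due after scheduling:  " + s.blockReportDue(now));
  }
}
```

With this shape no extra boolean flag is needed: zeroing the timestamps makes the existing interval checks fire on their own.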

DataChecksum.java
- Would it be cleaner to have SimulatedFSDataset.getChecksumHeader(checkSum)
rather than DataChecksum.getHeader(), so as to keep all simulated methods
inside the simulated classes?

simulatedStreams
- SimulatedInputStream and SimulatedOutputStream should be private subclasses of
SimulatedFSDataset, because they are not used outside of the dataset directly.

PendingReplicationBlocks.java
- remove(Block block){
empty line included.

ClusterTestDFS.java
- This is the only place where AbstractFSDataset.getVolumeNames() is used.
I think toString() should be used here instead; getVolumeNames() can then be 
removed from the abstract class.

TestFileCreation.java
- In conf.setBoolean("dfs.datanode.simulateddatastorage", true);
the constant CONFIG_PROPERTY_SIMULATED should be either used or not used, consistently in all cases. 
Maybe it is more consistent with current Hadoop practice to use the config name directly.

TestSetrepIncreasing.java
- testSetrepIncreasingSimulatedStorage(): Tabs are off.
- same constant as in TestFileCreation.

TestSmallBlock.java
- Indentation should be 2 spaces, and tabs should be replaced by spaces.

TestInjectionForSimulatedStorage.java
- A lot of redundant imports.
- writeFile(): formatting.

MiniDFSCluster
- The NOTE: in Javadoc for MiniDFSCluster constructor does not make sense any more.
- the line
 	if (dataSet.getClass() != SimulatedFSDataset.class)  
should probably read
  	if (dataSet instanceof SimulatedFSDataset) 
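
The difference between the two checks can be seen in a small sketch. The classes below are hypothetical stand-ins, not the real dataset classes. Note also that the first line quoted above tests for inequality, so a rewrite that preserves its meaning would presumably negate the instanceof test.

```java
/** Hypothetical stand-ins for the dataset class hierarchy. */
class DatasetBase {}

class SimulatedDataset extends DatasetBase {}

class ExtendedSimulatedDataset extends SimulatedDataset {}

class TypeCheckSketch {
  /** True only when the runtime class is exactly SimulatedDataset. */
  static boolean exactClassMatch(Object dataSet) {
    return dataSet.getClass() == SimulatedDataset.class;
  }

  /** True for SimulatedDataset and any subclass; false for null. */
  static boolean instanceOfMatch(Object dataSet) {
    return dataSet instanceof SimulatedDataset;
  }

  public static void main(String[] args) {
    Object extended = new ExtendedSimulatedDataset();
    System.out.println("getClass() match: " + exactClassMatch(extended));
    System.out.println("instanceof match: " + instanceOfMatch(extended));
  }
}
```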

TestPRead.java
- methods should be separated by a blank line.

TestReplication.java
- System.out.println() should be removed. LOG should be used instead if necessary.

TestSimulatedFSDataset.java
- redundant import org.apache.hadoop.dfs.AbstractFSDataset.BlockWriteStreams;
- testWriteRead(): bytesAdded is never used


> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt
>
>
> Proposal is to add an implementation for a Simulated Data Node.
> This will 
>   - allow one to test certain parts of the system (especially the Name Node, protocols) much more easily and efficiently.
>   - allow one to run performance benchmarks on the Name node without having a large cluster.
>   - Inject faults for testing (e.g. one can add random faults based probability parameters).
> The idea is that the Simulated Data Node will
>  - discard any data written to blocks (but remember the blocks and their sizes)
>  - generate fixed data on the fly when blocks are read (e.g. block is fixed set of bytes or repeated sequence of strings).
> The Simulated Data Node can also be used for fault injection.
> The data node can be parameterized with probabilities that allow one to control:
>   - Delays on reads and writes, creates, etc
>   - IO Exceptions
>  - Loss of blocks 
>  - Failures
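
The probability-driven fault injection described above could look roughly like the sketch below. The class name, constructor parameters, and method are all hypothetical and are not taken from any attached patch; the sketch only shows how a seeded random source and per-fault probabilities can decide whether an operation is delayed or fails.

```java
import java.io.IOException;
import java.util.Random;

/** Hypothetical fault injector; the class and its parameters are illustrative only. */
class FaultInjectorSketch {
  private final Random random;
  private final double ioErrorProbability;
  private final double delayProbability;
  private final long delayMillis;

  FaultInjectorSketch(long seed, double ioErrorProbability,
                      double delayProbability, long delayMillis) {
    this.random = new Random(seed);  // seeded so that a test run can be repeated
    this.ioErrorProbability = ioErrorProbability;
    this.delayProbability = delayProbability;
    this.delayMillis = delayMillis;
  }

  /** Intended to be called before a simulated read, write, or create. */
  synchronized void maybeFail(String operation) throws IOException {
    // nextDouble() is in [0, 1), so probability 0 never fires and 1 always fires.
    if (random.nextDouble() < delayProbability) {
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    if (random.nextDouble() < ioErrorProbability) {
      throw new IOException("Injected fault during " + operation);
    }
  }
}
```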

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit8.txt

My last patch had a conflict.
Updated

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt
>
>

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit9.patch

removed datanodecluster from bin/hadoop

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt, SimulatedStoragePatchSubmit9.patch
>
>

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543162 ] 

Hadoop QA commented on HADOOP-1989:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369558/SimulatedStoragePatchSubmit8.txt
against trunk revision r595406.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1104/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1104/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1104/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1104/console

This message is automatically generated.

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt
>
>

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit6.txt

Updated patch. 
This patch addresses the issue raised by Doug.
SimulatedFSDataset is now under test.
sanjay

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt
>
>

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541528 ] 

Hadoop QA commented on HADOOP-1989:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369208/SimulatedStoragePatchSubmit7.txt
against trunk revision r593743.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1088/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1088/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1088/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1088/console

This message is automatically generated.

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt
>
>

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543177 ] 

Sanjay Radia commented on HADOOP-1989:
--------------------------------------

BTW the datanodecluster command will work for non-simulated data nodes - it creates multiple datanodes in one JVM.
So if you had a large machine and wanted to run multiple datanodes (real not simulated) in one JVM you could do this.
However one could argue that one may want to do this only for testing.

I will remove the datanodecluster command shortly.

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt
>
>

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559235#action_12559235 ] 

Hadoop QA commented on HADOOP-1989:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12373150/SimulatedStoragePatchSubmit9.patch
against trunk revision r612200.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1600/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1600/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1600/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1600/console

This message is automatically generated.

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt, SimulatedStoragePatchSubmit9.patch
>
>

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit5.txt

The attached patch addresses Konstantine's feedback on the previous patch.
It also adds a new class, DataNodeCluster, that allows one to run a DataNode cluster in a single address space (the
name node can be in a separate address space). This class allows one to run multiple instances of the simulated
data node in a single VM; this is useful for benchmarking with a real Name node and a large number of
simulated data nodes. The hadoop command has been modified to allow one to run this as:
      bin/hadoop datanodecluster

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt
>
>

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit.txt

This patch implements a Simulated Data Node.
Key changes:
Abstract class AbstractFSDataset which has two main implementations:
FSDataset - the existing real storage 
SimulatedFSDataset - the new simulated storage.

DataNode, the class that communicates with the clients and the name node, is mostly
unchanged - it now uses real or simulated storage.

Previously, the DataNode class obtained Files from the FSDataset. Now it obtains
an InputStream or OutputStream.

When creating a MiniDFSCluster, set the config property named
SimulatedFSDataset.CONFIG_PROPERTY_SIMULATED
(i.e. the property string "dfs.datanode.simulateddatastorage") to true to create
simulated storage.
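
To illustrate the idea of storage that hands out streams rather than Files, here is a much-reduced sketch of a simulated dataset. All class and method names in it are hypothetical and are not the ones in the attached patch. It assumes the behaviour stated in the issue description: written bytes are discarded, only the block length is remembered, and reads return a fixed byte repeated for the recorded length.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, much-reduced stand-in for the abstract storage class. */
abstract class DatasetSketch {
  abstract OutputStream writeToBlock(long blockId) throws IOException;
  abstract InputStream getBlockInputStream(long blockId) throws IOException;
  abstract long getLength(long blockId) throws IOException;
}

/** Simulated storage: remembers block lengths, discards the data itself. */
class SimulatedDatasetSketch extends DatasetSketch {
  static final byte FILL_BYTE = 9;  // arbitrary fixed content returned on reads
  private final Map<Long, Long> blockLengths = new HashMap<Long, Long>();

  @Override
  synchronized OutputStream writeToBlock(final long blockId) {
    blockLengths.put(blockId, 0L);
    return new OutputStream() {
      @Override
      public void write(int b) {
        addBytes(blockId, 1);  // count the byte, drop its value
      }
      @Override
      public void write(byte[] buf, int off, int len) {
        addBytes(blockId, len);
      }
    };
  }

  private synchronized void addBytes(long blockId, long n) {
    blockLengths.put(blockId, blockLengths.get(blockId) + n);
  }

  @Override
  synchronized long getLength(long blockId) throws IOException {
    Long len = blockLengths.get(blockId);
    if (len == null) {
      throw new IOException("Unknown block " + blockId);
    }
    return len;
  }

  @Override
  InputStream getBlockInputStream(long blockId) throws IOException {
    final long length = getLength(blockId);
    return new InputStream() {
      private long pos = 0;
      @Override
      public int read() {
        if (pos >= length) {
          return -1;  // end of the simulated block
        }
        pos++;
        return FILL_BYTE;  // data is generated on the fly
      }
    };
  }
}
```

A real-storage implementation would extend the same abstract class and return file-backed streams, so the caller does not need to know which kind of storage it is using.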

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt
>
>

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

        Fix Version/s: 0.16.0
             Assignee: Sanjay Radia
    Affects Version/s: 0.16.0
               Status: Patch Available  (was: Open)

See attached patch (SimulatedStoragePatchSubmit7.txt)

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt
>
>

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540298 ] 

Sanjay Radia commented on HADOOP-1989:
--------------------------------------

Tried to put the simulated implementations inside the test tree; unfortunately, it required that the test tree be compiled before the main src tree.
For example startDataNode refers to the simulated impl if it is starting the simulated fsdataset. Note that we are reusing 
the datanode implementation; only the fsdataset is different.
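
The compile-order problem can be seen in a small sketch like the one below. Everything in it is a hypothetical stand-in (java.util.Properties replaces the Hadoop configuration object, and the class names are invented); only the property string is the one given for this patch. Because the startup code names the simulated class directly, both classes have to be compiled together with it.

```java
import java.util.Properties;

/** Hypothetical stand-ins for the two storage implementations. */
interface StorageSketch {
  String kind();
}

class RealStorageSketch implements StorageSketch {
  public String kind() { return "real"; }
}

class SimulatedStorageSketch implements StorageSketch {
  public String kind() { return "simulated"; }
}

class DataNodeStartupSketch {
  // Property string as given for this patch.
  static final String CONFIG_PROPERTY_SIMULATED = "dfs.datanode.simulateddatastorage";

  /** Picks the storage at startup; the rest of the data node is shared. */
  static StorageSketch chooseStorage(Properties conf) {
    boolean simulated =
        Boolean.parseBoolean(conf.getProperty(CONFIG_PROPERTY_SIMULATED, "false"));
    // The direct "new SimulatedStorageSketch()" is a compile-time reference,
    // which is why the simulated class cannot live in a tree compiled later.
    return simulated ? new SimulatedStorageSketch() : new RealStorageSketch();
  }
}
```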


> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt
>
>

[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1989:
-------------------------------------

    Status: Patch Available  (was: Open)

> Add support for simulated Data Nodes  - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt
>
>
> Proposal is to add an implementation for a Simulated Data Node.
> This will
>   - allow one to test certain parts of the system (especially the Name Node, protocols) much more easily and efficiently.
>   - allow one to run performance benchmarks on the Name Node without having a large cluster.
>   - allow one to inject faults for testing (e.g. one can add random faults based on probability parameters).
> The idea is that the Simulated Data Node will
>   - discard any data written to blocks (but remember the blocks and their sizes)
>   - generate fixed data on the fly when blocks are read (e.g. a block is a fixed set of bytes or a repeated sequence of strings).
> The Simulated Data Node can also be used for fault injection.
> The data node can be parameterized with probabilities that allow one to control:
>   - Delays on reads, writes, creates, etc.
>   - IO Exceptions
>   - Loss of blocks
>   - Failures
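
The storage behaviour described in the proposal can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class name SimulatedBlockStore, its methods, the fill byte, and the single ioErrorProbability parameter are all hypothetical. It models only the "discard writes, remember sizes, generate fixed data on read" idea plus probabilistic IO Exceptions; delays and block loss are left out.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical in-memory stand-in for a data node's block storage. */
class SimulatedBlockStore {
    /** Byte returned for every position of every block (an arbitrary choice). */
    static final byte FILL_BYTE = 9;

    // Only block ids and sizes are remembered; written bytes are discarded.
    private final Map<Long, Long> blockSizes = new HashMap<>();
    private final Random random;
    private final double ioErrorProbability;

    SimulatedBlockStore(double ioErrorProbability, long seed) {
        this.ioErrorProbability = ioErrorProbability;
        this.random = new Random(seed); // seeded so that test runs are repeatable
    }

    /** Throws an IOException with the configured probability. */
    private void maybeFail(String operation) throws IOException {
        if (random.nextDouble() < ioErrorProbability) {
            throw new IOException("Injected fault during " + operation);
        }
    }

    /** Appends to a block: the data is dropped, only the length is recorded. */
    synchronized void write(long blockId, byte[] data) throws IOException {
        maybeFail("write");
        blockSizes.merge(blockId, (long) data.length, Long::sum);
    }

    /** Returns the remembered size of a block. */
    synchronized long length(long blockId) throws IOException {
        Long size = blockSizes.get(blockId);
        if (size == null) {
            throw new IOException("Unknown block " + blockId);
        }
        return size;
    }

    /** Generates the block contents on the fly; assumes the block fits in one array. */
    synchronized byte[] read(long blockId) throws IOException {
        maybeFail("read");
        byte[] contents = new byte[(int) length(blockId)];
        Arrays.fill(contents, FILL_BYTE);
        return contents;
    }
}
```

With ioErrorProbability set to 0.0 the store never injects a fault, and with 1.0 every read and write fails, so one parameter covers both the benchmarking case and the fault-injection case.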


[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543218 ] 

Hudson commented on HADOOP-1989:
--------------------------------

Integrated in Hadoop-Nightly #305 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/305/])


[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541248 ] 

Hadoop QA commented on HADOOP-1989:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12369208/SimulatedStoragePatchSubmit7.txt
against trunk revision r592860.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests -1.  The patch failed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1079/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1079/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1079/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1079/console

This message is automatically generated.


[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit7.txt

Updated the patch to fix the few things Konstantine mentioned.


[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538604 ] 

Doug Cutting commented on HADOOP-1989:
--------------------------------------

Shouldn't the simulated implementations reside in the test tree?  They're not needed in the production jar, are they?


[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542937 ] 

Sanjay Radia commented on HADOOP-1989:
--------------------------------------


Did the failed CRC test just work when you ran it again, or did you fix something?

sanjay

[jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532226 ] 

Stu Hood commented on HADOOP-1989:
----------------------------------

I can't believe this hasn't already been implemented! Very good idea, sir.

Perhaps each Simulated Data Node should be a pool of dummy nodes that grabs a few ports and uses a thread pool to contact and reply to the Namenode.
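
A rough sketch of that pooling idea, under stated assumptions: the HeartbeatSink interface and the SimulatedNodePool class are hypothetical names standing in for the real Namenode protocol, and port handling is left out entirely. It only shows many dummy nodes sharing a small, fixed number of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the Namenode side of a heartbeat call. */
interface HeartbeatSink {
    void heartbeat(String nodeId);
}

/** Many dummy data nodes served by one shared, fixed-size thread pool. */
class SimulatedNodePool {
    private final List<String> nodeIds = new ArrayList<>();
    private final ExecutorService workers;
    private final HeartbeatSink sink;

    SimulatedNodePool(int nodeCount, int threadCount, HeartbeatSink sink) {
        for (int i = 0; i < nodeCount; i++) {
            nodeIds.add("sim-node-" + i);
        }
        this.workers = Executors.newFixedThreadPool(threadCount);
        this.sink = sink;
    }

    /** Sends one heartbeat per dummy node and waits until all have completed. */
    void heartbeatRound() throws Exception {
        List<Future<?>> pending = new ArrayList<>();
        for (String id : nodeIds) {
            pending.add(workers.submit(() -> sink.heartbeat(id)));
        }
        for (Future<?> f : pending) {
            f.get(); // rethrows any failure from the worker thread
        }
    }

    /** Stops accepting work; already submitted heartbeats still run. */
    void shutdown() {
        workers.shutdown();
    }
}
```

The number of simulated nodes and the number of threads are independent here, so one process could stand in for a large cluster as far as heartbeat load is concerned.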


[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1989:
-------------------------------------

    Status: Open  (was: Patch Available)

This patch does not merge cleanly with trunk. Can you please upload the patch again?

I saw a few empty-line changes in FSDataset.java. Also, there is a change to log4j.properties that might not be related to this patch.
