Posted to common-dev@hadoop.apache.org by "Konstantin Boudnik (JIRA)" <ji...@apache.org> on 2009/06/04 23:41:07 UTC

[jira] Created: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Add orthogonal fault injection mechanism/framework
--------------------------------------------------

                 Key: HADOOP-5974
                 URL: https://issues.apache.org/jira/browse/HADOOP-5974
             Project: Hadoop Core
          Issue Type: Test
          Components: test
            Reporter: Konstantin Boudnik
            Assignee: Konstantin Boudnik


It'd be great to have a fault injection mechanism for Hadoop.

Having such a solution in place would increase test coverage of the error-handling and recovery mechanisms, reduce reproduction time, and increase the reproduction rate of problems.

Ideally, the system has to be orthogonal to the current code and test base: faults have to be injected at build time and have to be configurable, e.g. all faults could be turned off, or only some of them allowed to happen. Also, fault injection has to be kept separate from the production build.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717385#action_12717385 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5974:
------------------------------------------------

>% ant run-test-hdfs -DallFaultProbability=0 -DBlockReceiverFaultProbability=10

It may be better for the naming convention to use something like fault.probability.*, e.g. fault.probability.datanode.BlockReceiver, etc.
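
The suggested scheme implies a specific key overriding a global default. A minimal sketch of that lookup (names are illustrative, not the actual HADOOP-5974 implementation):

```java
import java.util.Properties;

// Hypothetical sketch of the fault.probability.* naming scheme: a
// class-specific key such as fault.probability.datanode.BlockReceiver
// overrides the global fault.probability.* default.
class FaultProbability {
    static final String PREFIX = "fault.probability.";

    // Returns the fault probability (0-100) for the given component,
    // falling back to the global default; 0 means faults are off.
    static int forComponent(Properties props, String component) {
        String specific = props.getProperty(PREFIX + component);
        if (specific != null) {
            return Integer.parseInt(specific);
        }
        return Integer.parseInt(props.getProperty(PREFIX + "*", "0"));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(PREFIX + "*", "0");
        p.setProperty(PREFIX + "datanode.BlockReceiver", "10");
        // The specific key wins; unlisted components use the global default.
        System.out.println(forComponent(p, "datanode.BlockReceiver"));
        System.out.println(forComponent(p, "datanode.FSDataset"));
    }
}
```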



[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717422#action_12717422 ] 

Konstantin Boudnik commented on HADOOP-5974:
--------------------------------------------

It seems that none of the current Maven repositories has AspectJ 1.6.4. The latest version available is 1.5.4, which won't work because Hadoop is a Java 6 project.

Any idea how to add the latest version of a library to a Maven repository?



[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717776#action_12717776 ] 

Konstantin Boudnik commented on HADOOP-5974:
--------------------------------------------

Great! Thanks for the pointer; I only saw 1.5.4 in there and somehow missed the latest version. It worked, so I will publish the patch shortly.



[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Giridharan Kesavan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717569#action_12717569 ] 

Giridharan Kesavan commented on HADOOP-5974:
--------------------------------------------

We can file a JIRA with Codehaus with the location of the AspectJ jar file and its POM, so they can help us upload the latest version of AspectJ to the Maven repository.

BTW, I see different AspectJ jar files in here; some of them are at version 1.5.4 and some are at version 1.6.4:
http://www.mvnrepository.com/search.html?query=aspectj

Could you please mention the name of the AspectJ jar that you are looking for?



[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716737#action_12716737 ] 

Konstantin Boudnik commented on HADOOP-5974:
--------------------------------------------

Here's an overall proposal of the framework layout:
- AspectJ 1.6 should be used as the base framework
- an additional set of classes needs to be developed to control and configure the injection of faults at runtime. In the first version of the framework, I'd recommend going with randomly injected faults (random in terms of when they happen, not their location in the application code)
- the randomization level might be configured through system properties on the command line or set in a separate configuration file
- to completely turn off fault injection for a class, the probability level has to be set to 0% ('zero'); setting it to 100% achieves the opposite effect
- build.xml has to be extended with a new target ('injectfaults') to weave the needed aspects in place after the normal compilation of the Java classes is done; the JUnit targets will have to be modified to pass the new probability configuration parameters into the spawned JVM
- the aspects' source code will be placed under test/src/aop; the package structure will mimic the original Hadoop one. For example, an aspect for FSDataset has to belong to org.apache.hadoop.hdfs.server.datanode

Some examples of the new build/test execution interface:

To weave (build in) aspects in place:
- % ant injectfaults

To execute the HDFS tests (turn everything off but the BlockReceiver faults, which are set at the 10% level):
- % ant run-test-hdfs -DallFaultProbability=0 -DBlockReceiverFaultProbability=10
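
The runtime side of the proposal — the classes that decide whether a woven fault fires — could look roughly like the sketch below. This is not the actual patch; the property names mirror the command line above and are assumptions for illustration.

```java
import java.util.Random;

// Minimal sketch of a probability-gated fault trigger that an injected
// aspect could call. Probabilities come from system properties of the
// kind passed on the ant command line, e.g.
// -DallFaultProbability=0 -DBlockReceiverFaultProbability=10.
class FaultInjector {
    private static final Random RANDOM = new Random();

    // Reads a 0-100 probability for the component, falling back to the
    // global allFaultProbability (default 0, i.e. faults off).
    static int probability(String component) {
        String v = System.getProperty(component + "FaultProbability",
                System.getProperty("allFaultProbability", "0"));
        return Integer.parseInt(v);
    }

    // True if a fault should fire this time; 0 never fires, 100 always does.
    static boolean shouldInject(String component) {
        return RANDOM.nextInt(100) < probability(component);
    }

    public static void main(String[] args) {
        System.setProperty("allFaultProbability", "0");
        System.setProperty("BlockReceiverFaultProbability", "100");
        // 100% always fires; every other component stays at the 0% default.
        System.out.println(shouldInject("BlockReceiver"));
        System.out.println(shouldInject("FSDataset"));
    }
}
```

An aspect woven by the 'injectfaults' target would then simply guard its fault-raising advice with shouldInject("BlockReceiver").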




[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716420#action_12716420 ] 

Konstantin Boudnik commented on HADOOP-5974:
--------------------------------------------

I would like to propose the following initial requirements for a Fault Injection (FI) solution for Hadoop:

# Has to be orthogonal to the existing source code and test base: no direct code or test modifications needed, preferably based on a cross-cutting model
# Fully detachable: faults can be inserted into or removed from the system without hassle; a separate build target has to be set up to introduce faults with a single command, and removal should be equally easy
# High level of fault abstraction: the faults' logic has to be implemented in a high-level language, e.g. Java
# Needs to reuse existing unit/functional tests where possible
# Fine-grained configuration at runtime: fully deterministic or random injection of the faults should be configurable at runtime through a configuration file or a set of system properties, with no source code modifications or recompilation required
# If an off-the-shelf solution is used, it should preferably come under an Apache-compatible open-source license




[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717373#action_12717373 ] 

Konstantin Boudnik commented on HADOOP-5974:
--------------------------------------------

My patch is pretty much ready and requires a couple of libraries to be added to the Hadoop project. These libraries aren't associated with any Apache project: they are under the Eclipse Public License and are distributed from their website.

I'm not sure what the 'rule of thumb' is for adding libraries to the Ivy configuration for Hadoop. Or shall they be added statically, e.g. into the SVN repository? I assume the latter is generally a bad idea, which leaves us with the former option.

Can any of the watchers comment on this, please?
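
If the Ivy route is taken, the dependencies could be declared in ivy.xml roughly as below. This is only a sketch: the org/module names match how AspectJ is published to the Maven repository, but the rev and the conf mapping are assumptions to be adapted to Hadoop's ivy setup.

```xml
<!-- AspectJ runtime, needed at test run time by woven classes -->
<dependency org="org.aspectj" name="aspectjrt" rev="1.6.4" conf="common->default"/>
<!-- AspectJ compiler/weaver, used by the build-time 'injectfaults' target -->
<dependency org="org.aspectj" name="aspectjtools" rev="1.6.4" conf="common->default"/>
```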







[jira] Commented: (HADOOP-5974) Add orthogonal fault injection mechanism/framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717391#action_12717391 ] 

Konstantin Boudnik commented on HADOOP-5974:
--------------------------------------------

Thanks for the suggestion, Nicholas. I like your way (prefixing with fault.probability) better and I'm putting it into the patch right away.

As for the suffix of the name, it'll be completely up to the aspect developers to name it. However, I agree that datanode.BlockReceiver would be more mnemonically appealing.


