
Posted to common-issues@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2009/10/25 07:29:59 UTC

[jira] Created: (HADOOP-6332) Large-scale Automated Test Framework

Large-scale Automated Test Framework
------------------------------------

                 Key: HADOOP-6332
                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
             Project: Hadoop Common
          Issue Type: New Feature
          Components: test
            Reporter: Arun C Murthy
             Fix For: 0.21.0


Hadoop would benefit from having a large-scale, automated test framework.

This jira is meant to be a master-jira to track relevant details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837968#action_12837968 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

@Stephen: the main reason to use code injection is to keep the testing handles completely hidden, so that no outsider can misuse them. Many of the contracts (interfaces, APIs) we are interested in for testing either unveil internal states of key Hadoop components or allow 'undesirable' actions such as killing a job, a tasktracker, or a datanode, so it would be unwise to keep them in the A-grade production code. Therefore, code injection seems to be the right technique for this.

The next version of the patch is coming any minute now. It will make clear that all interfaces exposed to tests are defined statically. Their implementation is injected, though, which shouldn't concern anyone but framework developers.

Now, the particular implementation of injection doesn't really matter. We could have gone with ASM or BCEL for the purpose. It happens that AspectJ is readily available and provides high-level language capabilities, Eclipse integration, etc. That explains the choice of framework.

As for an extra burden on future contributors: instrumentation is used for internal framework mechanics and shouldn't be exposed to test developers. Thus, if one simply wants to develop a cluster test, she/he can do it from a vanilla Eclipse without AJDT installed. Or from IDEA (which I personally prefer and use all the time, except when I need to develop/fix some aspects). Or from vim (not that I suggest doing so :-)
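
The split described in this comment (a statically defined test interface whose implementation is supplied separately) might look roughly like the sketch below. It is an illustration only: it uses a plain `java.lang.reflect.Proxy` in place of AspectJ weaving so that it runs without extra tooling, and `DaemonTestControl`, `Daemon`, and `instrument` are hypothetical names that do not come from the patch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class InjectionSketch {

    /** Hypothetical test-only contract, defined statically so tests can compile against it. */
    public interface DaemonTestControl {
        String internalState();
        void kill();
    }

    /** Stand-in for a production daemon; it does not implement the test contract. */
    public static class Daemon {
        private volatile String state = "RUNNING";

        // Non-public hooks that only the injected code is meant to touch.
        String state() { return state; }
        void shutdown() { state = "STOPPED"; }
    }

    /**
     * Supplies the implementation of the test contract at run time.
     * A dynamic proxy is used here purely as a substitute for aspect weaving.
     */
    public static DaemonTestControl instrument(final Daemon daemon) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "internalState":
                    return daemon.state();
                case "kill":
                    daemon.shutdown();
                    return null;
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (DaemonTestControl) Proxy.newProxyInstance(
            DaemonTestControl.class.getClassLoader(),
            new Class<?>[] { DaemonTestControl.class },
            handler);
    }

    public static void main(String[] args) {
        Daemon daemon = new Daemon();
        DaemonTestControl control = instrument(daemon);
        System.out.println(control.internalState());
        control.kill();
        System.out.println(control.internalState());
    }
}
```

A test written against `DaemonTestControl` never sees how the implementation is attached, which is the property the comment argues for; with AspectJ the attachment would happen at weave time rather than through a proxy.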


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>
> Hadoop would benefit from having a large-scale, automated test framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters, e.g. utilities to deploy, start & stop clusters, bring down tasktrackers, datanodes, entire racks of both, etc.
> # Enhanced control-ability and inspect-ability of the various components in the system, e.g. daemons such as the namenode and jobtracker should expose their data-structures for query/manipulation. Tests would be much more relevant if we could, for example, query for specific states of the jobtracker, scheduler, etc. Clearly these apis should _not_ be part of production clusters - hence the proposal is to use aspectj to weave these new apis into debug-deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866041#action_12866041 ] 

Hadoop QA commented on HADOOP-6332:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444175/HADOOP-6332.0.22.patch
  against trunk revision 941662.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 48 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/66/console

This message is automatically generated.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866055#action_12866055 ] 

Hadoop QA commented on HADOOP-6332:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444175/HADOOP-6332.0.22.patch
  against trunk revision 941662.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 48 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/514/console

This message is automatically generated.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

Missing tests list file is added.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Open  (was: Patch Available)

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778994#action_12778994 ] 

Steve Loughran commented on HADOOP-6332:
----------------------------------------

I could do Mon, Tue or Wed next week, that is 23, 24 or 25 November, at or after 20:00 GMT, which is what, midday Pacific? We could start by getting everyone interested in the problem to talk about what their use cases/needs are, and then discuss how to go about meeting them.

I'll be connecting from home in the UK; assuming the majority of participants are in the Bay Area, it's probably best if someone in the Bay Area hosts the Skype meeting for lower latency and higher reliability. Any volunteers?



> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869371#action_12869371 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

I'm not saying it is _impossible_ to do as a separate project. Packaging isn't an issue. In fact, the current approach will publish instrumented artifacts separately too.

Now, to weave aspects one doesn't need the source code available at build time: compiled aspects should be sufficient. However, keeping the framework out of Hadoop's source tree poses a two-fold problem:
  - all visible changes in the build system will be the same, and a lot of stuff from {{src/test/aop/build/aop.xml}} will have to be brought into the Common, HDFS, and MR builds anyway.
  - we'll need a source code dependency on Hadoop's subprojects at framework development time to make sure the aspects bind correctly, etc.

These are disadvantages. And I really don't see any advantage in the separation besides reducing the number of source files under {{src/test/system}}.

Also, please keep in mind that this test framework is Hadoop specific so it seems logical to keep them together.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Alex Loddengaard (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772280#action_12772280 ] 

Alex Loddengaard commented on HADOOP-6332:
------------------------------------------

A potential use case for this tool would be to let Hadoop users put their jobs in a "test" and run them nightly on a (pseudo-distributed) cluster.  I believe that a framework that can either use a running cluster or set up and tear down a new one will be handy for this use case.  I also don't think that the setup/teardown stuff should dirty the code too much.

Similarly, Nigel and I spoke a while back about having some sort of web dashboard where users post version compatibility notes.  Imagine a list of Hadoop users with a check box next to each user that says "My 0.20.1 jobs worked in 0.20.2."  I think this tool can play a role in implementing this idea, and both setup/teardown and connecting to an existing cluster make sense for it.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Patch Available  (was: Open)

Verification for the patch with {{mvn:install}} support

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806045#action_12806045 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

I've written some of the necessary implementation classes to get a rough draft of this framework running. At present, it appears that what we have is the ability to define and run tests on a specific cluster, with some basic stop/start and fault injection features for cluster management. However, after passing all the correct values to the ShellProcessManager constructor (the class that identifies the cluster you want to run your unit test on) and calling start() on my concrete implementation of AbstractMasterSlaveCluster, I get the exception below. Is anyone else seeing this? I get it on both OS X and Linux.

Note: The directory exists and start-all works just fine.

Exception in thread "main" java.io.IOException: Cannot run program "start-all.sh" (in directory "/home/hadoop/hadoop-0.20.1/bin"): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
	at org.apache.hadoop.test.system.process.ShellProcessManager.execute(ShellProcessManager.java:71)
	at org.apache.hadoop.test.system.process.ShellProcessManager.start(ShellProcessManager.java:62)
	at org.apache.hadoop.test.system.AbstractMasterSlaveCluster.start(AbstractMasterSlaveCluster.java:64)
	at org.apache.hadoop.test.CheckClusterTest.main(CheckClusterTest.java:24)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
	at java.lang.ProcessImpl.start(ProcessImpl.java:91)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
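
One possible cause, offered as an assumption rather than a confirmed diagnosis: on Unix JVMs a bare program name such as "start-all.sh" is typically looked up on PATH, not in the working directory given to ProcessBuilder, so the launch can fail with error=2 even though the script exists in that directory. A minimal sketch of a workaround that resolves the script against its directory before launching it follows. The names ScriptLauncher, resolve and run are hypothetical and are not part of the attached patches.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the HADOOP-6332 patches.
public class ScriptLauncher {

  /** Resolve a script name against the directory expected to contain it. */
  public static String resolve(File binDir, String script) {
    File candidate = new File(script);
    // Names that are already absolute are left untouched.
    return candidate.isAbsolute()
        ? candidate.getPath()
        : new File(binDir, script).getAbsolutePath();
  }

  /** Run the script with binDir as working directory and return its exit code. */
  public static int run(File binDir, String script, String... args)
      throws IOException, InterruptedException {
    List<String> command = new ArrayList<String>();
    command.add(resolve(binDir, script));
    for (String arg : args) {
      command.add(arg);
    }
    ProcessBuilder builder = new ProcessBuilder(command);
    builder.directory(binDir);
    // Forward the child's output to this process so it cannot block on a full pipe.
    builder.inheritIO();
    return builder.start().waitFor();
  }

  public static void main(String[] args) {
    File binDir = new File(args.length > 0 ? args[0] : ".");
    String script = args.length > 1 ? args[1] : "start-all.sh";
    System.out.println("Would launch: " + resolve(binDir, script));
  }
}
```

Passing the absolute path keeps the launch independent of whatever PATH the test JVM happens to inherit.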

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>
> Hadoop would benefit from having a large-scale, automated, test-framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters. E.g. utilities to deploy, start & stop clusters, and to bring down tasktrackers, datanodes, entire racks of both etc.
> # Enhanced control-ability and inspect-ability of the various components in the system, e.g. daemons such as the namenode and jobtracker should expose their data structures for query/manipulation etc. Tests would be much more relevant if we could, e.g., query for specific states of the jobtracker, scheduler etc. Clearly these apis should _not_ be part of production clusters, hence the proposal is to use aspectj to weave these new apis into debug deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772272#action_12772272 ] 

Arun C Murthy commented on HADOOP-6332:
---------------------------------------

Steve - Lots of tests may well work with an already running cluster, but having utilities to set up/tear down clusters (in a pluggable manner) is well within the scope of this jira, I think. We need to be able to poke the corners of these areas in Hadoop in an automated manner too...
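
As an illustration only, "pluggable" set-up and tear-down could look roughly like the sketch below: the test body is written once, and the lifecycle implementation is chosen by class name. Every name here (PluggableClusterDemo, ClusterLifecycle, ExistingCluster, RecordingCluster, the test.cluster.lifecycle property) is hypothetical and does not come from the patches on this jira.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in the HADOOP-6332 patches.
public class PluggableClusterDemo {

  /** Plug-in point: how a cluster is brought up and torn down. */
  public interface ClusterLifecycle {
    void setUp() throws Exception;
    void tearDown() throws Exception;
  }

  /** For tests that target a cluster which is already running. */
  public static class ExistingCluster implements ClusterLifecycle {
    public void setUp() { /* nothing to start */ }
    public void tearDown() { /* leave the cluster running */ }
  }

  /** Records calls; stands in for an implementation that would drive scripts. */
  public static class RecordingCluster implements ClusterLifecycle {
    public final List<String> calls = new ArrayList<String>();
    public void setUp() { calls.add("setUp"); }
    public void tearDown() { calls.add("tearDown"); }
  }

  /** Instantiate the implementation named by className via its no-arg constructor. */
  public static ClusterLifecycle load(String className) throws Exception {
    return (ClusterLifecycle) Class.forName(className)
        .getDeclaredConstructor().newInstance();
  }

  /** Run a test body between setUp and tearDown, tearing down even on failure. */
  public static void runWith(ClusterLifecycle cluster, Runnable testBody)
      throws Exception {
    cluster.setUp();
    try {
      testBody.run();
    } finally {
      cluster.tearDown();
    }
  }

  public static void main(String[] args) throws Exception {
    // The property name is an assumption made for this example.
    String impl = System.getProperty("test.cluster.lifecycle",
        ExistingCluster.class.getName());
    runWith(load(impl), new Runnable() {
      public void run() { System.out.println("test body runs here"); }
    });
  }
}
```

With this shape, a test that only needs a running cluster uses the no-op implementation, while one that exercises deployment plugs in a heavier one.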

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-6332:
-----------------------------------

    Attachment: 6332.patch

Work in progress patch.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867153#action_12867153 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

Hi Cos/Sharad

I noticed this JIRA also got moved from targeting 0.21 to 0.22. Can you elaborate on that decision? I presume that is why the patches are targeting trunk.

What are y20 security and Herriot?

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787351#action_12787351 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Looks like I can't stop. Tests shouldn't be written for JUnit v.3; they should target JUnit v.4 instead: [annotations and all that|http://wiki.apache.org/hadoop/HowToDevelopUnitTests].

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771467#action_12771467 ] 

Steve Loughran commented on HADOOP-6332:
----------------------------------------

There are a number of use cases that a big test framework can handle, and while they shouldn't be interdependent, it would be nice to have tests that work with all of them:

# Bringing up Hadoop clusters by asking IaaS systems for the machines, instantiating the cluster, then testing it to see that it works. This is what I do. I normally just run Paolo Castagna's citerank code against the cluster; it's a small-dataset MR sequence that can take a couple of hours to run through.
# Testing that the latest build works on a pre-allocated physical/virtual cluster. You don't need to ask for the machines, but you may need to push out the JARs/RPMs.
# Testing that a physical cluster works at the speeds to be expected from the number of disks and cores.
# Testing that MR algorithms work, and work at scale.
# Testing all the corner bits of Hadoop: the code, the web pages, etc.
# Testing how the code (and/or the ops team) handles simulated failures.
# Exploring the configuration space of the cluster, that is, the combination of options in the -site.xml files and the servers/network on which Hadoop runs. This is surprisingly hard to do thoroughly, and it isn't done at scale right now. For example, I don't think anyone tests what happens on a big cluster when you set the replication factor to 10 for a big job, or crank it back to 1.

It would be good to have a way to test all of this, or at least to have the foundation for doing so.
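
To make the configuration-space use case concrete, here is a minimal sketch that enumerates every combination of a few option values, so that a test could be run once per combination. The class ConfigSpace and its method are hypothetical, and the option values are purely illustrative, not recommendations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of any patch on this jira.
public class ConfigSpace {

  /** Cartesian product of option values; each result maps every key to one value. */
  public static List<Map<String, String>> combinations(
      Map<String, List<String>> options) {
    List<Map<String, String>> result = new ArrayList<Map<String, String>>();
    // Start from a single empty assignment and extend it one option at a time.
    result.add(new LinkedHashMap<String, String>());
    for (Map.Entry<String, List<String>> option : options.entrySet()) {
      List<Map<String, String>> next = new ArrayList<Map<String, String>>();
      for (Map<String, String> partial : result) {
        for (String value : option.getValue()) {
          Map<String, String> extended =
              new LinkedHashMap<String, String>(partial);
          extended.put(option.getKey(), value);
          next.add(extended);
        }
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> options =
        new LinkedHashMap<String, List<String>>();
    // Illustrative values only.
    options.put("dfs.replication", Arrays.asList("1", "3", "10"));
    options.put("mapred.reduce.tasks", Arrays.asList("1", "8"));
    for (Map<String, String> combination : combinations(options)) {
      System.out.println(combination);
    }
  }
}
```

The number of combinations grows multiplicatively with each option, which is one reason this is hard to do thoroughly at scale.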

Now, have I left any use cases out?

Like I said, I'd love a Skype-based phone conference on the topic, so that the people who have done work in this area can talk about what they've done.


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>             Fix For: 0.21.0

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769758#action_12769758 ] 

Arun C Murthy commented on HADOOP-6332:
---------------------------------------

Some utility apis to provide a flavour for what we are trying to accomplish:

{noformat}
  /**
   * Sources of logs and outputs.
   */
  public enum LogSource {
    NAMENODE,
    DATANODE,
    JOBTRACKER,
    TASKTRACKER,
    TASK
  }

  /**
   * Setup a Hadoop Cluster.
   * @param conf {@link Configuration} for the cluster
   * @throws IOException
   */
  public static void setupCluster(Configuration conf) throws IOException;
  
  /**
   * Tear down the Hadoop Cluster
   * @param conf {@link Configuration} for the cluster
   * @throws IOException
   */
  public static void tearDownCluster(Configuration conf) throws IOException;

  /**
   * Kill all Hadoop Daemons running on the given rack.
   * @param cluster Map-Reduce {@link Cluster}
   * @param rackId rack on which all map-reduce daemons should be killed
   * @throws IOException
   * @throws InterruptedException
   */
  public static void killRack(Cluster cluster, String rackId) 
  throws IOException, InterruptedException;

  /**
   * Fetch logs from the hadoop daemon from <code>startTime</code> to 
   * <code>endTime</code> and place them in <code>dst</code>.
   * @param cluster Map-Reduce {@link Cluster}
   * @param daemon hadoop daemon from which to fetch logs
   * @param startTime start time
   * @param endTime end time
   * @param dst destination for storing fetched logs
   * @throws IOException
   */
  public static void fetchDaemonLogs(Cluster cluster, Testable daemon, 
                                     long startTime, long endTime, 
                                     Path dst) 
  throws IOException;

  /**
   * Fetch daemon logs and check if they have the <code>pattern</code>.
   * @param cluster map-reduce <code>Cluster</code>
   * @param source log source
   * @param startTime start time
   * @param endTime end time
   * @param pattern pattern to check
   * @param fetch if <code>true</code> fetch the logs into <code>dir</code>,
   *              else do not fetch
   * @param dir directory to place the fetched logs
   * @return <code>true</code> if the logs contain <code>pattern</code>,
   *         <code>false</code> otherwise
   * @throws IOException
   */
  public static boolean checkDaemonLogs(Cluster cluster, 
                                        LogSource source,
                                        long startTime, long endTime,
                                        String pattern,
                                        boolean fetch, Path dir)
  throws IOException;

{noformat}

----

It's very likely that each of these utility methods will turn around and call shell scripts etc. to actually accomplish the desired functionality. The point is that the person implementing a specific test case need not worry about those details and can continue to work in the junit environment familiar to hadoop devs.
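
As a rough sketch of what the pattern-checking half of checkDaemonLogs might reduce to once logs have been fetched into a local directory, the code below scans every file in that directory for a regular expression. It is an assumption-laden simplification: it ignores the startTime/endTime window, it uses java.nio.file.Path rather than Hadoop's Path, and the class name LocalLogCheck, the method logsContain and the sample log lines are all made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementation proposed on this jira.
public class LocalLogCheck {

  /** Return true if any line of any regular file directly under dir matches pattern. */
  public static boolean logsContain(Path dir, String pattern) throws IOException {
    Pattern compiled = Pattern.compile(pattern);
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
      for (Path file : files) {
        if (!Files.isRegularFile(file)) {
          continue; // skip sub-directories
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
          if (compiled.matcher(line).find()) {
            return true;
          }
        }
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    // Made-up log content, only to exercise the method.
    Path dir = Files.createTempDirectory("fetched-logs");
    Files.write(dir.resolve("jobtracker.log"),
        Arrays.asList("INFO starting", "WARN Lost tracker 'tracker_host1'"),
        StandardCharsets.UTF_8);
    System.out.println(logsContain(dir, "Lost tracker"));
  }
}
```

A real implementation would also need to filter by timestamp and to fetch the logs from the remote daemons first, which is where the shell scripts mentioned above would come in.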


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>             Fix For: 0.21.0

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: 6332-phase2.fix2.patch

Using {{$(something)}} screws up our XML processing :( Has to be fixed. 
This patch is on top of 6332-phase2.fix2.patch. It is not for commit here; it will be committed as part of the forward-port patch later.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868943#action_12868943 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

I have spoken with Sharad off-line, and his suggestion is to change the Maven id of the framework artifacts to {{hadoop-core-system-test}}. I'll open a sub-task for it and do the patch.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867221#action_12867221 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

I made the move from 0.21 to 0.22 to emphasize that the work is being done on trunk. All patches should be applicable to 0.21 as well. Sorry for the confusion. I have just checked: the patch is applicable to 0.21 and will be committed to both 0.21 and trunk. I'll fix the JIRA's target.

y20 is the Yahoo! internal release of Hadoop 0.20, where the initial work on this framework was performed. The original framework patches were published against that source code, hence the forward-port work.

Herriot is the 'working' name of the framework. It is named after James Herriot, a veterinarian :) 

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772600#action_12772600 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

I support this proposal. At IBM, we're active users of Hadoop; however, we run into issues where we need to be able to test Hadoop on other versions of Java required for non-standard architectures. For instance, we'd like to investigate putting Hadoop through its paces on AS/400, z/OS or OS/390. To do that we have to use non-Sun Java distributions (such as IBM Java), as Sun does not provide a JVM for those architectures. This proposal would provide a means to standardize and streamline how we provide real-world testing for these architectures.

At present, I'm using the Terabyte Gen/Sort/Validate jobs as they produce their own data, which greatly simplifies the test scripts, and they are easy to scale up and down.

Lastly, from what I can gather, the framework is likely to be able to incorporate existing cluster environments. Thus, if one is executing a M/R test, it would run over whatever dfs the cluster is using, be it HDFS, Kosmos or S3. However, I only see an S3 sub-JIRA for this. Is the intent to support only HDFS?

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>
> Hadoop would benefit from having a large-scale, automated, test-framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters, e.g. utilities to deploy, start & stop clusters, and bring down tasktrackers, datanodes, or entire racks of both.
> # Enhanced controllability and inspectability of the various components in the system, e.g. daemons such as the namenode and jobtracker should expose their data structures for query and manipulation. Tests would be much more relevant if we could, for example, query for specific states of the jobtracker, scheduler etc. Clearly these apis should _not_ be part of production clusters; hence the proposal is to use aspectj to weave these new apis into debug deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children
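
The second piece of the quoted proposal, keeping test-only APIs out of production code, can be illustrated with a minimal sketch. Every name below (DaemonTestHandle, Daemon, InstrumentedDaemon) is hypothetical and not a Hadoop class, and an ordinary subclass stands in for the AspectJ weaving the proposal calls for. The point is only the shape: the production class has no test handles, and the test contract is supplied by code that exists solely in a debug build.

```java
// All names here are hypothetical illustrations, not Hadoop classes.

// Test-facing contract, defined statically as an ordinary interface.
interface DaemonTestHandle {
    String queryState();
    void kill();
}

// Production class: it does not implement the test contract.
class Daemon {
    private String state = "RUNNING";

    protected String state() { return state; }

    protected void setState(String newState) { state = newState; }
}

// Exists only in a debug build. A plain subclass is used here as a stand-in;
// the proposal would instead weave the implementation in with AspectJ.
final class InstrumentedDaemon extends Daemon implements DaemonTestHandle {
    @Override
    public String queryState() { return state(); }

    @Override
    public void kill() { setState("KILLED"); }
}
```

A system test would talk to the daemon only through DaemonTestHandle, so nothing in the production class exposes internal state or a kill switch.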


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771303#action_12771303 ] 

Hemanth Yamijala commented on HADOOP-6332:
------------------------------------------

I agree with Konstantin. I think we can develop useful automated tests that run on a large cluster with what we already have, thereby reducing the start-up time. That would be a huge step forward from where we are in Hadoop currently.


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771274#action_12771274 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

bq. I'm not sure JUnit is ideal...
We need something to provide basic test harness functionality, such as starting and stopping a test or suite. JUnit has its ups and downs. The main benefit is that we already use it all over the place. On the other hand, there aren't many alternatives. TestNG might be a candidate, but it has a HUGE disadvantage: it doesn't support per-test VM forking.

Also, I'd prefer to keep the number of tools to a minimum, if possible.


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Tom White (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794134#action_12794134 ] 

Tom White commented on HADOOP-6332:
-----------------------------------

I would prefer to see a role-based approach in ClusterProcessManager (and other classes) since having explicit master/slave roles makes it difficult to support clusters with a separate namenode and jobtracker, or ZooKeeper (where all nodes are peers).
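
A minimal sketch of what a role-based model could look like, as opposed to a fixed master/slave split. The names Role and RoleBasedCluster are hypothetical and are not taken from the attached patches. The sketch only shows that a separate namenode and jobtracker, or a set of ZooKeeper peers, fit naturally once hosts are grouped by role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical set of roles; a real framework would define its own.
enum Role { NAMENODE, JOBTRACKER, DATANODE, TASKTRACKER, ZOOKEEPER_PEER }

// Hypothetical stand-in for a role-aware process manager.
final class RoleBasedCluster {
    private final Map<Role, List<String>> hostsByRole = new EnumMap<>(Role.class);

    // Register a host under a role; the same host may hold several roles.
    void assign(Role role, String host) {
        hostsByRole.computeIfAbsent(role, r -> new ArrayList<>()).add(host);
    }

    // Hosts registered for a role, in assignment order (empty if none).
    List<String> hostsFor(Role role) {
        return Collections.unmodifiableList(
            hostsByRole.getOrDefault(role, Collections.emptyList()));
    }
}
```

With this shape, a test asks for "all hosts with role X" rather than "the master" or "the slaves", so peer-only systems need no special casing.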


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788297#action_12788297 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

Great discussion. Kos, et al., do you think we are at a point where we can consider starting to write some code? Another thing we need to do is identify which of the functional tests we are going to port. 12/11 is my last day but I will be back on 1/4. I'm not sure how much runway we have before 0.21 is due, but I'd like to see if we can at least have the framework plus a couple of tests available in time for the release.


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

           Status: Resolved  (was: Patch Available)
     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.21.0
                       (was: 0.22.0)
       Resolution: Fixed

All subtasks are completed, so I'm resolving this as fixed. The HDFS- and MR-specific parts of the framework are tracked by HDFS-1134 and MAPREDUCE-1774 respectively.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780170#action_12780170 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

Steve Loughran has created this wiki page for our call: http://wiki.apache.org/hadoop/TestingNov2009


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869402#action_12869402 ] 

Doug Cutting commented on HADOOP-6332:
--------------------------------------

> And I really don't see any advantage of the separation

If the long-term intention is still to split HDFS and Mapreduce into separate projects, then we should reduce their interdependencies, i.e. reduce what's in Common rather than add more things into Common.



[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868540#action_12868540 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

One relatively small issue I'd like input on is publishing the test framework's artifacts to the Maven repo.

Right now I am creating artifacts with the id {{hadoop-core-system}}. Does that sound like a good choice of name? Does anyone have comments or a better suggestion?


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Open  (was: Patch Available)

Need to rerun the verification


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

This patch also adds the ability to mvn-install Herriot artifacts locally with the id {{hadoop-core-system}}. They can now be pulled into the HDFS and MR subprojects with the internal resolver.

Clearly, the Maven deployment will have to be added at some point.


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865392#action_12865392 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

The audit warning is about the absence of the Apache License boilerplate in the tests list file. I don't think it is possible to have it there. Besides, similar files in HDFS and MR don't have it. Let's punt on this.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865388#action_12865388 ] 

Hadoop QA commented on HADOOP-6332:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444028/HADOOP-6332.0.22.patch
  against trunk revision 941662.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 52 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/512/console

This message is automatically generated.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

Herriot artifacts are now being produced as expected.
They still need to be pushed to Maven later on.

This patch is ready to be used as a base for the HDFS and MR forward-port patches of Herriot.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

Addressing comments. The {{jar-test-system}} target is removed from the build.

Additional investigation shows that in the current 0.20 implementation the Herriot build also ships only the existing functional tests. This clearly needs to be fixed for 0.20 and trunk. For Common's trunk, however, we don't need this target, because there are no system tests for the Common component alone.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Patch Available  (was: Open)

Run verification one more time.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: 6332-phase2.patch

This is the second portion of the main Herriot functionality, including some of the tests already linked to this JIRA.

This patch isn't meant to be committed to the Apache 0.20 branch; it is reference material for the upcoming forward port to trunk (0.22). During the forward port, the tests from this patch (about 7 of them) will be taken out and eventually replaced with the patches attached to the linked JIRAs.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

In this version of the patch all the old functionality of the build works as before.
Herriot artifacts aren't produced yet, but that seems to be a pretty minor fix.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787127#action_12787127 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Great, thanks for putting it together, Stephen! And you're correct about the deployment: it should be out of the scope of this JIRA.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837872#action_12837872 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

@Sharad

Thanks for the patch. Is there a reason why we're now incorporating Aspect Oriented Programming into the test framework?

While I can appreciate the features it offers, consider the effort involved in getting an AOP runtime set up in an IDE, which is required before folks can write and contribute test cases to the framework. I'm worried the additional effort and complexity will scare off would-be contributors.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-6332:
----------------------------------

    Description: 
Hadoop would benefit from having a large-scale, automated, test-framework. This jira is meant to be a master-jira to track relevant work.

----

The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.

There are several pieces we need to achieve this goal:

# A set of utilities we can use in junit-based tests to work with real, large-scale Hadoop clusters, e.g. utilities to deploy, start & stop clusters, and to bring down tasktrackers, datanodes, entire racks of both, etc.
# Enhanced control-ability and inspect-ability of the various components in the system, e.g. daemons such as the namenode and jobtracker should expose their data structures for query/manipulation. Tests would be much more relevant if we could, e.g., query for specific states of the jobtracker, scheduler, etc. Clearly these APIs should _not_ be part of production clusters, hence the proposal is to use AspectJ to weave these new APIs into debug deployments.

----

Related note: we should break up our tests into at least 3 categories:
# src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
# src/test/integration -> Current junit tests with Mini* clusters etc.
# src/test/system -> HADOOP-6332 and its children

  was:
Hadoop would benefit from having a large-scale, automated, test-framework.

This jira is meant to be a master-jira to track relevant details.


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>             Fix For: 0.21.0


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770016#action_12770016 ] 

Steve Loughran commented on HADOOP-6332:
----------------------------------------

Some initial thoughts.

* Yes, this is good. Maybe we should have a Skype conference or something on the topic, where everyone can show what they already have. I know Aaron's done some work.
* I'm not sure JUnit is ideal, because its test reports don't scale up to aggregated tests from different machines, different logs, and partial failures. But it is a great way to start tests from the IDE/build too.
* If we could move all tests against a functional cluster into JARs that run against a live cluster, they could be used for some of the system collaboration work, and let people test against different Hadoop deployments (physical, VMs with RPMs installed, etc.).
* I would split cluster setup/teardown from the tests themselves for that reason, and because the startup and teardown delays are why the normal tests take so long. Tests that rely on a working cluster are different from those that push the cluster through its lifecycle and explore the corner cases of the cluster/Hadoop configuration space.
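
The split suggested in the last point can be sketched in plain Java. This is only an illustration under stated assumptions: {{ClusterHandle}}, {{FakeClusterLifecycle}} and {{LiveClusterChecks}} are hypothetical names invented for this sketch, not Hadoop or Herriot classes, and the "cluster" is an in-memory fake.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handle to an already-running cluster; not a Hadoop API. */
interface ClusterHandle {
    int liveNodes();
}

/** Hypothetical lifecycle owner: the only place that pays startup/teardown cost. */
final class FakeClusterLifecycle implements AutoCloseable {
    private boolean running;
    private final int nodes;
    final List<String> events = new ArrayList<>();

    FakeClusterLifecycle(int nodes) {
        this.nodes = nodes;
    }

    /** Starts the fake cluster and returns a handle that tests can query. */
    ClusterHandle start() {
        running = true;
        events.add("start");
        return () -> running ? nodes : 0;
    }

    @Override
    public void close() {
        running = false;
        events.add("stop");
    }
}

/** Checks that only assume a working cluster; they never start or stop one. */
final class LiveClusterChecks {
    static boolean hasQuorum(ClusterHandle cluster, int minimum) {
        return cluster.liveNodes() >= minimum;
    }
}

public class SplitLifecycleDemo {
    public static void main(String[] args) {
        // Lifecycle is handled once, outside the checks themselves.
        try (FakeClusterLifecycle lifecycle = new FakeClusterLifecycle(3)) {
            ClusterHandle cluster = lifecycle.start();
            System.out.println(LiveClusterChecks.hasQuorum(cluster, 2));
            System.out.println(LiveClusterChecks.hasQuorum(cluster, 5));
        }
    }
}
```

Because the checks take a {{ClusterHandle}} rather than creating a cluster, the same checks could in principle be pointed at a handle backed by a physical or VM deployment, while lifecycle tests exercise {{start()}} and {{close()}} separately.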


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>             Fix For: 0.21.0


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869351#action_12869351 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

The main reason this has been done as a part of the _test infrastructure_ for Common (HDFS and MR are coming) is that the framework is non-invasive and leaves no footprint in Hadoop's production code. However, system tests need more functionality than the regular public API provides. To achieve this we had to use AOP. At the very least, compiled aspects have to be provided and then woven into Hadoop's classes. The framework part (aspects and all) might be kept separate from the main code tree. But at any rate this means changes in the build process. It will also add a lot of complexity to the framework's development and maintenance.
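
The idea discussed in this thread, a statically defined test contract whose implementation lives outside the production sources, can be sketched roughly as follows. This is not the Herriot code: {{Tracker}}, {{TrackerTestHandle}} and {{ReflectiveTrackerHandle}} are hypothetical names, and plain Java reflection is used here purely as a stand-in for AspectJ weaving so that the sketch runs on a stock JDK.

```java
import java.lang.reflect.Field;

/** Statically defined contract that tests compile against. (Hypothetical name.) */
interface TrackerTestHandle {
    boolean isAlive();
    void kill();
}

/** Production-side class: no test hooks anywhere in its source. (Hypothetical.) */
final class Tracker {
    private boolean alive = true;

    public String heartbeat() {
        return alive ? "OK" : "DOWN";
    }
}

/**
 * Test-side implementation of the contract. Reflection is used ONLY as a
 * stand-in for AspectJ weaving; the real framework weaves compiled aspects.
 */
final class ReflectiveTrackerHandle implements TrackerTestHandle {
    private final Tracker target;
    private final Field alive;

    ReflectiveTrackerHandle(Tracker target) {
        this.target = target;
        try {
            // Reach the private state without touching the production source.
            this.alive = Tracker.class.getDeclaredField("alive");
            this.alive.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean isAlive() {
        try {
            return alive.getBoolean(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void kill() {
        try {
            alive.setBoolean(target, false);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

public class InjectedHandleDemo {
    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        TrackerTestHandle handle = new ReflectiveTrackerHandle(tracker);
        System.out.println(tracker.heartbeat());
        handle.kill(); // an 'undesirable' action available only through the test handle
        System.out.println(tracker.heartbeat());
    }
}
```

The point of the sketch is the shape, not the mechanism: the production class carries no test API, tests see only the interface, and the implementation is supplied by test-only code.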



> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>
> Hadoop would benefit from having a large-scale, automated test framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters, e.g. utilities to deploy, start & stop clusters, bring down tasktrackers, datanodes, entire racks of both etc.
> # Enhanced control-ability and inspect-ability of the various components in the system, e.g. daemons such as namenode, jobtracker should expose their data-structures for query/manipulation etc. Tests would be much more relevant if we could e.g. query for specific states of the jobtracker, scheduler etc. Clearly these apis should _not_ be part of the production clusters - hence the proposal is to use aspectj to weave these new apis into debug-deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869707#action_12869707 ] 

Doug Cutting commented on HADOOP-6332:
--------------------------------------

>  MR and HDFS today share a lot of things from Common, like Configuration etc. Similarly, Herriot's common functionality is abstracted and put in Common.

Our long-term goal should probably be to diminish Common as a grab-bag of shared bits of code for MR and HDFS.  Rather, it would be better if the shared bits were separate projects or subprojects that are independently useful.  So, if we think Herriot is of use beyond HDFS and MR then perhaps it ought to be a separate project.  Similarly, long-term, RPC and Configuration might eventually become artifacts that other projects can use independently, rather than as a part of Common.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment:     (was: HADOOP-6332.patch)

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806294#action_12806294 ] 

Sharad Agarwal commented on HADOOP-6332:
----------------------------------------

We @Yahoo are working on this. I will post a patch in a couple of days, once it is in reasonable shape.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869361#action_12869361 ] 

Chris Douglas commented on HADOOP-6332:
---------------------------------------

Is it a packaging problem? As the source is available through maven (HADOOP-6635, HDFS-1047, MAPREDUCE-1613), if Hudson published snapshots, would that be sufficient?

That it doesn't affect the production code seems to support the argument that Herriot should be a subproject...

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770017#action_12770017 ] 

Steve Loughran commented on HADOOP-6332:
----------------------------------------

correction: s/Aaron/Alex/

Other thing: these entry points effectively become the way to start/stop Hadoop clusters. 

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>             Fix For: 0.21.0

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876312#action_12876312 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Adding a reference to a scripting framework discussed in the past (HADOOP-6248).

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.patch
                HADOOP-6332-MR.patch

I've split the patch into its Common and MapReduce parts. It should be easier to maintain now.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-6332:
-----------------------------------

    Attachment: 6332_v1.patch

As mentioned above, the abstraction of cluster setup and teardown is well within the scope of this JIRA, and the attached patch tries to address it. It also provides placeholders for exposing additional APIs from daemon processes, and a client interface for talking to daemons.
About this patch:
* System test framework classes are in org.apache.hadoop.test.system
   ** DaemonClient provides the interface to manage a particular remote daemon process.
   ** DaemonProtocol is an RPC interface for a daemon. Note that this needs to be woven into the server-side code via AspectJ.
   ** The MasterSlaveCluster interface provides access to the master and slave client handles.
* The abstraction for remote process management is org.apache.hadoop.test.system.process.ClusterProcessManager. The default implementation is ShellProcessManager, which uses the hadoop bin scripts to start/stop the daemons. Beyond process management, this interface can also be used later if we want to, e.g., push tarballs to cluster nodes.
* The implementation for mapreduce is in org.apache.hadoop.mapreduce. (This needs to be done in MAPREDUCE-1154; it is included here for easy reference.)
   ** The JTProtocol interface implementation needs to be woven into the JobTracker code, and similarly TTProtocol into the TaskTracker code.
   ** JTClient and TTClient are the client classes. Note that JTClient composes the org.apache.hadoop.mapreduce.Cluster class. For maintainability, the intention is to do minimal weaving and, if possible, avoid it on the client side. Verification utilities that are generic and usable by all system test cases can live in the JTClient/TTClient/MRCluster classes.
   ** Tests will create an MRCluster using the MRClusterFactory class. A sample test class is TestCluster. Perhaps we can have a test suite where the cluster is set up and torn down once. The tests in a particular suite are expected to be free of side effects.
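
The relationship between these pieces can be sketched roughly as follows. Only the type names ClusterProcessManager, DaemonClient and MasterSlaveCluster come from the list above; every method signature, plus the InMemoryProcessManager and SketchCluster classes, is an assumption made for illustration and is not taken from the patch. The in-memory manager stands in for ShellProcessManager so that the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Remote process management; method names are assumed, not from the patch. */
interface ClusterProcessManager {
  void start(String host);
  void stop(String host);
  boolean isRunning(String host);
}

/** Hypothetical stand-in for ShellProcessManager: tracks state in memory only. */
class InMemoryProcessManager implements ClusterProcessManager {
  private final Set<String> running = new HashSet<>();
  @Override public void start(String host) { running.add(host); }
  @Override public void stop(String host) { running.remove(host); }
  @Override public boolean isRunning(String host) { return running.contains(host); }
}

/** Client-side handle for one remote daemon (assumed shape). */
class DaemonClient {
  private final String host;
  private final ClusterProcessManager processes;

  DaemonClient(String host, ClusterProcessManager processes) {
    this.host = host;
    this.processes = processes;
  }

  String getHost() { return host; }
  void start() { processes.start(host); }
  void kill() { processes.stop(host); }
  boolean isAlive() { return processes.isRunning(host); }
}

/** Access to master and slave handles (assumed shape). */
interface MasterSlaveCluster {
  DaemonClient getMaster();
  List<DaemonClient> getSlaves();
}

/** Hypothetical implementation used only for this sketch. */
class SketchCluster implements MasterSlaveCluster {
  private final DaemonClient master;
  private final List<DaemonClient> slaves = new ArrayList<>();

  SketchCluster(String masterHost, List<String> slaveHosts, ClusterProcessManager pm) {
    this.master = new DaemonClient(masterHost, pm);
    for (String host : slaveHosts) {
      slaves.add(new DaemonClient(host, pm));
    }
  }

  @Override public DaemonClient getMaster() { return master; }
  @Override public List<DaemonClient> getSlaves() { return Collections.unmodifiableList(slaves); }

  /** Starts the master first, then every slave. */
  void setUp() {
    master.start();
    for (DaemonClient slave : slaves) {
      slave.start();
    }
  }

  /** Stops the slaves first, then the master. */
  void tearDown() {
    for (DaemonClient slave : slaves) {
      slave.kill();
    }
    master.kill();
  }
}
```

A system test would obtain such a cluster once per suite, kill or restart individual daemons through their DaemonClient handles, and leave the cluster in its original state so that other tests in the suite see no side effects.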

Thoughts?

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867082#action_12867082 ] 

Sharad Agarwal commented on HADOOP-6332:
----------------------------------------

+1

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

Addressing an audit warning: missing Apache license boilerplate.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778777#action_12778777 ] 

Jeff Hammerbacher commented on HADOOP-6332:
-------------------------------------------

Hey,

Where do we stand on this issue? Should we try to arrange a call soon?

Thanks,
Jeff

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>
> Hadoop would benefit from having a large-scale, automated, test-framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters. E.g. utilities to bring up to deploy, start & stop clusters, bring down tasktrackers, datanodes, entire racks of both etc.
> # Enhanced control-ability and inspect-ability of the various components in the system e.g. daemons such as namenode, jobtracker should expose their data-structures for query/manipulation etc. Tests would be much more relevant if we could for e.g. query for specific states of the jobtracker, scheduler etc. Clearly these apis should _not_ be part of the production clusters - hence the proposal is to use aspectj to weave these new apis to debug-deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870043#action_12870043 ] 

Doug Cutting commented on HADOOP-6332:
--------------------------------------

> I agree that Common shouldn't be treated as the hadoop.util project, but this seems correct.

Okay, sounds reasonable to me.


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Tags: herriot

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869525#action_12869525 ] 

Sharad Agarwal commented on HADOOP-6332:
----------------------------------------

Herriot is *test* code. Shouldn't test code stay with the project it is intended for? MR and HDFS today share a lot of things from Common, such as Configuration. Similarly, Herriot's common functionality is abstracted and put in Common.

If cluttering the build files and source tree is a concern, it can be a contrib project.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869792#action_12869792 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

bq. Do we expect to add system tests specific to Common?
Hmm, it depends. I'd say {{org/apache/hadoop/fs}} is a good candidate for this sort of test.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828065#action_12828065 ] 

Sharad Agarwal commented on HADOOP-6332:
----------------------------------------

bq. is the intention that one write their own Test Case as a normal Hadoop Job (such as TeraSort) as a separate activity and then one would get a handle to the MRCluster in the @Before method, and then start the test by calling Job.submit() in the @Test method and then be able to pass the jobID back to the JTClient to do whatever you needed to with it at that point?
The intention is that a test case can submit a job and assert the state of various entities (Job, JT, TT): data structures, filesystem, etc. It can also potentially control the daemons to simulate a particular failure scenario. It should be clearer once I post the patch.
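
As a rough illustration of that flow only (this is not the framework's API): the sketch below uses a hypothetical {{ClusterHandle}} interface and an in-memory {{FakeCluster}} stand-in, both invented here, to show a test submitting a job, asserting daemon-side state, and then simulating a tasktracker failure. The real handles will be whatever the patch defines.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical job states; not the framework's actual types.
enum JobState { RUNNING, SUCCEEDED, FAILED }

// Hypothetical handle a system test might use to query and control a cluster.
interface ClusterHandle {
    String submitJob(String name);      // returns a job id
    JobState jobState(String jobId);    // inspect daemon-side state
    void killTaskTracker(String host);  // simulate a failure
    int liveTaskTrackers();
}

// In-memory stand-in so the sketch runs without a real cluster.
class FakeCluster implements ClusterHandle {
    private final Map<String, JobState> jobs = new HashMap<>();
    private int trackers;
    private int nextId = 1;

    FakeCluster(int trackers) { this.trackers = trackers; }

    public String submitJob(String name) {
        String id = "job_" + nextId++;
        // A job can only start if at least one tasktracker is alive.
        jobs.put(id, trackers > 0 ? JobState.RUNNING : JobState.FAILED);
        return id;
    }

    public JobState jobState(String jobId) { return jobs.get(jobId); }

    public void killTaskTracker(String host) { if (trackers > 0) trackers--; }

    public int liveTaskTrackers() { return trackers; }
}

class SystemTestSketch {
    // Mirrors the flow: submit a job, assert state, inject a failure, assert again.
    static void run(ClusterHandle cluster) {
        String first = cluster.submitJob("sort");
        check(cluster.jobState(first) == JobState.RUNNING, "job should be running");

        cluster.killTaskTracker("host-1");
        check(cluster.liveTaskTrackers() == 0, "tracker should be down");

        String second = cluster.submitJob("sort-again");
        check(cluster.jobState(second) == JobState.FAILED, "job should fail without trackers");
    }

    private static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        run(new FakeCluster(1));
        System.out.println("scenario passed");
    }
}
```

In a real JUnit test the handle would presumably be obtained in the @Before method and the body of {{run}} would live in a @Test method; that split is an assumption about usage, not something the patch is known to require.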

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869267#action_12869267 ] 

Doug Cutting commented on HADOOP-6332:
--------------------------------------

Should we really be adding this to Common, or might this be better as a new Herriot subproject?

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787346#action_12787346 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

On a deeper look, I'd suggest not having hardcoded command names and environment variables. Instead, it would make sense to have a configuration file that describes whatever is needed. I can see why hardcoded script names are used for MapReduce, but I'd advocate avoiding that practice wherever possible.
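
To make the suggestion concrete, here is a minimal sketch assuming a plain java.util.Properties file. The class name {{ClusterCommands}} and the keys ({{cluster.start.script}}, {{cluster.conf.dir.env}}) are hypothetical and are not taken from the patch; only the Properties calls are standard JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: resolves script and environment-variable names from
// configuration instead of hardcoding them in the test framework.
class ClusterCommands {
    private final Properties props;

    ClusterCommands(Properties props) {
        this.props = props;
    }

    // Parses properties-format text; in practice this would be read from a file.
    static ClusterCommands parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ClusterCommands(p);
    }

    // Returns the configured value, or fails loudly if the key is missing.
    String command(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        return value;
    }

    // Returns the configured value, falling back to a default.
    String command(String key, String fallback) {
        return props.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        ClusterCommands cmds = parse(
            "cluster.start.script=bin/start-cluster.sh\n"
            + "cluster.conf.dir.env=HADOOP_CONF_DIR\n");
        System.out.println(cmds.command("cluster.start.script"));
        System.out.println(cmds.command("cluster.conf.dir.env"));
    }
}
```

With something like this, pointing the framework at a different deployment layout would mean editing the configuration file rather than the Java sources.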

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869739#action_12869739 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Doug, the Herriot concept is apparently very useful beyond the HDFS/MR projects. However, the concrete implementation is very specific to these components. As Sharad mentioned above, this is Hadoop *test* code.

At the moment it seems technically possible to separate Common's part of Herriot from Common itself. But that holds only as long as we don't have any system tests specific to Common.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Assigned: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik reassigned HADOOP-6332:
------------------------------------------

    Assignee: Konstantin Boudnik  (was: Sharad Agarwal)

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867302#action_12867302 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

I'll wait until tomorrow in case someone has more comments, and then commit it.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.patch

This patch adds Herriot sources to the source.jar file, removes a dependency on JUnit v3, and fixes some JavaDoc issues. It also includes a couple of import optimizations.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806438#action_12806438 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

@Sharad/Cos

1) Thanks for the offer to post the patch. FYI, you might want to check the paths in the new patch, as the existing ones go back one directory too many, so build.xml cannot be found. It's not too big of an issue, as I am running everything from Eclipse at present.

2) It's nice to have the TestCluster sample JUnit test as a starting point for using the framework.

3) In the current patch there is a "Cluster" class referenced on line 15 of JTClient, but it is not implemented anywhere in the patch.

4) Design: At present, we have the framework for cluster management and the M/R JobTracker client. As to how someone would use this to run system tests on Hadoop: is the intention that one writes a test case as a normal Hadoop job (such as TeraSort) as a separate activity, gets a handle to the MRCluster in the @Before method, starts the test by calling Job.submit() in the @Test method, and then passes the job ID back to the JTClient to do whatever is needed with it at that point?


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>
> Hadoop would benefit from having a large-scale, automated, test-framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters, e.g. utilities to deploy, start & stop clusters, and bring down tasktrackers, datanodes, entire racks of both etc.
> # Enhanced control-ability and inspect-ability of the various components in the system, e.g. daemons such as the namenode and jobtracker should expose their data structures for query/manipulation. Tests would be much more relevant if we could, e.g., query for specific states of the jobtracker, scheduler etc. Clearly these APIs should _not_ be part of production clusters; hence the proposal is to use AspectJ to weave these new APIs into debug deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790555#action_12790555 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Sharad, do you think it makes sense to split the patch into separate Common and MR parts? The patch is getting bigger and harder to apply to two different subprojects. We might keep both in here for now just for convenience's sake...

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch

[jira] Assigned: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik reassigned HADOOP-6332:
------------------------------------------

    Assignee: Sharad Agarwal  (was: Arun C Murthy)

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869423#action_12869423 ] 

Chris Douglas commented on HADOOP-6332:
---------------------------------------

bq. all visible changes in the build system will be the same + a lot of stuff from src/test/aop/build/aop.xml will have to be brought into the Common, HDFS, and MR builds anyway.
bq. we'll need to have a source code dependency on Hadoop's subprojects at framework development time to make sure the aspects are binding right, etc

This is why I'm asking about packaging. Building (and supporting) artifacts for Herriot in Common, HDFS, and MapReduce as part of their normal compile is sub-optimal. What is required to compile the aspects? If source is not required, can the AOP code live in the Herriot project and be compiled against the jars published by maven?

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869827#action_12869827 ] 

Chris Douglas commented on HADOOP-6332:
---------------------------------------

I had a conversation with Cos and learned that I completely misapprehended Herriot's scope. As a subproject, its purpose would be to pull down Hadoop jars and instrument them. While it would be possible to structure it this way, adding a target to produce instrumented jars is far more coherent and maintainable than maintaining a parallel build system. I agree that Common shouldn't be treated as the hadoop.util project, but this seems correct.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: 6332.patch

A tiny inconsistency in the build.xml has been discovered. Fixed.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773896#action_12773896 ] 

Steve Loughran commented on HADOOP-6332:
----------------------------------------

@Arun -pushing out configurations to clusters partially explores the configuration space, but not very broadly; more leading-edge tricks involve machine generation of very different configurations, and/or pseudo-RNG-driven generation of configuration options.

Some papers and videos on this topic:

# Skoll: Distributed Continuous QA
http://www.cs.umd.edu/~atif/papers/MemonICSE2004.pdf 
http://video.google.ca/videoplay?docid=8839342624264709864

# How we test -these are tests that run under JUnit from Ant/IDE, but can then bring up a cluster and run JUnit underneath. It gets complex.
http://www.youtube.com/watch?v=NKshZGUWHJ4

So, while I agree that you do need ways to bring up clusters (indeed, I have some I can demo), I do think it is best done outside the JUnit test run itself:
# Ant tasks to allocate machines from different IaaS systems -that includes selecting from a list of physical machines you have to hand.
# Whatever we use to explore the configuration space runs very differently from inside a JUnit test run, because you want to create clusters with different options, *then run the entire test suite*. What is key is to get the output from that run and merge it with everything else.
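
As a minimal sketch of the pseudo-RNG-driven option generation mentioned above: a driver outside the test run could derive each cluster configuration from a seed, so that any failing combination can be regenerated later from the seed alone. The option names and candidate values below are placeholders invented for the example, not real Hadoop configuration keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class ConfigSpaceSampler {

    /** Placeholder options: element 0 is the name, the rest are candidate values. */
    static final String[][] OPTIONS = {
        {"example.block.size.mb", "64", "128", "256"},
        {"example.replication", "1", "2", "3"},
        {"example.speculative.execution", "true", "false"},
    };

    /** The same seed always yields the same configuration, so a run can be replayed. */
    public static Map<String, String> sample(long seed) {
        Random rng = new Random(seed);
        Map<String, String> config = new LinkedHashMap<>();
        for (String[] option : OPTIONS) {
            // Pick one of the candidate values (indices 1..length-1) for this option.
            String value = option[1 + rng.nextInt(option.length - 1)];
            config.put(option[0], value);
        }
        return config;
    }

    public static void main(String[] args) {
        // Each seed stands for one cluster to bring up before running the whole suite.
        for (long seed = 0; seed < 3; seed++) {
            System.out.println("seed " + seed + " -> " + sample(seed));
        }
    }
}
```

Recording the seed alongside the merged test output would be enough to recreate the cluster options for a given run.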

Like I said, we should have a phone conference about this before anyone starts coding. I'd like to see what Alex has done, I can show what I have, and I'd like to hear from Stephen about how IBM run their tests too. How about everyone who is at ApacheCon meets up and talks about this, and then next week we have an online get-together in some timezone that works for everyone?




> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780147#action_12780147 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

I will be hosting this meeting via a Skype conference. Please contact me and send me your Skype name if you would like to be added to the participant list. The conference will be on 11/23 at 20:00 GMT (2PM CST / 12PM PST).

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Affects Version/s: 0.21.0
                           (was: 0.22.0)

The JIRA should clearly target 0.21 and above. My earlier change was confusing.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771912#action_12771912 ] 

Steve Loughran commented on HADOOP-6332:
----------------------------------------

Thinking about this a bit more, I want to make clear that I don't think we should abandon JUnit as the Java framework for writing tests in. It's simple, it's extensible, it works. I do think that the report output format is limited, but that can be addressed with a better test runner, one that pulls in output from >1 process at specific log levels, and suchlike. That's a feature to add later.

What I do want to do is decouple cluster instantiation from those tests that just need a working cluster -all those whose setup/teardown create MiniMR and MiniDFS clusters. These in-VM clusters are good for debugging and getting all the log output, but unrealistic -single VM, no native code, not started via the shell scripts.

One option is to leave the existing test suites alone, and start a new one, hadoop-cluster-test, that:
# Lets people bring up their own clusters how they choose (out of scope). However the cluster comes up, some properties file needs to be set up with the URLs of the filesystem and job tracker.
# Contains tests that are written to be run against large, live clusters. The setup code doesn't need to bring up a cluster, though it may need to clean up old output.
# Possibly: has a shared static dataset for real testing. Size is the issue here, but some things could be generated, driven by pseudo-random numbers for replicability.
# Publishes its test code as a JAR + build.xml that can be run against your own cluster.
# Provides somewhere to experiment with better logging and test execution.
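
To make the first point concrete, here is a rough sketch of how such a test suite might pick up an externally started cluster from a properties file instead of creating one in setup. The property keys and host names are assumptions made up for this example; nothing in the thread fixes them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

public class ClusterEndpoints {

    // Hypothetical property keys, chosen only for this sketch.
    static final String FS_KEY = "test.cluster.filesystem.url";
    static final String JT_KEY = "test.cluster.jobtracker.url";

    public final URI fileSystem;
    public final URI jobTracker;

    ClusterEndpoints(Properties props) {
        fileSystem = URI.create(require(props, FS_KEY));
        jobTracker = URI.create(require(props, JT_KEY));
    }

    /** Fails fast with a clear message when the cluster has not been described. */
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing cluster property: " + key);
        }
        return value.trim();
    }

    /** Parses properties text; a real suite would load a file instead. */
    public static ClusterEndpoints parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ClusterEndpoints(props);
    }

    public static void main(String[] args) {
        String example =
              "test.cluster.filesystem.url=hdfs://namenode.example.org:8020/\n"
            + "test.cluster.jobtracker.url=http://jobtracker.example.org:50030/\n";
        ClusterEndpoints endpoints = parse(example);
        System.out.println(endpoints.fileSystem + " " + endpoints.jobTracker);
    }
}
```

With this shape, test setup only has to resolve the endpoints and clean up old output; how the cluster was brought up stays out of scope for the tests.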

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>             Fix For: 0.21.0

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865836#action_12865836 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

bq. system-test.xml need not go in common
While I mostly agree that {{system-test.xml}} shouldn't be in common (a config file in common shouldn't have any knowledge about upstream dependencies), I am reluctant to split it. The problem with the split, as I see it, is that both copies of the file in HDFS and MR will mostly contain the same information with some minor differences. However, considering that exposing upstream dependencies is worse, I will make the split and post a new patch shortly.

bq.  jar-test-system ant target
Thanks for catching this one. It looks like we have the same problem in the original implementation and it was missed. Will fix it.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

I submitted the wrong file previously. Correcting. The previous comment is still valid, though.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865837#action_12865837 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Actually, I was wrong about there being a problem in the original {{jar-test-system}} implementation. It looks like {{jar-test}} is implemented slightly differently in trunk, which causes this effect. Hmm...

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Assigned: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned HADOOP-6332:
-------------------------------------

    Assignee: Arun C Murthy

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332.0.22.patch

Very first draft of the forward patch for Common's trunk. It works through all four patches posted earlier for yahoo-0.20.

Right now the build is passing. However, core tests are broken and no Herriot artifacts are being created. I will be fixing these bugs in the next couple of days.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

               Status: Patch Available  (was: Open)
    Affects Version/s: 0.22.0
        Fix Version/s: 0.22.0
                           (was: 0.21.0)

Patch seems to be ready for verification.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787420#action_12787420 ] 

Sharad Agarwal commented on HADOOP-6332:
----------------------------------------

Thanks Konstantin for looking at this.
bq. shall we get rid of {{MasterSlaveCluster}} and use AbstractMasterSlaveCluster instead?
Since tests use MRCluster directly, the MasterSlaveCluster interface adds no value. When I started, I did not expose MRCluster to tests: it was a private class in MRClusterFactory, and tests used the MasterSlaveCluster interface instead.
So yes, we can get rid of MasterSlaveCluster.

bq. it seems like some of the classes in the proposed patch might benefit from HDFS-326.
Some of the APIs proposed by HDFS-326 and this patch are indeed the same. The thought here is that these APIs are injected and exist only for tests. If some of them (via HDFS-326 or otherwise) are later considered worth having in production code, we can easily remove them from the test injection code and promote them to the regular code base.

bq. It was my understanding that the group attending the call felt that deployment (cluster setup/teardown) was not within the scope of the JIRA.
Let me clarify what I mean by setup and teardown. By cluster setup/teardown I mean cluster start/stop, not deployment. I agree that deployment should not be in the scope of this JIRA. But it seems tests will benefit from having control over starting and stopping daemons (for example, to test a lost or blacklisted TT, a test may want to kill a TT). How and which tarballs are pushed and deployed is out of scope, because test cases need not bother about it.
To work with an already started cluster, a config flag, something like NO_CLUSTER_START, can be set, which will let test suites skip the cluster start/stop step.
Does that make sense?
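
A minimal sketch of how such a flag could gate the start/stop step, in plain Java so it runs without a cluster. All names here ({{ClusterHandle}}, {{RecordingCluster}}, {{SuiteLifecycle}} and the property key) are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical stand-in for a handle to a real cluster; not a Hadoop class. */
interface ClusterHandle {
    void start();
    void stop();
}

/** Records calls instead of touching daemons, so the sketch runs anywhere. */
class RecordingCluster implements ClusterHandle {
    final List<String> calls = new ArrayList<>();
    public void start() { calls.add("start"); }
    public void stop() { calls.add("stop"); }
}

/** Suite-level setup/teardown that honours a NO_CLUSTER_START-style flag. */
class SuiteLifecycle {
    // Assumed key; the comment above only says "something like NO_CLUSTER_START".
    static final String NO_CLUSTER_START = "test.system.no.cluster.start";

    private final ClusterHandle cluster;
    private final boolean manageDaemons;

    SuiteLifecycle(ClusterHandle cluster, Properties conf) {
        this.cluster = cluster;
        // Daemons are managed by the suite unless the flag is set to "true".
        this.manageDaemons =
            !Boolean.parseBoolean(conf.getProperty(NO_CLUSTER_START, "false"));
    }

    void setUp() { if (manageDaemons) cluster.start(); }

    void tearDown() { if (manageDaemons) cluster.stop(); }
}

class SuiteLifecycleDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SuiteLifecycle.NO_CLUSTER_START, "true");
        RecordingCluster cluster = new RecordingCluster();
        SuiteLifecycle suite = new SuiteLifecycle(cluster, conf);
        suite.setUp();     // skipped: the cluster is assumed to be running already
        suite.tearDown();  // skipped: the suite leaves the cluster up
        System.out.println("daemon calls: " + cluster.calls);
    }
}
```

With the flag unset, the same suite would start the cluster in setUp and stop it in tearDown; individual tests could still kill a single TT through whatever control API they are given.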

bq. On a deeper look, I'd suggest not having hard-coded command names and environment variables.
Perhaps we can have default names set up in the code that can be overridden by setting a property.
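
Roughly, the defaults-with-override idea could look like the sketch below. The property keys and the default command names are placeholders invented for illustration; they are not the names used in the patch.

```java
import java.util.Properties;

/** Resolves daemon-control command names: defaults in code, overridable by property. */
class CommandNames {
    // Hypothetical property keys and default values, for illustration only.
    static final String START_KEY = "test.system.cmd.start";
    static final String STOP_KEY = "test.system.cmd.stop";
    static final String DEFAULT_START = "start-daemons";
    static final String DEFAULT_STOP = "stop-daemons";

    private final Properties overrides;

    CommandNames(Properties overrides) {
        this.overrides = overrides;
    }

    /** Returns the override if the property is set, else the built-in default. */
    String startCommand() {
        return overrides.getProperty(START_KEY, DEFAULT_START);
    }

    String stopCommand() {
        return overrides.getProperty(STOP_KEY, DEFAULT_STOP);
    }
}

class CommandNamesDemo {
    public static void main(String[] args) {
        // System properties let a run override a name with -Dtest.system.cmd.start=...
        CommandNames names = new CommandNames(System.getProperties());
        System.out.println("start: " + names.startCommand());
        System.out.println("stop:  " + names.stopCommand());
    }
}
```

Keeping the defaults in one class means a test run on an unusual deployment only has to set the properties that differ.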

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Patch Available  (was: Open)

Rechecking the patch once more.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: 6332.patch

This is a patch for y20-security, which might have conflicts with the current 0.20 branch.
We'll be providing a forward-port patch for trunk soon.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: 6332-phase2.fix1.patch

In a secured environment, a client has to make a privileged RPC call to access a FileSystem instance from an NN; hence the fix.

This patch has to be applied on top of 6332-phase2.patch. It is not for inclusion here.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Patch Available  (was: Open)

Issues found by test-patch are fixed. Resubmitting.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869428#action_12869428 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

bq. What is required to compile the aspects? If source is not required, can the AOP code live in the Herriot project and be compiled against the jars published by maven?
Thanks for the reminder - it had totally slipped my mind... Either the source code of the 'target' classes is needed for successful weaving or (as you are suggesting) we'll have to instrument target jars pulled down from Maven, which is ... well, suboptimal.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch

[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Status: Open  (was: Patch Available)

The patch was missing a file.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>
> Hadoop would benefit from having a large-scale, automated, test-framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters. E.g. utilities to bring up to deploy, start & stop clusters, bring down tasktrackers, datanodes, entire racks of both etc.
> # Enhanced control-ability and inspect-ability of the various components in the system e.g. daemons such as namenode, jobtracker should expose their data-structures for query/manipulation etc. Tests would be much more relevant if we could for e.g. query for specific states of the jobtracker, scheduler etc. Clearly these apis should _not_ be part of the production clusters - hence the proposal is to use aspectj to weave these new apis to debug-deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children
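
A rough illustration of the second point in the description above: test-only inspection APIs that are absent from the production class and present only in a debug deployment. This sketch does not use AspectJ; a plain subclass stands in for what weaving would add. Every name here (ExampleJobTracker, ExampleJobTrackerTestApi, InstrumentedJobTracker) is made up for illustration and is not taken from any patch on this issue.

```java
// Production-side class: it does not implement any test-facing interface.
class ExampleJobTracker {
    private int runningJobs;

    void submitJob() {
        runningJobs++;
    }

    // Package-private accessor used only by the stand-in below. A privileged
    // aspect could read the private field directly; plain Java cannot.
    int runningJobCount() {
        return runningJobs;
    }
}

// Test-only contract, defined statically so that tests can compile against it.
interface ExampleJobTrackerTestApi {
    int getRunningJobCount();
}

// Stand-in for the woven class that would exist only in a debug deployment.
final class InstrumentedJobTracker extends ExampleJobTracker
        implements ExampleJobTrackerTestApi {
    @Override
    public int getRunningJobCount() {
        return runningJobCount();
    }
}

class InjectionSketchDemo {
    public static void main(String[] args) {
        InstrumentedJobTracker tracker = new InstrumentedJobTracker();
        tracker.submitJob();
        ExampleJobTrackerTestApi api = tracker;
        System.out.println("running jobs seen by the test: " + api.getRunningJobCount());
    }
}
```

The point of the split is that a test compiles against the interface only, while a production build never ships a class that implements it.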


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865372#action_12865372 ] 

Hadoop QA commented on HADOOP-6332:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444019/HADOOP-6332.0.22.patch
  against trunk revision 941662.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 49 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/511/console

This message is automatically generated.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866760#action_12866760 ] 

Hadoop QA commented on HADOOP-6332:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444343/HADOOP-6332.0.22.patch
  against trunk revision 941662.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 48 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/517/console

This message is automatically generated.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated HADOOP-6332:
-----------------------------------

    Attachment: 6332_v2.patch

Changes from previous patch:
- Added a representative set of observability APIs to DaemonProtocol, JTProtocol and TTProtocol.
- Introduced the org.apache.hadoop.mapreduce.MRFault enum. The idea is to give tests the ability to switch a set of faults on and off.
- Added a couple of representative verification APIs in JTClient.
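
To make the fault-switching idea in the list above concrete, here is a minimal sketch of an enum of faults plus a switchboard that tests can toggle. It is an assumption about how such a mechanism could look, not the contents of the patch: the constants of the real MRFault enum are not listed in this thread, so ExampleFault, FaultSwitchboard and ExampleTaskLauncher are all hypothetical names.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical fault names, invented for this sketch.
enum ExampleFault {
    DROP_HEARTBEAT,
    FAIL_TASK_LAUNCH,
    SLOW_SHUFFLE
}

// Holds the set of faults that a test has switched on.
final class FaultSwitchboard {
    private final Set<ExampleFault> enabled =
        Collections.synchronizedSet(EnumSet.noneOf(ExampleFault.class));

    void enable(ExampleFault fault) {
        enabled.add(fault);
    }

    void disable(ExampleFault fault) {
        enabled.remove(fault);
    }

    boolean isEnabled(ExampleFault fault) {
        return enabled.contains(fault);
    }

    // Switches every fault off, e.g. at the end of a test.
    void reset() {
        enabled.clear();
    }
}

// Stand-in daemon code that consults the switchboard at a fault point.
final class ExampleTaskLauncher {
    private final FaultSwitchboard faults;

    ExampleTaskLauncher(FaultSwitchboard faults) {
        this.faults = faults;
    }

    // Returns true if the launch went through, false if a fault was injected.
    boolean launch(String taskId) {
        return !faults.isEnabled(ExampleFault.FAIL_TASK_LAUNCH);
    }
}

class FaultSwitchDemo {
    public static void main(String[] args) {
        FaultSwitchboard faults = new FaultSwitchboard();
        ExampleTaskLauncher launcher = new ExampleTaskLauncher(faults);
        System.out.println("launch ok: " + launcher.launch("task_1"));
        faults.enable(ExampleFault.FAIL_TASK_LAUNCH);
        System.out.println("launch ok with fault on: " + launcher.launch("task_2"));
        faults.reset();
    }
}
```

Keeping the enabled faults in an EnumSet makes switching a whole group on or off cheap, and a reset between tests leaves no fault behind.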

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Stephen Watt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787122#action_12787122 ] 

Stephen Watt commented on HADOOP-6332:
--------------------------------------

Here is a wiki link that provides a synopsis of the discussions from the call as well as a proposed solution:

http://wiki.apache.org/hadoop/SystemTestingConfCallSynopsis

NB: It was my understanding that the group attending the call felt that deployment (cluster setup/teardown) was not within the scope of the JIRA. The proposed solution involved a testing runtime that could be pointed at a variety of existing clusters, but the deployment of the clusters themselves was a separate concern.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869405#action_12869405 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Right, agreed. Makes sense. Are we going to fully isolate MR from Common, i.e. the two won't even have jar dependencies? Because this is exactly how the MR (HDFS) parts of the test framework depend on the Common part of the framework: via a jar dependency.

If you suggest cutting this off, then we'll have to introduce another one, from the test framework's artifact, instead. That doesn't appear very natural to me for a software system and its embedded test framework, but it can be done, of course.


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869761#action_12869761 ] 

Doug Cutting commented on HADOOP-6332:
--------------------------------------

> But this is so until we don't have any system tests specific for Common.

Do we expect to add system tests specific to Common?


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788261#action_12788261 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Thanks for the answers, Sharad. All of it makes sense. One comment though:

bq. But seems like tests will benefit by having a control on start/stop daemons (for example to test lost/blacklisted TT, tests may want to kill a TT). How and which tar balls are pushed and deployed are not in scope of this because test cases need not bother about it.

Right, the actual push of the bits has to be done somewhere else: Hudson or something similar.

bq. To work with already started cluster, a config flag something like NO_CLUSTER_START can be set which will let test suites skip the cluster start/stop step.
My thought on this was that restarting cluster components should be done in a way consistent with the setup/teardown approach of pretty much any test framework, such as JUnit. If a test needs to start/stop a cluster, then it needs to specify {{@Before}} and {{@After}} methods which will do that using the provided control primitives (e.g. start_datanode.sh, stop_datanode.sh or whatever).
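
A minimal sketch of that setup/teardown shape, written without a JUnit dependency so that it stands alone: setUp() plays the role of an {{@Before}} method and tearDown() the role of an {{@After}} method. The ClusterControl interface and the classes around it are hypothetical; a real implementation might shell out to scripts such as start_datanode.sh and stop_datanode.sh, whereas this one only records the calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical control primitives offered by the framework to tests.
interface ClusterControl {
    void startDaemon(String name);
    void stopDaemon(String name);
}

// In-memory stand-in that records calls instead of touching a real cluster.
final class RecordingClusterControl implements ClusterControl {
    final List<String> log = new ArrayList<>();

    @Override
    public void startDaemon(String name) {
        log.add("start " + name);
    }

    @Override
    public void stopDaemon(String name) {
        log.add("stop " + name);
    }
}

// Mirrors the test lifecycle: setUp, then the test body, then tearDown.
final class DatanodeRestartScenario {
    private final ClusterControl control;

    DatanodeRestartScenario(ClusterControl control) {
        this.control = control;
    }

    // Would be annotated with @Before in a JUnit 4 test class.
    void setUp() {
        control.startDaemon("datanode");
    }

    // Would be annotated with @After in a JUnit 4 test class.
    void tearDown() {
        control.stopDaemon("datanode");
    }

    // Runs the body between setUp and tearDown; tearDown runs even if the body throws.
    void run(Runnable testBody) {
        setUp();
        try {
            testBody.run();
        } finally {
            tearDown();
        }
    }
}

class SetupTeardownDemo {
    public static void main(String[] args) {
        RecordingClusterControl control = new RecordingClusterControl();
        new DatanodeRestartScenario(control).run(() -> control.log.add("test body"));
        System.out.println(control.log);
    }
}
```

With this shape, a test that should run against an already started cluster simply omits the start/stop calls from its own setup and teardown, rather than relying on a global flag.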


> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787111#action_12787111 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

Looks good overall. A couple of comments:
- shall we get rid of {{MasterSlaveCluster}} and use {{AbstractMasterSlaveCluster}} instead? It will add more flexibility in the future.
- it seems like some of the classes in the proposed patch might benefit from HDFS-326. Shall these two be better synchronized with each other? Otherwise, we might end up with two sets of protocols serving a similar purpose but implemented differently.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch


[jira] Updated: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6332:
---------------------------------------

    Attachment: HADOOP-6332-MR.patch
                HADOOP-6332.patch

Now with modifications to the build, so the system can be compiled from the ant environment, all jars can be created, etc.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.21.0
>
>         Attachments: 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865728#action_12865728 ] 

Sharad Agarwal commented on HADOOP-6332:
----------------------------------------

Skimmed the patch. Some minor comments:
- system-test.xml need not go in common. It is neither used nor required by common code. We can split it into hdfs-system-test.xml and mapred-system-test.xml when working on the respective forward ports.
Also $(YINST_ROOT) must be removed.
- The jar-test-system ant target is building a jar with unit tests. It should contain only system tests. (Right now we don't have any system tests in common, so perhaps we can drop this target for now.)

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch


[jira] Commented: (HADOOP-6332) Large-scale Automated Test Framework

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867753#action_12867753 ] 

Konstantin Boudnik commented on HADOOP-6332:
--------------------------------------------

I have committed it to the trunk and the 0.21 branch. I ran all tests locally once more. All seems OK.

> Large-scale Automated Test Framework
> ------------------------------------
>
>                 Key: HADOOP-6332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6332
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>    Affects Versions: 0.21.0
>            Reporter: Arun C Murthy
>            Assignee: Konstantin Boudnik
>             Fix For: 0.22.0
>
>         Attachments: 6332-phase2.fix1.patch, 6332-phase2.fix2.patch, 6332-phase2.patch, 6332.patch, 6332.patch, 6332.patch, 6332_v1.patch, 6332_v2.patch, HADOOP-6332-MR.patch, HADOOP-6332-MR.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.0.22.patch, HADOOP-6332.patch, HADOOP-6332.patch
>
>
> Hadoop would benefit from having a large-scale, automated test framework. This jira is meant to be a master-jira to track relevant work.
> ----
> The proposal is a junit-based, large-scale test framework which would run against _real_ clusters.
> There are several pieces we need to achieve this goal:
> # A set of utilities we can use in junit-based tests to work with real, large-scale hadoop clusters. E.g. utilities to deploy, start & stop clusters, and bring down tasktrackers, datanodes, or entire racks of both.
> # Enhanced controllability and inspectability of the various components in the system, e.g. daemons such as the namenode and jobtracker should expose their data structures for query and manipulation. Tests would be much more relevant if we could, for example, query for specific states of the jobtracker, scheduler, etc. Clearly these APIs should _not_ be part of production clusters; hence the proposal is to use AspectJ to weave these new APIs into debug deployments.
> ----
> Related note: we should break up our tests into at least 3 categories:
> # src/test/unit -> Real unit tests using mock objects (e.g. HDFS-669 & MAPREDUCE-1050).
> # src/test/integration -> Current junit tests with Mini* clusters etc.
> # src/test/system -> HADOOP-6332 and its children
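
The second piece of the proposal (test-only APIs that exist in debug deployments but not in production code) can be illustrated with a small sketch. This is not the framework's code: the proposal names AspectJ weaving, whereas the sketch below substitutes a plain java.lang.reflect.Proxy so that it runs without extra tooling. All names in it (Daemon, DaemonTestControl, SimpleDaemon, instrument) are hypothetical and chosen only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class InjectedTestApiSketch {

    /** Production contract: the only surface a regular deployment exposes. */
    interface Daemon {
        void submit(String job);
    }

    /** Test-only contract: declared statically, implemented only by injection. */
    interface DaemonTestControl {
        List<String> queuedJobs();   // inspect internal state
        void kill();                 // perform an 'undesirable' action
        boolean isAlive();
    }

    /** Hypothetical daemon; it knows nothing about the test contract. */
    static final class SimpleDaemon implements Daemon {
        private final List<String> queue = new ArrayList<>();

        @Override
        public void submit(String job) {
            queue.add(job);
        }
    }

    /** Builds the "debug deployment": the daemon plus the injected test API. */
    @SuppressWarnings("unchecked")
    static Object instrument(SimpleDaemon target) {
        boolean[] alive = {true};
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "queuedJobs": {
                    // Read private state reflectively (stand-in for an aspect).
                    Field f = SimpleDaemon.class.getDeclaredField("queue");
                    f.setAccessible(true);
                    return new ArrayList<>((List<String>) f.get(target));
                }
                case "kill":
                    alive[0] = false;
                    return null;
                case "isAlive":
                    return alive[0];
                case "submit":
                    if (!alive[0]) {
                        throw new IllegalStateException("daemon was killed");
                    }
                    return method.invoke(target, args);
                default:
                    // Object methods such as toString() go to the real daemon.
                    return method.invoke(target, args);
            }
        };
        return Proxy.newProxyInstance(
                InjectedTestApiSketch.class.getClassLoader(),
                new Class<?>[] {Daemon.class, DaemonTestControl.class},
                handler);
    }

    public static void main(String[] args) {
        Object debugDaemon = instrument(new SimpleDaemon());
        ((Daemon) debugDaemon).submit("job-1");

        DaemonTestControl control = (DaemonTestControl) debugDaemon;
        System.out.println(control.queuedJobs()); // prints [job-1]
        control.kill();
        System.out.println(control.isAlive());    // prints false
    }
}
```

The point of the design is that SimpleDaemon never mentions DaemonTestControl, so a production build that skips the instrumentation step carries no test handles at all. A system test would only ever see the instrumented object.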
