Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2009/04/03 13:24:12 UTC

[jira] Created: (HADOOP-5621) MapReducer to run junit tests under Hadoop

MapReducer to run junit tests under Hadoop
------------------------------------------

Key: HADOOP-5621
URL: https://issues.apache.org/jira/browse/HADOOP-5621
Project: Hadoop Core
Issue Type: New Feature
Affects Versions: 0.21.0
Reporter: Steve Loughran

This is something I mentioned to some people last week; I thought I would start a discussion on it.

We could run junit tests as a MapReduce job with
# a mapper that takes a list of classes, one per line
# extracts the test suite from each class, and then invokes each test method. This would be a new junit test runner.
# saves the result (and any exceptions) as the output. Also saves any machine specific details.
# It also needs to grab the System.out and System.err channels, to map them to specific tests.
# Measure how long the tests took (including setup/teardown time)
# Add an ant task <listresources> to take filesets and other patterns, and generate text files from the contents (with stripping of prefixes and suffixes, directory separator substitution, file begin/end values, etc.). I have this with tests already.
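
To make the mapper-side steps above concrete, here is a minimal sketch of an in-VM runner. It is not the prototype's code and does not use junit at all: it assumes JUnit 3 style naming (public, no-argument void methods whose names start with "test") and finds them by reflection. The names TestOutcome, MiniTestRunner and SampleTest are invented for illustration. It only captures System.out (System.err would be handled the same way), skips setUp/tearDown, and swaps the process-wide System.out, so it assumes one test runs at a time.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** One record per test method: roughly what a mapper would emit (hypothetical). */
final class TestOutcome {
    final String className;
    final String methodName;
    final boolean passed;
    final String stackTrace;    // empty when the test passed
    final String stdout;        // whatever the test wrote to System.out
    final long durationMillis;  // includes construction of the test instance

    TestOutcome(String className, String methodName, boolean passed,
                String stackTrace, String stdout, long durationMillis) {
        this.className = className;
        this.methodName = methodName;
        this.passed = passed;
        this.stackTrace = stackTrace;
        this.stdout = stdout;
        this.durationMillis = durationMillis;
    }
}

final class MiniTestRunner {
    /** Runs every public, non-static, no-arg void method named test*. */
    static List<TestOutcome> runClass(String className) {
        Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such test class: " + className, e);
        }
        List<TestOutcome> outcomes = new ArrayList<>();
        for (Method m : clazz.getMethods()) {
            boolean isTest = m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class
                    && !Modifier.isStatic(m.getModifiers());
            if (isTest) {
                outcomes.add(runMethod(clazz, m));
            }
        }
        return outcomes;
    }

    private static TestOutcome runMethod(Class<?> clazz, Method m) {
        PrintStream originalOut = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        boolean passed = true;
        String trace = "";
        long start = System.nanoTime();
        System.setOut(new PrintStream(captured, true));  // redirect for this test only
        try {
            Object instance = clazz.getDeclaredConstructor().newInstance();
            m.invoke(instance);
        } catch (InvocationTargetException e) {
            passed = false;                     // the test (or constructor) threw
            trace = stackTraceOf(e.getCause());
        } catch (ReflectiveOperationException e) {
            passed = false;                     // the test could not be invoked
            trace = stackTraceOf(e);
        } finally {
            System.setOut(originalOut);         // always restore the real stream
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        return new TestOutcome(clazz.getName(), m.getName(), passed, trace,
                captured.toString(), millis);
    }

    private static String stackTraceOf(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    public static void main(String[] args) {
        for (TestOutcome o : runClass(SampleTest.class.getName())) {
            System.out.println(o.className + "#" + o.methodName
                    + (o.passed ? " passed" : " failed") + " in " + o.durationMillis + "ms");
        }
    }
}

/** Stand-in test class so the sketch can be run on its own. */
class SampleTest {
    public void testPasses() { System.out.println("hello from testPasses"); }
    public void testFails() { throw new IllegalStateException("boom"); }
}
```

In a real job, each TestOutcome would be serialized as the map output, keyed by something like class and method name, with host details added.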

The result would be that you could point listresources at a directory tree and create a text file listing all tests to run. These could be executed across multiple hosts and the results correlated. It would be, initially, more of a MapExpand than a MapReduce, as the output would be bigger than the input.
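
As an illustration of what such a listing step does (this is not the <listresources> task itself, whose attributes are not shown here), the following sketch walks a directory of compiled classes, strips the root prefix and the file suffix, and replaces directory separators with dots. The class name TestClassLister and the convention that test classes end in "Test" are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TestClassLister {
    /** "org/example/FooTest.class" with suffix ".class" becomes "org.example.FooTest". */
    static String toClassName(String relativePath, String suffix) {
        String trimmed = relativePath.endsWith(suffix)
                ? relativePath.substring(0, relativePath.length() - suffix.length())
                : relativePath;
        // Handle both Unix and Windows directory separators.
        return trimmed.replace('\\', '.').replace('/', '.');
    }

    /** Lists classes under root whose file name ends in "Test" + suffix, sorted by name. */
    static List<String> listTestClasses(Path root, String suffix) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString())   // strip the root prefix
                    .filter(s -> s.endsWith("Test" + suffix))
                    .map(s -> toClassName(s, suffix))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints one class name per line, the input format the mapper expects.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listTestClasses(root, ".class").forEach(System.out::println);
    }
}
```

Redirecting the output to a file gives the one-class-per-line text input for the job.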

Feature creep then becomes the analysis:

# Add another MR class which runs through all failing tests and creates a new list of test classes that failed. This could be rescheduled on different runs, and makes for a faster cycle (only run failing tests until they work)
# Add something to only get failing tests, summarise them (somehow) in a user readable form
# Something to get partially failing tests and highlight machine differences.
# Add something to compare tests over time, detect those which are getting slower?
# an MR to regenerate the classic Ant junit XML reports, for presentation in other tools (like hudson)
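
The first and third analysis items can be sketched in plain Java, without the MapReduce plumbing. The record layout (class, method, host, pass/fail) and the names ResultRecord and FailureAnalysis are assumptions for illustration; in a real job the grouping would more likely be a reducer keyed on the test name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical flattened view of one emitted test result. */
final class ResultRecord {
    final String className;
    final String methodName;
    final String host;
    final boolean passed;

    ResultRecord(String className, String methodName, String host, boolean passed) {
        this.className = className;
        this.methodName = methodName;
        this.host = host;
        this.passed = passed;
    }

    String testId() {
        return className + "#" + methodName;
    }
}

final class FailureAnalysis {
    /** Classes with at least one failing method: the input list for a rerun. */
    static SortedSet<String> failingClasses(List<ResultRecord> results) {
        SortedSet<String> failing = new TreeSet<>();
        for (ResultRecord r : results) {
            if (!r.passed) {
                failing.add(r.className);
            }
        }
        return failing;
    }

    /** Tests that failed on some hosts but passed on others, mapped to the failing hosts. */
    static SortedMap<String, SortedSet<String>> partiallyFailing(List<ResultRecord> results) {
        Map<String, SortedSet<String>> passedOn = new HashMap<>();
        SortedMap<String, SortedSet<String>> failedOn = new TreeMap<>();
        for (ResultRecord r : results) {
            Map<String, SortedSet<String>> target = r.passed ? passedOn : failedOn;
            target.computeIfAbsent(r.testId(), k -> new TreeSet<>()).add(r.host);
        }
        // Keep only tests that also passed somewhere: those are host-dependent.
        failedOn.keySet().retainAll(passedOn.keySet());
        return failedOn;
    }

    public static void main(String[] args) {
        List<ResultRecord> results = new ArrayList<>();
        results.add(new ResultRecord("org.example.FooTest", "testA", "host1", true));
        results.add(new ResultRecord("org.example.FooTest", "testA", "host2", false));
        results.add(new ResultRecord("org.example.BarTest", "testB", "host1", false));
        results.add(new ResultRecord("org.example.BarTest", "testB", "host2", false));
        System.out.println("rerun: " + failingClasses(results));
        System.out.println("host-dependent: " + partiallyFailing(results));
    }
}
```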

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-5621) MapReducer to run junit tests under Hadoop

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695580#action_12695580 ] 

Steve Loughran commented on HADOOP-5621:
----------------------------------------

HADOOP-1257 runs Ant's <junit> task as the job and runs the tests in a different VM, generating text output which it then tries to parse. It will need ant.jar, ant-optional.jar and junit on the path.

This one does it in-VM, which has a price: it is less robust against JVM crashes, and there isn't any timeout enforcement logic there yet. The big difference is that the output comes out serialized: one test result is emitted for every method that passes or fails, with the exceptions and stack trace included. This makes it a more powerful format for analysis.

The brittleness/timeout is a problem, but then it's a problem in <junit> too. What could be done would be to exec() a child process to do the work, with it passing back the serialized test results and the MR host process killing it if it took too long, while still saving everything that got forwarded out. That would be a trickier undertaking, but possible.
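
The child-process design is not sketched here. As a lighter illustration of what timeout enforcement could look like while staying in-VM, the following runs a test body on a worker thread and gives up after a deadline. This is an assumption-laden sketch, not the prototype: TimedTestCall and its Status enum are invented names, it does nothing about JVM crashes, and a test that ignores interruption keeps running in the background (which is exactly why a killable child process is the more robust option).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedTestCall {
    enum Status { PASSED, FAILED, TIMED_OUT }

    /** Runs the test body on a daemon worker thread and waits at most timeoutMillis. */
    static Status runWithTimeout(Runnable test, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-worker");
            t.setDaemon(true);  // a stuck test must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(test);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Status.PASSED;
        } catch (TimeoutException e) {
            future.cancel(true);  // interrupts the worker; it may choose to ignore this
            return Status.TIMED_OUT;
        } catch (ExecutionException e) {
            return Status.FAILED; // the test body threw; e.getCause() holds the failure
        } catch (InterruptedException e) {
            // The caller was interrupted before a result arrived: report as not completed.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Status.TIMED_OUT;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> { }, 1000));
        System.out.println(runWithTimeout(() -> {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 100));
    }
}
```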

Also: I pass down a configuration to the test cases if they have the interface to receive it. Right now, that is the same config that the MR job gets, but you could imagine every map line including not just the test method but some config information, so someone could do more configuration-driven work. Of course, if you want to do that then the test reports had better include the entire configuration, for better examination of what triggered the failures.

It's currently built against junit3.8.2; moving to junit4 is worthwhile, but I'd have to use a slightly different test runner.

What '1257 has is an ant task that looks more like a drop-in replacement for <junit>: a specific launcher for the job. This proposal's workflow would start off generating a list of tests to run, then pushing that out as part of a blocking job submission.

[jira] Assigned: (HADOOP-5621) MapReducer to run junit tests under Hadoop

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran reassigned HADOOP-5621:
--------------------------------------

    Assignee: Steve Loughran

[jira] Commented: (HADOOP-5621) MapReducer to run junit tests under Hadoop

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695354#action_12695354 ] 

Steve Loughran commented on HADOOP-5621:
----------------------------------------


I should add that my prototype work is currently in a different SCM repo, still LGPL-licensed, but moving to the Apache license shortly.

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/src/org/smartfrog/services/hadoop/junitmr/

One thing I also do in this code is pass down a configuration to every test case. This will give the hadoop-only tests more information. It would make them only work in this test runner though. 

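{code}
// Hand the job's configuration to any test suite that opts in
// by implementing JUnitHadoopContext.
if (testSuite instanceof JUnitHadoopContext) {
    JUnitHadoopContext ctx = (JUnitHadoopContext) testSuite;
    ctx.setConfiguration(context.getConfiguration());
}
{code}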

The listresources task, to create a text file listing all tests in a source tree, is elsewhere:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/ant/src/org/smartfrog/tools/ant/

This stuff is just a start:
# need to capture the IO streams
# need the post-processing
# the test runner probably needs to be more robust
# I'd like to add a SkippedException so that when a test case chooses not to run, it throws that and the outcome is logged differently.
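
The SkippedException idea from the last item could look roughly like this. Nothing here exists in the prototype yet: SkippedException is the proposed name, while OutcomeClassifier, its Outcome enum and ClusterOnlyTest are invented for the example, and the runner is a reflection stand-in rather than a junit test runner.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Thrown by a test case that chooses not to run (proposed, not yet implemented). */
class SkippedException extends RuntimeException {
    SkippedException(String reason) {
        super(reason);
    }
}

final class OutcomeClassifier {
    enum Outcome { PASSED, FAILED, SKIPPED }

    /** Invokes a public no-arg test method and classifies what happened. */
    static Outcome invoke(Object testInstance, String methodName) {
        try {
            Method m = testInstance.getClass().getMethod(methodName);
            m.invoke(testInstance);
            return Outcome.PASSED;
        } catch (InvocationTargetException e) {
            // The test threw: a SkippedException is logged differently from a failure.
            return e.getCause() instanceof SkippedException ? Outcome.SKIPPED : Outcome.FAILED;
        } catch (ReflectiveOperationException e) {
            // No such method, or it could not be called.
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        ClusterOnlyTest test = new ClusterOnlyTest();
        for (String name : new String[] {"testAlwaysRuns", "testNeedsCluster", "testBroken"}) {
            System.out.println(name + ": " + invoke(test, name));
        }
    }
}

/** Stand-in test class so the sketch can be run on its own. */
class ClusterOnlyTest {
    public void testAlwaysRuns() { }
    public void testNeedsCluster() { throw new SkippedException("no cluster available"); }
    public void testBroken() { throw new AssertionError("expected 1 but was 2"); }
}
```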




[jira] Commented: (HADOOP-5621) MapReducer to run junit tests under Hadoop

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695510#action_12695510 ] 

Nigel Daley commented on HADOOP-5621:
-------------------------------------

Steve, how's this different than HADOOP-1257?
