Posted to common-dev@hadoop.apache.org by Sandeep Baldawa <sa...@gmail.com> on 2011/12/04 23:01:21 UTC

Hadoop for running tests in parallel

Hi,

I have read a few basic things about Hadoop and would like to know the
opinion of experts on a few points. These questions might be a bit vague;
feel free to ask me questions if anything below is unclear.

- Can the Hadoop framework be used to run a large number of tests (here we
are talking about at least a million a day) in parallel on machines in
different clusters?
- Is there a known use case of a large organization using Hadoop for
testing purposes? If so, could you please point me to the resources?
- Hadoop looks promising to me from my initial analysis, but I am not sure
whether it would work on a large heterogeneous platform (different types of
clusters, machines, configurations, etc.) where we are testing in a very
complex environment.

Again, I started looking into Hadoop just a week ago, so sincere apologies
if my questions seem too basic for this forum.

Best,
Sandeep

Re: Hadoop for running tests in parallel

Posted by Sandeep Baldawa <sa...@gmail.com>.
Thanks a lot Steve for the information. This is very useful.

From my initial look at Hadoop, it seems very promising for the testing
domain.

On Tue, Dec 6, 2011 at 9:36 AM, Steve Loughran <st...@apache.org> wrote:

> I have in the past used Hadoop to run JUnit tests: each input record to
> the map should be the class name of a test (or, even better, a test
> method); the aggregation would collect results by test ID and then show
> which tests worked, which didn't, and which were intermittent.
>
> The very nature of a well-managed Hadoop cluster means that every system
> should be identical; it's not diverse enough to throw up problems. And
> JUnit tests are pretty short-lived.
>
> What would be more useful would be for each record to not only identify
> the test, but also the list of parameters to supply to it; these
> parameters could be machine-generated to give better coverage of the
> n-dimensional configuration space without having to explore all points in
> this space, which is obviously quite large.
>
> At the very least, I should update my exception and test result records,
> make them serializable in the new APIs and stick them in the source tree
> somewhere, as JUnit is one way to qualify every node as behaving roughly
> the same.
>
> -steve
>
> Also: search for papers on "gridunit"
>

Re: Hadoop for running tests in parallel

Posted by Steve Loughran <st...@apache.org>.
I have in the past used Hadoop to run JUnit tests: each input record to
the map should be the class name of a test (or, even better, a test
method); the aggregation would collect results by test ID and then show
which tests worked, which didn't, and which were intermittent.
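
A minimal sketch of the map and aggregation shape described above, written
in plain Python rather than against the Hadoop API. The names map_phase,
reduce_phase and run_test are hypothetical, and run_test stands in for
whatever actually launches a JUnit test and reports whether it passed. It
assumes each test ID appears in the input more than once, so that repeated
runs can disagree.

```python
from collections import defaultdict


def map_phase(records, run_test):
    """Emit one (test_id, passed) pair per input record.

    run_test is a hypothetical callable: it takes a test ID such as
    "FooTest.testA" and returns a truthy value if the test passed.
    """
    for test_id in records:
        yield test_id, bool(run_test(test_id))


def reduce_phase(pairs):
    """Group outcomes by test ID and classify each test."""
    outcomes = defaultdict(list)
    for test_id, passed in pairs:
        outcomes[test_id].append(passed)

    report = {}
    for test_id, results in outcomes.items():
        if all(results):
            report[test_id] = "passed"        # every run succeeded
        elif not any(results):
            report[test_id] = "failed"        # every run failed
        else:
            report[test_id] = "intermittent"  # runs disagreed
    return report
```

In a real job the grouping by test ID would presumably be done by the
framework's shuffle between map and reduce; here it is done in memory
only to keep the sketch self-contained.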

The very nature of a well-managed Hadoop cluster means that every system
should be identical; it's not diverse enough to throw up problems. And
JUnit tests are pretty short-lived.

What would be more useful would be for each record to not only identify
the test, but also the list of parameters to supply to it; these
parameters could be machine-generated to give better coverage of the
n-dimensional configuration space without having to explore all points in
this space, which is obviously quite large.
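
One way such records could be generated, as a sketch only: the message
above does not say how the parameters would be produced, so seeded uniform
random sampling is an assumption here (pairwise or other combinatorial
schemes would be alternatives). The function names and the shape of the
dimensions mapping are hypothetical; values are assumed to be hashable.

```python
import random


def sample_configurations(dimensions, count, seed=0):
    """Pick up to `count` distinct points from the configuration space.

    dimensions maps a parameter name to the list of values it may take.
    Sampling is random and seeded, so the full Cartesian product is
    never enumerated and a run can be reproduced from its seed.
    """
    rng = random.Random(seed)
    names = sorted(dimensions)
    # Drop duplicate values so the size of the space is counted correctly.
    values = [list(dict.fromkeys(dimensions[name])) for name in names]

    total = 1
    for choices in values:
        total *= len(choices)
    target = min(count, total)

    seen = set()
    configs = []
    while len(configs) < target:
        combo = tuple(rng.choice(choices) for choices in values)
        if combo not in seen:  # rejection sampling keeps points distinct
            seen.add(combo)
            configs.append(dict(zip(names, combo)))
    return configs


def make_records(test_ids, dimensions, per_test, seed=0):
    """Build (test_id, parameters) records, one per sampled point."""
    records = []
    for offset, test_id in enumerate(test_ids):
        for config in sample_configurations(dimensions, per_test, seed + offset):
            records.append((test_id, config))
    return records
```

Rejection sampling is simple but slows down when `count` approaches the
size of the whole space; it suits the case described here, where only a
small fraction of the points is meant to be explored.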

At the very least, I should update my exception and test result records,
make them serializable in the new APIs and stick them in the source tree
somewhere, as JUnit is one way to qualify every node as behaving roughly
the same.

-steve

Also: search for papers on "gridunit"