Posted to user@accumulo.apache.org by Ariel Valentin <ar...@arielvalentin.com> on 2014/10/29 22:16:15 UTC

Testing Map Reduce Jobs

I am looking for some guidance that will help me write better tests for our
map reduce jobs. My current jobs are tested using MRUnit, which covers most
of the "logic" but I feel like I am missing good "end-to-end" developer
tests.

I took a look at the tests for the mapred classes, but I am not sure they
achieve my goal of an end-to-end test because they use MockInstance.
https://github.com/apache/accumulo/blob/master/mapreduce/src/test/java/org/apache/accumulo/core/client/mapred{,reduce}/

For me, the characteristic of an end-to-end test that I would find valuable
is a suite that one could execute using mini-{accumulo,yarn,et al.}, but I
don't see any examples of how one would go about making those components
work in concert with each other.

Does anyone have any guidance when it comes to writing automated developer
end-to-end tests?

What kinds of testing strategies are people out there using for MR jobs?

Thanks,
Ariel Valentin
e-mail: ariel@arielvalentin.com
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect

Re: Testing Map Reduce Jobs

Posted by Ariel Valentin <ar...@arielvalentin.com>.
Thanks Mike. I'll give that a try.

Thanks,
Ariel

> On Oct 29, 2014, at 6:27 PM, Mike Drob <ma...@cloudera.com> wrote:

Re: Testing Map Reduce Jobs

Posted by Mike Drob <ma...@cloudera.com>.
If you launch a MapReduce job from your test code without a cluster
present, it falls back to the local job runner. This is easy if you invoke
your job through something like ToolRunner.run(), or you can just build a
job and invoke it directly. So about 90% of the time you don't really need
a mini-mr or yarn cluster for this.
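
A minimal sketch of the local-runner approach described above, assuming Hadoop 2.x is on the test classpath. IdentityJob and runLocal are hypothetical names used only for illustration, and the two configuration properties are set explicitly as an assumption, so that a stray cluster configuration on the classpath is not picked up:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/** Hypothetical pass-through job used only to illustrate the local runner. */
public class IdentityJob extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "identity");
    job.setJarByClass(IdentityJob.class);
    // No mapper or reducer is set, so Hadoop's identity defaults are used.
    // The default TextInputFormat emits (LongWritable offset, Text line).
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(false) ? 0 : 1;
  }

  /** Runs the job in-process through ToolRunner; returns 0 on success. */
  public static int runLocal(String inputDir, String outputDir) throws Exception {
    Configuration conf = new Configuration();
    // Assumption: force the local runner and the local file system.
    conf.set("mapreduce.framework.name", "local");
    conf.set("fs.defaultFS", "file:///");
    return ToolRunner.run(conf, new IdentityJob(), new String[] {inputDir, outputDir});
  }
}
```

A test can call IdentityJob.runLocal(in, out) with two temporary directories (the output directory must not exist yet) and assert on the return code, then inspect the files written under the output directory.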

As far as integrating that with a mini-accumulo... if you start a
MiniAccumuloCluster manually and keep a reference to it (which you should
anyway because you will need to stop it eventually) then you can use that
to populate your AccumuloInputFormat configuration (assuming you are using
it).

Something like...

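  // Assumed fields, not shown in the original snippet: "root" is a JUnit
  // TemporaryFolder @Rule, "mac" is the MiniAccumuloCluster, and "job" is
  // the org.apache.hadoop.mapreduce.Job under test.

  @Before
  public void setUp() throws Exception {
    // Start the Accumulo Cluster
    mac = new MiniAccumuloCluster(root.newFolder(), ACCUMULO_PASS);
    mac.start();

    // Get first connection to create user
    mac.getConnector(ACCUMULO_USER, ACCUMULO_PASS);
  }

  @Test
  public void testJob() throws Exception {
    // Point the input format at the mini cluster's ZooKeeper and instance
    AccumuloInputFormat.setZooKeeperInstance(job,
        ClientConfiguration.loadDefault()
            .withZkHosts(mac.getZooKeepers())
            .withInstance(mac.getInstanceName()));
    // .. and other settings

    boolean success = job.waitForCompletion(false);
    assertTrue("Job failed!", success);
  }

  @After
  public void tearDown() throws Exception {
    // Stop the cluster so its processes do not outlive the test
    mac.stop();
  }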
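
An end-to-end test usually also needs to seed the input table and read results back through the same cluster reference. The following is a rough sketch of that round trip, assuming Accumulo 1.6 with the accumulo-minicluster artifact on the test classpath. The class name, table name, and password are hypothetical, and the scan here only counts entries rather than checking real job output:

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.minicluster.MiniAccumuloCluster;

public class MiniClusterRoundTrip {
  // Hypothetical table name and root password, chosen only for this sketch.
  static final String TABLE = "e2e_input";
  static final String ROOT_PASS = "secret";

  /** Starts a mini cluster, writes two rows, scans them back, returns the entry count. */
  public static int writeAndCount() throws Exception {
    File dir = Files.createTempDirectory("mac").toFile();
    MiniAccumuloCluster mac = new MiniAccumuloCluster(dir, ROOT_PASS);
    mac.start();
    try {
      Connector conn = mac.getConnector("root", ROOT_PASS);
      conn.tableOperations().create(TABLE);

      // Seed the table that a job's AccumuloInputFormat would read from.
      BatchWriter writer = conn.createBatchWriter(TABLE, new BatchWriterConfig());
      for (String row : new String[] {"row1", "row2"}) {
        Mutation m = new Mutation(row);
        m.put("cf", "cq", new Value("v".getBytes(StandardCharsets.UTF_8)));
        writer.addMutation(m);
      }
      writer.close();

      // Read the entries back; a real test would scan the job's output table.
      Scanner scanner = conn.createScanner(TABLE, new Authorizations());
      int count = 0;
      for (Map.Entry<Key, Value> entry : scanner) {
        count++;
      }
      return count;
    } finally {
      // Always stop the cluster, even when an assertion or write fails.
      mac.stop();
    }
  }
}
```

In a JUnit test the start and stop calls would live in the @Before and @After methods, with the seeding and scanning done inside the @Test method around job.waitForCompletion().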

Not sure if this is helpful, but hopefully it is enough to point you in the
right direction. If you have more questions, please ask.


