Posted to common-user@hadoop.apache.org by Bejoy Ks <be...@gmail.com> on 2011/11/15 09:50:09 UTC

Re: Performance test practices for hadoop jobs - capturing metrics

Adding the Hadoop common-user group to the loop as well.

On Tue, Nov 15, 2011 at 1:01 PM, Bejoy Ks <be...@gmail.com> wrote:

> Hi Experts
>
> I'm currently putting together a performance test plan for a series of
> Hadoop jobs. My application consists of MapReduce, Hive and Flume jobs
> chained one after another, and I need to do some rigorous performance
> testing to ensure that it does not break under most circumstances. I’m
> planning to test each of the components individually, as well as running
> end-to-end tests and some overlapping tests across the components. The
> tests would include:
> ·         Regression test
> ·         Stress/Load test
> ·         Simultaneous run tests
>
> For all of these tests I'm planning to capture metrics from two sources:
> ·         MapReduce job metrics from the JobTracker web UI
> ·         I/O, memory and CPU usage metrics from Ganglia
>
> The tests are triggered by simple shell scripts, which is absolutely
> fine, and capturing metrics from the JobTracker and Ganglia for individual
> jobs is also fine. The challenge comes when capturing metrics for the
> regression test. In regression tests we’d run a particular job (say, a
> Hive job) continuously for 24 hours, looped in a shell script; with my
> test data set of a few gigabytes, that works out to nearly 130 runs.
> Capturing the metrics manually for all 130 runs does not look like a
> great solution. I have a few questions around this:
>
> Is there an automated tool that would help us capture these metrics?
> Are there any best practices to follow for performance testing?
> Does anyone have a metrics sheet that defines which details should be
> captured during performance tests?
>
> It’d be great if you could share your experiences with performance
> testing and the practices you follow for your Hadoop projects, along with
> the dos and don’ts.
>
> Awaiting all your valuable responses.
>
> Thanks a lot
>
> Bejoy.K.S
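
One possible starting point for the 24-hour loop described above is to have the driver itself record a row per run, so that the roughly 130 runs do not have to be noted by hand. The following is only a minimal sketch under stated assumptions: the function name run_and_log, the CSV column names, and the output file name are all hypothetical, and it captures only wall-clock duration and exit code. The JobTracker counters and Ganglia graphs would still need to be correlated separately, for example by using the recorded start times.

```python
import csv
import subprocess
import time
from datetime import datetime, timezone


def run_and_log(command, csv_path, max_runs=None, max_seconds=24 * 3600):
    """Run `command` repeatedly and write one CSV row per run.

    Hypothetical helper: stops after `max_runs` runs or once `max_seconds`
    of wall-clock time has elapsed, whichever comes first.
    Returns the number of runs performed.
    """
    deadline = time.monotonic() + max_seconds
    runs = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["run", "start_utc", "duration_s", "exit_code"])
        while time.monotonic() < deadline:
            if max_runs is not None and runs >= max_runs:
                break
            # The UTC start time lets each run be matched against other
            # monitoring data collected over the same period.
            start_utc = datetime.now(timezone.utc).isoformat()
            started = time.monotonic()
            exit_code = subprocess.call(command)
            duration = time.monotonic() - started
            runs += 1
            writer.writerow([runs, start_utc, f"{duration:.3f}", exit_code])
            handle.flush()  # keep rows on disk if the loop is interrupted
    return runs
```

As a usage illustration, run_and_log(["hive", "-f", "my_query.hql"], "run_metrics.csv") would loop a Hive script for up to 24 hours; the script and CSV file names here are placeholders, not anything from the original setup.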