Posted to user@hadoop.apache.org by Arjun Bakshi <ba...@mail.uc.edu> on 2014/08/11 17:39:02 UTC

Examining effect of changing block placement policy.

Hi,

I've made some changes to the default block placement policy and want to 
see how they affect a cluster. Any suggestions on how I can compare the 
cluster's behavior before and after these changes?

I read up a bit on Rumen and GridMix in my search for tools that would 
help me benchmark things on a cluster. As far as I know, I need some job 
traces to get the ball rolling. I've googled for sample job traces but 
didn't find anything. I found this page: 
http://ftp.pdl.cmu.edu/pub/datasets/hla/dataset.html but I'm not sure 
how to use the data there.

I don't have a ton of data, or a bunch of queries I could run on it. My 
best idea so far is to run a bunch of sorts on different input sizes, 
and word counts on different combinations of files, with job submissions 
following exponentially distributed inter-arrival times. I'm planning to 
do this on the AWS EC2 free tier.
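
A minimal sketch of such a submission schedule, in Python. It only computes 
when each job would be submitted; it does not launch anything. The job mix, 
the input labels, and the 120-second mean gap are all placeholder 
assumptions, not values from a real trace.

```python
import random

def build_schedule(jobs, mean_gap_s, seed=0):
    """Return (start_offset_seconds, job) pairs with exponential inter-arrival gaps."""
    rng = random.Random(seed)  # fixed seed so before/after runs see the same schedule
    t = 0.0
    schedule = []
    for job in jobs:
        # expovariate takes the rate (1 / mean), so gaps average mean_gap_s seconds
        t += rng.expovariate(1.0 / mean_gap_s)
        schedule.append((t, job))
    return schedule

# Hypothetical job mix: (job type, input label). Both are placeholders.
JOBS = [
    ("sort", "1GB"),
    ("wordcount", "files-a+b"),
    ("sort", "4GB"),
    ("wordcount", "files-c"),
]

if __name__ == "__main__":
    for start, (kind, inp) in build_schedule(JOBS, mean_gap_s=120.0, seed=42):
        print(f"{start:8.1f}s  {kind:<10} {inp}")
```

Reusing the same seed for the run with the default policy and the run with 
the modified policy keeps the arrival pattern identical, so differences in 
job times are less likely to come from the schedule itself.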

Any suggestions on how to observe the effects of changing the policy 
would be appreciated.

Thank you,

Arjun