Posted to hdfs-user@hadoop.apache.org by Jakub Stransky <st...@gmail.com> on 2014/09/11 18:35:41 UTC

task slowness

Hello experienced Hadoop users,

I have a data pipeline consisting of two Java MR jobs coordinated by the
Oozie scheduler. Both of them process the same data, but the first one is
more than 10 times slower than the second one. The job counters on the RM
page are not much help in this matter. I have verified from our monitoring
system that there were no hardware constraints such as IO, CPU, or network.
Specifically, the task was using just a fraction of the resources allocated
to its container.

Is there a way to get profiling statistics out of a Hadoop cluster task?
What are the best available tools, required settings, etc.?

I have read the job tuning material in Hadoop: The Definitive Guide, but I
am not sure that those settings are still valid for Hadoop 2.2.0.

Could someone point me to a good resource for this information, e.g. a
blog, manual, or book? I am a bit confused about which settings apply to
Hadoop 1 and which apply to Hadoop 2 (MR2).

The dataset is around 500 MB compressed, and it is a map-only task.

Thanks for any experience shared
Jakub
