Posted to common-dev@hadoop.apache.org by Ranjan Banerjee <rb...@wisc.edu> on 2012/04/02 16:19:57 UTC

Help needed regarding collection of statistics in map reduce task phases

Hello everyone,
    I am interested in collecting statistics (mainly the time spent) from MapReduce task phases such as split, read, spill, aggregate, etc., in both the map and reduce tasks. I was told to use Hive or Pig, as they are good tools for statistical analysis. I have installed Hive and can run queries, which are translated into MapReduce jobs by the underlying framework. However, I am not sure how to get these statistics from the MapReduce task phases using Hive. Can someone please give me any hints, such as a parameter to set in order to see the memory usage or the time spent in each of these phases? Any help would be appreciated.

Thanking you

Yours faithfully
Ranjan Banerjee