Posted to dev@pig.apache.org by Yongzhi Wang <wa...@gmail.com> on 2012/03/07 17:18:30 UTC

How can I track the actual Map and Reduce tasks executed by Pig?

Hi there,

I tried using the "explain" statement, but the MapReduce plan sometimes
confuses me.

I tried the script below:

my_raw = LOAD './houred-small' USING PigStorage('\t') AS (user, hour, query);
part1 = filter my_raw by hour > 11;
part2 = filter my_raw by hour < 13;
result = cogroup part1 by hour, part2 by hour;
dump result;
explain result;
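
(As a side note, to make the plan easier to read I also tried writing it to
a file; I believe EXPLAIN accepts -out and -dot options for this, though I'm
not completely sure I'm using the exact flags correctly:)

-- write the plan as plain text to a file (the path is just an example)
explain -out /tmp/result-plan.txt result;
-- or emit the plan as a DOT graph that can be rendered with graphviz
explain -dot -out /tmp/result-plan.dot result;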

The job stats are shown below, indicating there were 2 map tasks and 1 reduce
task. But I don't know how the map tasks correspond to the MapReduce plan
shown below. It seems each map task just does one filter and local rearrange,
but in which phase is the union operation done? The shuffle phase? If that is
the case, the two map tasks actually do different filter work. Is that
possible? Or is my guess wrong?
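
To get a better feel for where each operation happens, I also ran the
following (assuming ILLUSTRATE can handle this COGROUP script; it shows
sample records flowing through each relation, but still not the physical
task boundaries):

-- show the schema of the cogrouped relation
describe result;
-- push a small sample of the data through the plan to see intermediate records
illustrate result;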

So, back to the question: is there any way I can see the actual map
and reduce tasks executed by Pig?

Job Stats (time in seconds):
JobId:          job_201203021230_0038
Maps:           2
Reduces:        1
MaxMapTime:     3
MinMapTime:     3
AvgMapTime:     3
MaxReduceTime:  12
MinReduceTime:  12
AvgReduceTime:  12
Alias:          my_raw,part1,part2,result
Feature:        COGROUP
Outputs:        hdfs://master:54310/tmp/temp626037557/tmp-1661404166,

The MapReduce plan is shown below:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-84
Map Plan
Union[tuple] - scope-85
|
|---result: Local Rearrange[tuple]{bytearray}(false) - scope-73
|   |   |
|   |   Project[bytearray][1] - scope-74
|   |
|   |---part1: Filter[bag] - scope-59
|       |   |
|       |   Greater Than[boolean] - scope-63
|       |   |
|       |   |---Cast[int] - scope-61
|       |   |   |
|       |   |   |---Project[bytearray][1] - scope-60
|       |   |
|       |   |---Constant(11) - scope-62
|       |
|       |---my_raw: New For Each(false,false,false)[bag] - scope-89
|           |   |
|           |   Project[bytearray][0] - scope-86
|           |   |
|           |   Project[bytearray][1] - scope-87
|           |   |
|           |   Project[bytearray][2] - scope-88
|           |
|           |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) - scope-90
|
|---result: Local Rearrange[tuple]{bytearray}(false) - scope-75
    |   |
    |   Project[bytearray][1] - scope-76
    |
    |---part2: Filter[bag] - scope-66
        |   |
        |   Less Than[boolean] - scope-70
        |   |
        |   |---Cast[int] - scope-68
        |   |   |
        |   |   |---Project[bytearray][1] - scope-67
        |   |
        |   |---Constant(13) - scope-69
        |
        |---my_raw: New For Each(false,false,false)[bag] - scope-94
            |   |
            |   Project[bytearray][0] - scope-91
            |   |
            |   Project[bytearray][1] - scope-92
            |   |
            |   Project[bytearray][2] - scope-93
            |
            |---my_raw: Load(hdfs://master:54310/user/root/houred-small:PigStorage('    ')) - scope-95
--------
Reduce Plan
result: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-77
|
|---result: Package[tuple]{bytearray} - scope-72
--------
Global sort: false
----------------

Thanks!