Posted to user@pig.apache.org by Mark Church <ma...@gmail.com> on 2010/05/01 05:54:59 UTC

Differing behavior between local and mapreduce modes

Hello all,

I've reached an impasse in my attempts to learn Pig Latin.  When
running my script in local mode I get the results I expect.  However,
when I run the same script in mapreduce mode the resulting output is
different.  In mapreduce mode it appears that Pig keeps only the first
field of the group tuple during the join.

To illustrate, here is a condensed version of my attachment
demonstrating the issue:

mchurch@beaker:~$ cd /data/Source/pig-issue/
mchurch@beaker:/data/Source/pig-issue$ export JAVA_HOME=/data/Applications/jdk1.6.0_07/
mchurch@beaker:/data/Source/pig-issue$ cat test
0    200    /index.html
0    200    /index.html
0    200    /error.jsp
0    500    /error.jsp
0    500    /error.jsp
1    500    /index.html
1    200    /index.html
1    200    /error.jsp
1    200    /error.jsp

mchurch@beaker:/data/Source/pig-issue$ cat issue.pig
A = load 'test' AS (time:int, responseCode:int, url:chararray);
DUMP A;

B = FILTER A BY responseCode >= 500 and responseCode < 600;
DUMP B;

C = FOREACH ( GROUP A BY (time, url) ) GENERATE group, (int)COUNT($1)
as count:int;
DUMP C;

D = FOREACH ( GROUP B BY (time, url) ) GENERATE group, (int)COUNT($1)
as count:int;
DUMP D;

E = JOIN C BY group FULL, D BY group;

DUMP E;

mchurch@beaker:/data/Source/pig-issue$ /data/Applications/pig-0.6.0/bin/pig -x local issue.pig

(0,200,/index.html)
(0,200,/index.html)
(0,200,/error.jsp)
(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)
(1,200,/index.html)
(1,200,/error.jsp)
(1,200,/error.jsp)
(,,)

(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)

((,),0)
((0,/error.jsp),3)
((0,/index.html),2)
((1,/error.jsp),2)
((1,/index.html),2)

((0,/error.jsp),2)
((1,/index.html),1)

((,),0,,)
((0,/error.jsp),3,(0,/error.jsp),2)
((0,/index.html),2,,)
((1,/error.jsp),2,,)
((1,/index.html),2,(1,/index.html),1)

mchurch@beaker:/data/Source/pig-issue$ /data/Applications/pig-0.6.0/bin/pig -x mapreduce issue.pig

(0,200,/index.html)
(0,200,/index.html)
(0,200,/error.jsp)
(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)
(1,200,/index.html)
(1,200,/error.jsp)
(1,200,/error.jsp)
(,,)

(0,500,/error.jsp)
(0,500,/error.jsp)
(1,500,/index.html)

((,),0)
((0,/error.jsp),3)
((0,/index.html),2)
((1,/error.jsp),2)
((1,/index.html),2)
((0,/error.jsp),2)
((1,/index.html),1)

(,0,,)
(0,3,0,2)
(0,2,,)
(1,2,,)
(1,2,1,1)
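
Since the difference shows up only where the join key is the tuple-valued
group field, one variant that might be worth comparing is to flatten the
group key into plain fields before joining.  The following is only a
sketch: it has not been run against Pig 0.6.0 in either mode, and whether
it avoids the behavior above in mapreduce mode is an assumption, not a
confirmed result.  The relation and field names are the ones from
issue.pig.

```pig
-- Sketch only, not verified on Pig 0.6.0: join on scalar fields instead of
-- on the tuple-valued 'group' field.
A = LOAD 'test' AS (time:int, responseCode:int, url:chararray);
B = FILTER A BY responseCode >= 500 AND responseCode < 600;

-- FLATTEN(group) turns the (time, url) key tuple into two top-level fields.
C = FOREACH (GROUP A BY (time, url))
    GENERATE FLATTEN(group) AS (time, url), COUNT(A) AS count;
D = FOREACH (GROUP B BY (time, url))
    GENERATE FLATTEN(group) AS (time, url), COUNT(B) AS count;

-- Full outer join on both key fields rather than on a single tuple.
E = JOIN C BY (time, url) FULL, D BY (time, url);
DUMP E;
```

If this variant produces the same rows in both modes, that would suggest
the discrepancy is tied to joining on a tuple-valued key.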

Any help would be appreciated.

Mark