Posted to common-user@hadoop.apache.org by jingguo yao <ya...@gmail.com> on 2013/04/05 02:46:22 UTC

What is the output format of org.apache.hadoop.examples.Join?

I am reading the following mail:

http://www.mail-archive.com/core-user@hadoop.apache.org/msg04066.html

I ran the following command (I am using Hadoop 1.0.4):

bin/hadoop jar hadoop-examples-1.0.4.jar join \
   -inFormat org.apache.hadoop.mapred.KeyValueTextInputFormat \
   -outKey org.apache.hadoop.io.Text \
   -joinOp outer \
   join/a.txt join/b.txt join/c.txt joinout


Then I ran "bin/hadoop fs -text joinout/part-00000" and saw the following
result:

AAAAAAAA        a0      [,,]
AAAAAAAA        b0      [,,]
AAAAAAAA        c0      [,,]
BBBBBBBB        a1      [,,]
BBBBBBBB        b1      [,,]
BBBBBBBB        b2      [,,]
BBBBBBBB        b3      [,,]
BBBBBBBB        c1      [,,]
CCCCCCCC        a2      [,,]
CCCCCCCC        a3      [,,]
DDDDDDDD        c2      [,,]
DDDDDDDD        c3      [,,]

But Chris said that the result should be:

AAAAAAAA        [a0,b0,c0]
BBBBBBBB        [a1,b1,c1]
BBBBBBBB        [a1,b2,c1]
BBBBBBBB        [a1,b3,c1]
CCCCCCCC        [a2,,]
CCCCCCCC        [a3,,]
DDDDDDDD        [,,c2]
DDDDDDDD        [,,c3]
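
To make the expected outer-join semantics concrete, here is a minimal sketch in plain Python (not Hadoop code, and not how org.apache.hadoop.examples.Join is implemented). The contents of a.txt, b.txt and c.txt are an assumption inferred from the two listings above, and the outer_join helper is hypothetical. For each key it emits one tuple per combination of values across the three sources, leaving a slot empty when a source has no value for that key.

```python
from itertools import product

# Assumed inputs, inferred from the listings above; the real files are not shown here.
a = [("AAAAAAAA", "a0"), ("BBBBBBBB", "a1"), ("CCCCCCCC", "a2"), ("CCCCCCCC", "a3")]
b = [("AAAAAAAA", "b0"), ("BBBBBBBB", "b1"), ("BBBBBBBB", "b2"), ("BBBBBBBB", "b3")]
c = [("AAAAAAAA", "c0"), ("BBBBBBBB", "c1"), ("DDDDDDDD", "c2"), ("DDDDDDDD", "c3")]


def outer_join(*sources):
    """Hypothetical helper: full outer join of (key, value) lists on the key."""
    # Group each source's values by key, preserving input order.
    grouped = []
    for src in sources:
        by_key = {}
        for key, value in src:
            by_key.setdefault(key, []).append(value)
        grouped.append(by_key)

    rows = []
    # Visit every key that appears in at least one source, in sorted order.
    for key in sorted(set().union(*grouped)):
        # A source with no value for this key contributes one empty slot.
        columns = [by_key.get(key, [""]) for by_key in grouped]
        # Cross product of the per-source value lists for this key.
        for combo in product(*columns):
            rows.append((key, "[" + ",".join(combo) + "]"))
    return rows


lines = [f"{key}\t{tup}" for key, tup in outer_join(a, b, c)]
print("\n".join(lines))
```

With these assumed inputs the sketch prints the eight rows Chris listed, one key and one bracketed tuple per line.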

Has Join's output format changed in Hadoop 1.0.4?

-- 
Jingguo