Posted to hdfs-user@hadoop.apache.org by "Rusia, Devansh" <dr...@paypal.com> on 2013/03/20 11:11:08 UTC

TupleWritable value in mapper not getting cleaned up (using CompositeInputFormat)

Hi,

I am trying to do an outer join on two input files.

But during the join, the TupleWritable value passed to the mapper is not getting cleaned up between keys, so when a key is missing from one input, the tuple still carries that input's value from a previous, different key.

The code I used is below ('plist' contains the set of paths to be taken as input):

jobConf.setInputFormat(CompositeInputFormat.class);
jobConf.set("mapred.join.expr", CompositeInputFormat.compose(op, inputFormatClass,plist.toArray(new Path[0])));
jobConf.setOutputFormat(outputFormatClass);

inp1:

anil1     10
anil2     20
anil3     30
dev1     40
dev2     50

inp2:

anil1     100
dev1     400
dev2     500
dev3     600


Actual outer join output:

anil1     10,100
anil2     20,100
anil3     30,100
dev1     40,400
dev2     50,500
dev3     50,600

Shouldn't it actually be the following?

anil1     10,100
anil2     20
anil3     30
dev1     40,400
dev2     50,500
dev3     600
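
To make the expectation concrete, here is a minimal sketch of ordinary full outer join semantics in plain Java. It does not use Hadoop or CompositeInputFormat at all. The class name OuterJoinSketch and the methods outerJoin, inp1 and inp2 are hypothetical names made up for this illustration, and the comma-separated output format is an assumption chosen to match the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OuterJoinSketch {

    // Full outer join of two key -> value maps.
    // A key present on only one side keeps just that side's value;
    // nothing is carried over from any other key.
    static Map<String, String> outerJoin(Map<String, String> left,
                                         Map<String, String> right) {
        Map<String, String> joined = new TreeMap<>(); // keep keys sorted
        for (Map.Entry<String, String> e : left.entrySet()) {
            joined.put(e.getKey(), e.getValue());
        }
        for (Map.Entry<String, String> e : right.entrySet()) {
            // If the key already came from the left side, append the right value;
            // otherwise store the right value alone.
            joined.merge(e.getKey(), e.getValue(), (l, r) -> l + "," + r);
        }
        return joined;
    }

    // Sample data copied from inp1 above.
    static Map<String, String> inp1() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("anil1", "10");
        m.put("anil2", "20");
        m.put("anil3", "30");
        m.put("dev1", "40");
        m.put("dev2", "50");
        return m;
    }

    // Sample data copied from inp2 above.
    static Map<String, String> inp2() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("anil1", "100");
        m.put("dev1", "400");
        m.put("dev2", "500");
        m.put("dev3", "600");
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : outerJoin(inp1(), inp2()).entrySet()) {
            System.out.println(e.getKey() + "     " + e.getValue());
        }
    }
}
```

With the two sample inputs, this sketch gives anil2 and anil3 only their inp1 values and dev3 only its inp2 value, which is the result I would expect from the join.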

Regards,
Devansh Rusia