Posted to common-user@hadoop.apache.org by Arpit Wanchoo <Ar...@guavus.com> on 2012/05/28 06:38:30 UTC

MapReduce combiner issue : EOFException while reading Value

Hi

I have been trying to set up a MapReduce job with Hadoop 0.20.203.1.

Scenario :
My mapper writes key-value pairs; I have 13 key types in total, each with a corresponding value class.
For each input record I write all 13 key-value pairs to the context.

Also, for one specific key (say K1) I want the mapper output to go to one file (partition 0), with all other keys spread over the remaining files.
To do this I have defined my partitioner as:
public int getPartition(DimensionSet key, MeasureSet value, int numPartitions) {
    if (numPartitions < 2) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
    int cubeId = key.getCubeId();
    if (cubeId == CubeName.AT_COutgoing.ordinal()) {
        return 0;
    } else {
        return ((key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1)) + 1;
    }
}
My combiner and reducer are doing the same thing.

Issue :
The job runs fine when I don't use a combiner.
But when I run with a combiner, I get an EOFException:

java.io.EOFException
        at java.io.DataInputStream.readUnsignedShort(Unknown Source)
        at java.io.DataInputStream.readUTF(Unknown Source)
        at java.io.DataInputStream.readUTF(Unknown Source)
        at com.guavus.mapred.common.collection.ValueCollection.readFieldsLong(ValueCollection.java:40)
        at com.guavus.mapred.common.collection.ValueCollection.readFields(ValueCollection.java:21)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1420)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:852)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1343)


My Finding :
While debugging I found that for the particular key (K1, the one I route to reducer 0), the combiner reads the key successfully but throws an EOFException while reading the values, because it finds nothing left in the DataInput stream. This happens only when the data is large and the combiner runs more than once: the combiner fails to get the value for this key on its second run. (I have read that the combiner starts once the mapper has written a certain amount of data, even while the mapper is still writing to the context.)
In fact the issue occurs for any key that the partitioner assigns to partition 0.

I have verified many times that my mapper writes no null values. The issue looks really strange because the combiner is able to read the key but finds no value in the data stream.
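One pattern I am now checking for, which could produce exactly this symptom, is a readFields() that does not fully reset the object's state: Hadoop reuses Writable instances between records, so a collection that readFields() appends to instead of clearing, or a field that write() emits only conditionally, will leave the stream out of sync on the second read. This is only a guess, since I haven't shown the ValueCollection source here; the sketch below uses a hypothetical stand-in class with plain java.io and no Hadoop dependency:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Writable-style value class (not the real ValueCollection).
public class ResettableValue {
    private final List<String> fields = new ArrayList<>();

    public void add(String s) { fields.add(s); }
    public List<String> getFields() { return fields; }

    // Symmetric with readFields: emit a count, then exactly that many entries.
    public void write(DataOutputStream out) throws IOException {
        out.writeInt(fields.size());
        for (String f : fields) {
            out.writeUTF(f);
        }
    }

    // Crucial: clear old state first. Hadoop reuses Writable instances,
    // so appending here would desynchronize the second and later reads.
    public void readFields(DataInputStream in) throws IOException {
        fields.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            fields.add(in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        ResettableValue v = new ResettableValue();
        v.add("a");
        v.add("b");

        // Serialize the same value twice back to back, as two consecutive
        // records in a spill would appear.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        v.write(out);
        v.write(out);

        // Reuse one instance for both reads, as Hadoop does.
        ResettableValue reused = new ResettableValue();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        reused.readFields(in);
        reused.readFields(in);
        System.out.println(reused.getFields()); // [a, b] -- not [a, b, a, b]
    }
}
```

Without the fields.clear() call, the second readFields() would leave stale entries behind and subsequent reads would start mid-record, which eventually surfaces as an EOFException.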

Please suggest what the root cause could be, or how I can track it down.



Regards,
Arpit Wanchoo


RE: MapReduce combiner issue : EOFException while reading Value

Posted by Devaraj k <de...@huawei.com>.
Can you check whether your ValueCollection.write(DataOutput) method writes exactly what your readFields() method expects to read back?
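That check can be done outside the cluster with a simple round-trip test: serialize with write(), read back with readFields() into a fresh object, and verify that every byte written was consumed. A minimal sketch with plain java.io (SimpleValue here is a stand-in, not your actual ValueCollection):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripCheck {
    // Hypothetical stand-in for ValueCollection: one long field, one string field.
    static class SimpleValue {
        long count;
        String label;

        void write(DataOutput out) throws IOException {
            out.writeLong(count);
            out.writeUTF(label);
        }

        void readFields(DataInput in) throws IOException {
            count = in.readLong();
            label = in.readUTF();
        }
    }

    // Serialize with write(), deserialize into a fresh object, and return it.
    static SimpleValue roundTrip(SimpleValue v) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        v.write(new DataOutputStream(buf));
        SimpleValue copy = new SimpleValue();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        copy.readFields(in);
        // If write() and readFields() disagree, readFields typically throws
        // EOFException here, just as the combiner does during the spill.
        if (in.available() != 0) {
            throw new IOException("readFields consumed fewer bytes than write produced");
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        SimpleValue v = new SimpleValue();
        v.count = 42L;
        v.label = "K1";
        SimpleValue copy = roundTrip(v);
        System.out.println(copy.count + " " + copy.label); // 42 K1
    }
}
```

Running the real ValueCollection through a loop like this, with values populated the same way the mapper populates them, should reproduce the EOFException deterministically if the two methods are asymmetric.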


Thanks
Devaraj

________________________________________
From: Arpit Wanchoo [Arpit.Wanchoo@guavus.com]
Sent: Thursday, May 31, 2012 2:57 PM
To: <co...@hadoop.apache.org>
Subject: Re: MapReduce combiner issue : EOFException while reading Value

Hi Guys

Can anyone please provide any suggestions on this?
I am still facing this issue when running with the combiner.

Please give your valuable inputs

Regards,
Arpit Wanchoo | Sr. Software Engineer
Guavus Network Systems.
6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, Gurgaon,Haryana.
Mobile Number +91-9899949788

