Posted to user@hive.apache.org by Krishnan K <kk...@gmail.com> on 2013/10/21 20:38:30 UTC

ArrayIndexOutOfBoundsException while writing MapReduce output as RCFile

Hi All,

I have a scenario where I need to read an RCFile, process it, and write the
output as an RCFile using a MapReduce program.
My Hadoop version is *CDH 4.2.1*.

*Mapper*
Map Input <Key,Value> = LongWritable, BytesRefArrayWritable
Map Output <Key,Value> = Text, BytesRefArrayWritable (Record)


*******************************CODE BEGINS*******************************
// Mapper

public static class AbcMapper
    extends Mapper<LongWritable, BytesRefArrayWritable, Text, BytesRefArrayWritable> {

  public void map(LongWritable key, BytesRefArrayWritable value, Context context)
      throws IOException, InterruptedException {
    .........
    // I am passing a text key and a BytesRefArrayWritable value (record) as map output.
    context.write(new Text(keys), value);
  }
}

// Reducer

public static class AbcReducer
    extends Reducer<Text, BytesRefArrayWritable, Text, BytesRefArrayWritable> {

  public void reduce(Text keyz, Iterable<BytesRefArrayWritable> values, Context context)
      throws IOException, InterruptedException {

    // Based on some logic, I pick one BytesRefArrayWritable record from the
    // list of BytesRefArrayWritable values obtained in the reduce input.
    BytesRefArrayWritable outRecord = new BytesRefArrayWritable(5);

    for (BytesRefArrayWritable val : values) {
      if (some condition)
        outRecord = val;
    }
    outRecord.size(); // Value here is getting logged as 5.
    context.write(new Text(keyz), outRecord);
  }
}

*******************************CODE ENDS*******************************
I've added the following in the main method:

job.setInputFormatClass(RCFileInputFormat.class);
job.setOutputFormatClass(RCFileOutputFormat.class);
job.setOutputValueClass(BytesRefArrayWritable.class);
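
For completeness, here is a rough sketch of how the rest of my driver wiring looks (the job name and the input/output paths below are just placeholders, not my actual code):

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = Job.getInstance(conf, "rcfile-poc"); // placeholder job name
  job.setJarByClass(Test.class);

  job.setMapperClass(AbcMapper.class);
  job.setReducerClass(AbcReducer.class);

  job.setMapOutputKeyClass(Text.class);
  job.setMapOutputValueClass(BytesRefArrayWritable.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(BytesRefArrayWritable.class);

  job.setInputFormatClass(RCFileInputFormat.class);
  job.setOutputFormatClass(RCFileOutputFormat.class);

  // placeholder paths
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));

  System.exit(job.waitForCompletion(true) ? 0 : 1);
}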


Just before writing the reduce output, the value of outRecord.size() is 5.

But I am still getting an ArrayIndexOutOfBoundsException.

*Stacktrace:*

*java.lang.ArrayIndexOutOfBoundsException: 0*
        at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:890)
        at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:82)
        at poc.RCFileOutputFormat$1.write(RCFileOutputFormat.java:1)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
        at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
        at poc.Test$PurgeReducer.reduce(Purge.java:98)
        at poc.Test$AReducer.reduce(Purge.java:1)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)

I have tried hard to find which array access is causing this
*java.lang.ArrayIndexOutOfBoundsException: 0*, but have not found anything yet.

Could you please give me any pointers that would help me identify and resolve
the issue?

Thanks!

Re: ArrayIndexOutOfBoundsException while writing MapReduce output as RCFile

Posted by Yin Huai <hu...@gmail.com>.
It seems you did not set the number of columns
(RCFileOutputFormat.setColumnNumber(Configuration conf, int columnNum)).
Can you set it in your main method and see if your MR program works?
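
For reference, a rough sketch of that change in the driver (the column count of 5 is just my guess from the BytesRefArrayWritable(5) in your reducer, and I am assuming your poc.RCFileOutputFormat passes the job Configuration through to RCFile.Writer the way Hive's own output format does):

// Record the number of output columns in the job Configuration before submitting.
// Hive's helper stores the count under the conf key that RCFile.Writer reads;
// if it is missing, the writer sizes its per-column arrays to length 0, and
// append() then fails with ArrayIndexOutOfBoundsException: 0.
org.apache.hadoop.hive.ql.io.RCFileOutputFormat.setColumnNumber(
    job.getConfiguration(), 5); // 5 = assumed column count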

Thanks,

Yin

