Posted to user@orc.apache.org by Sandeep Khurana <sa...@infoworks.io> on 2015/11/06 04:18:50 UTC

ORC MRUNit test

Hello All

I have a map program that processes ORC files. In the driver I set the ORC
format as the input format:

  job.setInputFormatClass(OrcNewInputFormat.class);

With OrcNewInputFormat the value type is OrcStruct, so in the mapper the
Writable value passed as a parameter is cast to OrcStruct:

     OrcStruct record = (OrcStruct) value;

I want to test this mapper using MRUnit. For this, in the setup method of
the unit test I create an ORC file:

  OrcFile.createWriter(testFilePath,
          OrcFile.writerOptions(conf)
                  .inspector(inspector)
                  .stripeSize(100000)
                  .bufferSize(10000)
                  .version(OrcFile.Version.V_0_12));

Then in the test method I read the file back and invoke the mapper through
MRUnit. Below is the code:

  // Read the ORC file
  Reader reader = OrcFile.createReader(fs, testFilePath);
  RecordReader recordRdr = reader.rows();
  OrcStruct row = null;
  List<OrcStruct> mapData = new ArrayList<>();

  while (recordRdr.hasNext()) {
      row = (OrcStruct) recordRdr.next(row);
      mapData.add(row);
  }

  // Test the mapper
  initializeSerde(mapDriver.getConfiguration());
  Writable writable = getWritable(mapData.get(0)); // test mapper processing of the 1st record
  mapDriver.withCacheFile(strCachePath)
          .withInput(NullWritable.get(), writable);
  mapDriver.runTest();


But while running the test case I get the error below:
java.lang.UnsupportedOperationException: can't write the bundle
at org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow.write(OrcSerde.java:61)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:80)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:97)
at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:110)
at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:675)
at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:679)
at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:120)
at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:210)


Looking at OrcSerde, write() is not supported, and MRUnit invokes it when
copying the input, hence the test case errors out.
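To illustrate the failure mode, here is a minimal, Hadoop-free sketch. MRUnit copies each input value by serializing it with the value's own write() method, so any value whose write() throws UnsupportedOperationException (as OrcSerde$OrcSerdeRow.write does) fails inside withInput(), before the mapper ever runs. The names SerializableValue, RowBundle, and copy below are hypothetical stand-ins for Writable, OrcSerdeRow, and MRUnit's Serialization.copy, not real Hadoop or MRUnit classes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class CopyFailureSketch {

    // Stand-in for Hadoop's Writable contract.
    interface SerializableValue {
        void write(DataOutput out) throws IOException;
    }

    // Stand-in for OrcSerdeRow: serialization is simply unsupported.
    static class RowBundle implements SerializableValue {
        @Override
        public void write(DataOutput out) {
            throw new UnsupportedOperationException("can't write the bundle");
        }
    }

    // Stand-in for MRUnit's internal Serialization.copy(): it serializes
    // the value into a buffer so the framework can hand the mapper a copy.
    static byte[] copy(SerializableValue value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        value.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try {
            copy(new RowBundle());
            System.out.println("copy succeeded");
        } catch (UnsupportedOperationException e) {
            // prints: copy failed: can't write the bundle
            System.out.println("copy failed: " + e.getMessage());
        }
    }
}
```

The point is that the exception is raised by the copy step itself, not by the mapper logic, so the mapper under test never gets a chance to run.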

How do we unit test a mapper that processes an ORC file? Is there another
way, or what needs to be changed in what I am doing?

Thanks in advance for the help.

br
Sandeep