Posted to user@avro.apache.org by apivonka <al...@gmail.com> on 2014/11/14 22:19:24 UTC

AvroMultipleOutputs and java.lang.ClassCastException: cannot be cast to org.apache.hadoop.io.Text

Good Day all.
Any and all assistance is greatly appreciated.

Our team has worked hard to convert our domain structures to Avro schemas and
streamline a great deal of work.
Everything works great outside of a MapReduce job.

Here is the scenario:

With the following job configuration and AvroMultipleOutputs configuration,
we are not able to persist output through AvroMultipleOutputs and are
getting the following exception:

java.lang.ClassCastException: com.qux.omap.Person cannot be cast to org.apache.hadoop.io.Text

com.qux.omap.Person is one of our Avro objects.


The MapReduce job takes text input and produces output as Avro-backed objects.

Since we derive multiple Avro objects from the text input, we are trying to
use AvroMultipleOutputs.

The MapReduce job is set up as follows:

1)	public class QOMAPAvroMR extends Mapper<NullWritable, Text, NullWritable, GenericRecord>

2)	Within the mapper's setup we register the AvroMultipleOutputs named
outputs via addNamedOutput, like the following:

	private void setupMapping(Context context) throws IOException,
			InstantiationException, IllegalAccessException {
		Reflections reflections = new Reflections("com.quintiles.omop");
		Set<Class<? extends SpecificRecordBase>> classes =
				reflections.getSubTypesOf(SpecificRecordBase.class);
		Job job = new Job(context.getConfiguration(), "AvroJob");
		job.setOutputFormatClass(AvroKeyValueOutputFormat.class);
		for (Class<? extends SpecificRecordBase> class1 : classes) {
			SpecificRecordBase theBase = class1.newInstance();
			AvroMultipleOutputs.addNamedOutput(job, class1.getSimpleName(),
					AvroKeyValueOutputFormat.class,
					Schema.create(Schema.Type.LONG), theBase.getSchema());
		}
	}
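For comparison, our understanding from the avro-mapred docs is that named
outputs are normally registered on the Job that actually gets submitted, on
the driver side, before submission (sketch only; the driver class name below
is a placeholder, not our real code):

```java
// Sketch: driver-side registration of AvroMultipleOutputs named outputs,
// as we understand it from the avro-mapred documentation.
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroKeyValueOutputFormat;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.reflections.Reflections;

public class QOMAPAvroDriver {
    public static Job buildJob(Configuration conf) throws Exception {
        // Configure the Job instance that will be submitted, so the
        // named-output settings reach the task-side configuration.
        Job job = Job.getInstance(conf, "AvroJob");
        job.setOutputFormatClass(AvroKeyValueOutputFormat.class);

        Reflections reflections = new Reflections("com.quintiles.omop");
        for (Class<? extends SpecificRecordBase> c :
                reflections.getSubTypesOf(SpecificRecordBase.class)) {
            SpecificRecordBase record = c.newInstance();
            // One named output per generated Avro class, keyed by long.
            AvroMultipleOutputs.addNamedOutput(job, c.getSimpleName(),
                    AvroKeyValueOutputFormat.class,
                    Schema.create(Schema.Type.LONG), record.getSchema());
        }
        return job;
    }
}
```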
	
3)	After doing our major workload, we use the following to serialize the
Avro object(s):

	protected void avroHDFSSerialize(Object theObject, Context context)
			throws Exception {
		avroMultipleOutputs.write(theObject.getClass().getSimpleName(),
				new LongWritable(1), theObject);
	}
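For completeness, the mapper-side lifecycle around that write is roughly the
following (simplified sketch, not our exact code): the AvroMultipleOutputs
instance is bound to the task context in setup() and must be closed in
cleanup(), or the named-output files are never flushed:

```java
// Sketch of the mapper-side AvroMultipleOutputs lifecycle (simplified).
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapreduce.AvroMultipleOutputs;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class QOMAPAvroMR
        extends Mapper<NullWritable, Text, NullWritable, GenericRecord> {

    private AvroMultipleOutputs avroMultipleOutputs;

    @Override
    protected void setup(Context context) {
        // Bind the named-output writer to this task's context.
        avroMultipleOutputs = new AvroMultipleOutputs(context);
    }

    protected void avroHDFSSerialize(Object theObject, Context context)
            throws Exception {
        // Route each record to the named output matching its class name.
        avroMultipleOutputs.write(theObject.getClass().getSimpleName(),
                new LongWritable(1), theObject);
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Without close(), buffered named-output files are not committed.
        avroMultipleOutputs.close();
    }
}
```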
		
The MapReduce job is part of an Oozie workflow; here is the configuration for
that (we are not sure we are configuring it correctly):
<workflow-app name="AvroMR" xmlns="uri:oozie:workflow:0.4">
    <start to="QOMAPAvroMR"/>
    <action name="QOMAPAvroMR">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/q7x98470/Avro/in</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/q7x98470/Avro/out</value>
                </property>
                <property>
                    <name>mapred.jar</name>
                    <value>QOMAPAvroMapReduce-0.0.1.jar</value>
                </property>
                <property>
                    <name>mapreduce.map.class</name>
                    <value>com.quintiles.mapreduce.QOMOPAvroMR</value>
                </property>
                <property>
                    <name>mapred.reduce.tasks</name>
                    <value>0</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.name</name>
                    <value>QOMOP-AVRO_MR</value>
                </property>
                <property>
                    <name>mapreduce.inputformat.class</name>
                    <value>com.hdbms.mapreduce.input.ZipFileInputFormat</value>
                </property>
                <property>
                    <name>zipFileExtensionFilters</name>
                    <value>.*\.xml$</value>
                </property>
                <property>
                    <name>mapreduce.outputformat.class</name>
                    <value>org.apache.avro.mapreduce.AvroKeyValueOutputFormat</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.avro.mapred.AvroValue</value>
                </property>
                <property>
                    <name>mapred.outputformat.class</name>
                    <value>org.apache.avro.mapreduce.AvroKeyValueOutputFormat</value>
                </property>
            </configuration>
            <file>lib/QOMOPAvroMapReduce-0.0.1.jar#QOMOPAvroMapReduce-0.0.1.jar</file>
        </map-reduce>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Again, any and all insights and suggestions are greatly appreciated.



