Posted to user@avro.apache.org by Alan Miller <al...@gmail.com> on 2012/12/20 16:21:10 UTC

How to read with SpecificDatumReader

I can write my Avro data fine, but how do I read my data records with the
SpecificDatumReader?

Basically, I write my (hdfs) data file like this:
     Schema schema = new MyRecord().getSchema();
     DatumWriter<MyRecord> writer = new SpecificDatumWriter<MyRecord>(schema);
     DataFileWriter<MyRecord> dataFileWriter = new DataFileWriter<MyRecord>(writer);
     FSDataOutputStream fos = fs.create(avroPath);
     dataFileWriter.create(schema, fos);
     for (MyRecord r : map.values()) {
          dataFileWriter.flush();
          dataFileWriter.append(r);
     }
     dataFileWriter.flush();

This works fine because my MR job processes the generated files via
     Job job = new Job(config, jobName);
     job.setJarByClass(getClass());
     AvroJob.setInputKeySchema(job, schema);
     AvroJob.setInputValueSchema(job, schema);
     job.setInputFormatClass(AvroKeyInputFormat.class);
     job.setMapperClass(MyMapper.class);

Now I need to read the file from a different (non-Hadoop) application but
when I try to read the data like this:
     596 DatumReader<MyRecord> myDatumReader = new SpecificDatumReader<MyRecord>(MyRecord.class);
     597 DataFileReader<MyRecord> dataFileReader = new DataFileReader<MyRecord>(localFile, myDatumReader);
     598 MyRecord record = null;
     599 String owner = null;
     600 while (dataFileReader.hasNext()) {
     601      record = dataFileReader.next(record);
     602      owner = record.getOwner().toString();
     603      System.out.printf("owner = %s\n", owner);
     604 }
     605 dataFileReader.close();

I get this error:
Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.company.app.MyRecord
     at com.company.app.MyDriver.readAvroData(MyDriver.java:601)
     at com.company.app.MyDriver.main(MyDriver.java:1378)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Alan

Re: How to read with SpecificDatumReader

Posted by Alan Miller <al...@gmail.com>.
Thanks Doug,
I guess that's the problem (somehow), but I don't see why writing the Avro
file works but reading it doesn't. I mean, I'm writing and reading the files
in the same way.

When I said non-Hadoop, I meant the Java program isn't (yet) running via
Hadoop. This code is in my "Driver" class, before I actually submit the job
to Hadoop. Basically, this is what I do.

I JAR up my classes (MyAppDriver, MyAppMapper, MyAppReducer, MyAppRecord) in
my.jar, then run this wrapper to trigger my MR job:

     DRIVER="com.company.app.MyAppDriver"
     JAR="/some/path/my.jar"
     ARGS="-debug -overwrite"

     EXTRAJARS="lib/logback-core-1.0.6.jar:lib/logback-classic-1.0.6.jar:lib/json_simple-1.1.jar"
     export HADOOP_USER_CLASSPATH_FIRST="true"
     export HADOOP_CLASSPATH=${EXTRAJARS}
     hadoop jar ${JAR} ${DRIVER} ${ARGS}

Writing the Avro file in the "Driver" code works but reading does not.
HOWEVER, if I add my.jar to EXTRAJARS then reading the Avro file works.

Alan
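
The behavior described above is consistent with a class-loader visibility
problem. "hadoop jar" starts the driver through RunJar (the last frame in the
stack trace), which appears to load the classes in my.jar through its own
class loader, whereas jars listed in HADOOP_CLASSPATH sit on the JVM's main
classpath next to the Avro library. If the library looks up
com.company.app.MyRecord by name through a loader that cannot see my.jar,
falling back to a generic record would explain the ClassCastException. Writing
would be unaffected because it never has to look the class up by name. The
sketch below reproduces that fallback pattern with the JDK only. It is not
Avro's code: FallbackDemo, newRecord and both stand-in record classes are
hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class FallbackDemo {
    /** Hypothetical stand-in for a generic record: fields kept in a map. */
    static class GenericRecordStandIn {
        final Map<String, Object> fields = new HashMap<>();
    }

    /** Hypothetical stand-in for a generated, strongly typed record class. */
    static class SpecificRecordStandIn {
        String owner;
    }

    /**
     * Tries to instantiate className through the given loader. If the class
     * is not visible to that loader, returns a generic record instead of
     * failing, which only surfaces later as a ClassCastException.
     */
    static Object newRecord(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, true, loader);
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return new GenericRecordStandIn();
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = FallbackDemo.class.getClassLoader();

        // Visible to the loader: we get the typed record back.
        Object typed = newRecord(SpecificRecordStandIn.class.getName(), loader);
        System.out.println("visible class   -> " + typed.getClass().getName());

        // Not visible to the loader: we silently get the generic fallback.
        Object fallback = newRecord("com.company.app.MyRecord", loader);
        System.out.println("invisible class -> " + fallback.getClass().getName());

        try {
            SpecificRecordStandIn r = (SpecificRecordStandIn) fallback;
            System.out.println("owner = " + r.owner);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getName());
        }
    }
}
```

The second lookup fails only because the name is not visible to the loader
that was asked, which is the situation that adding my.jar to HADOOP_CLASSPATH
would remove.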


On Thu, Dec 20, 2012 at 6:49 PM, Doug Cutting <cu...@apache.org> wrote:

> It looks to me like in your non-Hadoop application
> com.company.app.MyRecord is not on the classpath.
>
> Doug

Re: How to read with SpecificDatumReader

Posted by Doug Cutting <cu...@apache.org>.
It looks to me like in your non-Hadoop application
com.company.app.MyRecord is not on the classpath.

Doug
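
One way to check this from inside the reading application is to ask the JVM
whether the class can be loaded, and from where. The sketch below is a minimal
JDK-only diagnostic, not part of Avro or Hadoop: ClasspathCheck and describe
are hypothetical names, and com.company.app.MyRecord is simply the class name
taken from the stack trace.

```java
import java.security.CodeSource;

class ClasspathCheck {
    /**
     * Returns a one-line report saying whether className can be loaded
     * through the given loader and, if so, which jar or directory it came from.
     */
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: we only test visibility, no static initializers run.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null) ? "bootstrap/JDK" : String.valueOf(src.getLocation());
            return className + " FOUND (loaded from " + where + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.company.app.MyRecord";

        // Check both the loader of this class and the thread's context loader;
        // under a launcher that installs its own class loader they can differ.
        System.out.println("own loader:     "
                + describe(name, ClasspathCheck.class.getClassLoader()));
        System.out.println("context loader: "
                + describe(name, Thread.currentThread().getContextClassLoader()));
    }
}
```

Calling describe with the class loader of the library that does the lookup
(for example, the loader of a class from the Avro jar) shows whether that
library can see the record class, which is the question that matters here.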
