Posted to issues@spark.apache.org by "Nira Amit (JIRA)" <ji...@apache.org> on 2017/03/05 17:43:32 UTC
[jira] [Updated] (SPARK-19656) Can't load custom type from avro file to RDD with newAPIHadoopFile
[ https://issues.apache.org/jira/browse/SPARK-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nira Amit updated SPARK-19656:
------------------------------
Attachment: datum.png
{code}
public static class ABgoEventAvroReader extends AvroRecordReaderBase<ABgoEventAvroKey, NullWritable, ABgoEvent> {
    static Schema schema;

    static {
        try {
            schema = new Schema.Parser().parse(AvroTest.class.getResourceAsStream("/Abgo.avsc"));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** A reusable object to hold records of the Avro container file. */
    private final ABgoEventAvroKey mCurrentRecord;

    public ABgoEventAvroReader() {
        super(schema);
        mCurrentRecord = new ABgoEventAvroKey();
    }

    /** {@inheritDoc} */
    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        boolean hasNext = super.nextKeyValue();
        mCurrentRecord.datum(getCurrentRecord());
        return hasNext;
    }

    /** {@inheritDoc} */
    @Override
    public ABgoEventAvroKey getCurrentKey() throws IOException, InterruptedException {
        return mCurrentRecord;
    }

    /** {@inheritDoc} */
    @Override
    public NullWritable getCurrentValue() throws IOException, InterruptedException {
        return NullWritable.get();
    }
}
{code}
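This reader plugs into Spark through a FileInputFormat subclass, along the same lines as the MyCustomInputFormat shown in the description below; roughly (the class name ABgoEventInputFormat here is just a placeholder):
{code}
// Placeholder wiring: a FileInputFormat that hands out the reader above,
// mirroring the MyCustomInputFormat pattern from the issue description.
public static class ABgoEventInputFormat extends FileInputFormat<ABgoEventAvroKey, NullWritable> {
    @Override
    public RecordReader<ABgoEventAvroKey, NullWritable> createRecordReader(
            InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
            throws IOException, InterruptedException {
        return new ABgoEventAvroReader();
    }
}
{code}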
> Can't load custom type from avro file to RDD with newAPIHadoopFile
> ------------------------------------------------------------------
>
> Key: SPARK-19656
> URL: https://issues.apache.org/jira/browse/SPARK-19656
> Project: Spark
> Issue Type: Question
> Components: Java API
> Affects Versions: 2.0.2
> Reporter: Nira Amit
> Attachments: datum.png
>
>
> If I understand correctly, in Scala it's possible to load custom objects from Avro files into RDDs this way:
> {code}
> ctx.hadoopFile("/path/to/the/avro/file.avro",
>     classOf[AvroInputFormat[MyClassInAvroFile]],
>     classOf[AvroWrapper[MyClassInAvroFile]],
>     classOf[NullWritable])
> {code}
> I'm not a Scala developer, so I tried to "translate" this to Java as best I could. I created classes that extend AvroKey and FileInputFormat:
> {code}
> public static class MyCustomAvroKey extends AvroKey<MyCustomClass>{};
>
> public static class MyCustomAvroReader extends AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
>     // with my custom schema and all the required methods...
> }
>
> public static class MyCustomInputFormat extends FileInputFormat<MyCustomAvroKey, NullWritable> {
>     @Override
>     public RecordReader<MyCustomAvroKey, NullWritable> createRecordReader(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
>         return new MyCustomAvroReader();
>     }
> }
>
> ...
>
> JavaPairRDD<MyCustomAvroKey, NullWritable> records =
>     sc.newAPIHadoopFile("file:/path/to/datafile.avro",
>         MyCustomInputFormat.class, MyCustomAvroKey.class,
>         NullWritable.class,
>         sc.hadoopConfiguration());
> MyCustomClass first = records.first()._1.datum();
> System.out.println("Got a result, some custom field: " + first.getSomeCustomField());
> {code}
> This compiles fine, but using a debugger I can see that `first._1.datum()` actually returns a `GenericData$Record` at runtime, not a `MyCustomClass` instance.
> And indeed, when the following line executes:
> {code}
> MyCustomClass first = records.first()._1.datum();
> {code}
> I get an exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.package.containing.MyCustomClass
> {code}
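> I suppose I could treat the datum as a GenericRecord and copy the fields over by hand, along these lines (the field name and setter below are made up for illustration), but that seems to defeat the purpose of having a custom key class:
> {code}
> // Sketch only: go through a raw AvroKey so no hidden cast to MyCustomClass is inserted,
> // then copy fields from the GenericRecord by hand. Field name and setter are hypothetical.
> JavaRDD<MyCustomClass> typed = records.map(pair -> {
>     GenericRecord rec = (GenericRecord) ((AvroKey) pair._1()).datum();
>     MyCustomClass obj = new MyCustomClass();
>     obj.setSomeCustomField(rec.get("someCustomField").toString());
>     return obj;
> });
> {code}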
> Am I doing it wrong? Or is this not possible in Java?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)