Posted to user@spark.apache.org by Dariusz Kobylarz <da...@gmail.com> on 2014/09/08 21:46:17 UTC

saveAsHadoopFile into avro format

What is the right way to save a PairRDD in the Avro output format?
GraphArray extends SpecificRecord etc.
I have the following Java RDD:
JavaPairRDD<GraphArray, NullWritable> pairRDD = ...
and want to save it in Avro format:
org.apache.hadoop.mapred.JobConf jc = new org.apache.hadoop.mapred.JobConf();
org.apache.avro.mapred.AvroJob.setOutputSchema(jc, GraphArray.getClassSchema());
org.apache.avro.mapred.AvroOutputFormat.setOutputPath(jc, new Path(outURI));
pairRDD.saveAsHadoopDataset(jc);

The code above throws:
Exception in thread "main" org.apache.spark.SparkException: Job aborted 
due to stage failure: Task not serializable: 
java.io.NotSerializableException: org.apache.hadoop.io.NullWritable
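
For context, the failure mode itself is easy to reproduce outside Spark: Spark's default serializer is plain Java serialization, and org.apache.hadoop.io.NullWritable does not implement java.io.Serializable, so any attempt to Java-serialize it fails exactly this way. A minimal, self-contained sketch (the NotSerializableValue class below is a stand-in for NullWritable, not the real Hadoop class):

```java
import java.io.*;

public class SerializationDemo {
    // Stand-in for org.apache.hadoop.io.NullWritable, which likewise
    // does NOT implement java.io.Serializable.
    static class NotSerializableValue { }

    // Returns true iff the object survives plain Java serialization,
    // which is what Spark uses by default for records and closures.
    static boolean javaSerializable(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // Same exception class as in the Spark stack trace above.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(javaSerializable(new NotSerializableValue())); // false
        System.out.println(javaSerializable("a plain String"));           // true
    }
}
```
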

I also tried wrapping the keys and values with the AvroKey and AvroValue
classes, respectively.

What am I doing wrong? Should I use a JavaRDD (of lists) instead and try
a custom serializer?

Thanks,



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org