Posted to dev@avro.apache.org by "Anton Oellerer (Jira)" <ji...@apache.org> on 2020/03/30 15:41:00 UTC

[jira] [Updated] (AVRO-2787) Hadoop Mapreduce job fails when creating Writer

     [ https://issues.apache.org/jira/browse/AVRO-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Oellerer updated AVRO-2787:
---------------------------------
    Component/s:     (was: docker)

> Hadoop Mapreduce job fails when creating Writer
> -----------------------------------------------
>
>                 Key: AVRO-2787
>                 URL: https://issues.apache.org/jira/browse/AVRO-2787
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2
>         Environment: Development
>  * OS: Fedora 31
>  * Java version 8
>  * Gradle version 6.2.2
>  * Avro version 1.9.2
>  * Shadow version 5.2.0
>  * Gradle-avro-plugin version 0.19.1
> Running in a Podman container
>  * OS: Ubuntu 18.04
>  * Podman 1.8.2
>  * Hadoop version 3.2.1
>  * Java version 8
>            Reporter: Anton Oellerer
>            Priority: Blocker
>         Attachments: CategoryData.avsc, CategoryTokensReducer.java, TextprocessingfundamentalsApplication.java
>
>
> Hey,
> I am trying to build a Hadoop pipeline that computes the chi-squared value for tokens in reviews stored as JSON.
> For this, I created multiple Hadoop jobs, which pass data to each other partly via Avro container files.
> When I run this pipeline, I get the following error at the end of the first reduce job, whose reducer has this signature:
> {code:java}
> public class CategoryTokensReducer extends Reducer<Text, StringArrayWritable, AvroKey<CharSequence>, AvroValue<CategoryData>>
> {code}
> Error:
> {code:java}
> java.lang.Exception: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)                               
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)                                    
> Caused by: java.lang.NoSuchMethodError: org.apache.avro.Schema$Field.<init>(Ljava/lang/String;Lorg/apache/avro/Schema;Ljava/lang/String;Ljava/lang/Object;)V
>         at org.apache.avro.hadoop.io.AvroKeyValue.getSchema(AvroKeyValue.java:111)                    
>         at org.apache.avro.mapreduce.AvroKeyValueRecordWriter.<init>(AvroKeyValueRecordWriter.java:84)         
>         at org.apache.avro.mapreduce.AvroKeyValueOutputFormat.getRecordWriter(AvroKeyValueOutputFormat.java:70)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)                               
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)                       
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)                                                                       
> {code}
> The job is set up like this:
> {code:java}
> // First job: reads the JSON reviews and writes Avro key/value records
> Job jsonToCategoryTokensJob = Job.getInstance(conf, "json to category data");
> // Output key is an Avro string; output value uses the generated CategoryData schema
> AvroJob.setOutputKeySchema(jsonToCategoryTokensJob, Schema.create(Schema.Type.STRING));
> AvroJob.setOutputValueSchema(jsonToCategoryTokensJob, CategoryData.getClassSchema());
> jsonToCategoryTokensJob.setJarByClass(TextprocessingfundamentalsApplication.class);
> jsonToCategoryTokensJob.setMapperClass(JsonToCategoryTokensMapper.class);
> // Intermediate (map output) types are plain Hadoop Writables
> jsonToCategoryTokensJob.setMapOutputKeyClass(Text.class);
> jsonToCategoryTokensJob.setMapOutputValueClass(StringArrayWritable.class);
> jsonToCategoryTokensJob.setReducerClass(CategoryTokensReducer.class);
> // The stack trace shows the failure inside this format's getRecordWriter()
> jsonToCategoryTokensJob.setOutputFormatClass(AvroKeyValueOutputFormat.class);
> // Input and output paths come from the remaining command-line arguments
> String in = otherArgs.get(0);
> String out = otherArgs.get(1);
> FileInputFormat.addInputPath(jsonToCategoryTokensJob, new Path(in));
> FileOutputFormat.setOutputPath(jsonToCategoryTokensJob, new Path(out, "outCategoryData"));
> {code}
> Does anyone know what the problem might be?
> Best regards
> Anton
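
A NoSuchMethodError on a constructor such as Schema$Field.<init>(String, Schema, String, Object) generally means the calling code (here AvroKeyValue from avro-mapred 1.9.2) was compiled against a different Avro version than the one actually loaded at runtime. Assuming that an older Avro JAR (for example one shipped in the Hadoop installation's own library directory) is shadowing the one inside the shadow JAR, one thing worth checking is which JAR org.apache.avro.Schema is loaded from and whether that class has the constructor. The sketch below does this with plain reflection. The class name AvroClasspathCheck is hypothetical, and the result is only meaningful when it runs with the same classpath as the job, for instance by logging AvroClasspathCheck.check() from the reducer's setup() method.

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports where Avro's Schema class comes from at runtime. */
public class AvroClasspathCheck {

    /** Returns the JAR or directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
    }

    /**
     * Checks whether the loaded Schema.Field has the (String, Schema, String, Object)
     * constructor that the failing AvroKeyValue.getSchema call expects.
     */
    static String check() {
        ClassLoader loader = AvroClasspathCheck.class.getClassLoader();
        try {
            // Load without initializing, so no Avro static initializers run
            Class<?> schema = Class.forName("org.apache.avro.Schema", false, loader);
            Class<?> field = Class.forName("org.apache.avro.Schema$Field", false, loader);
            String where = locationOf(schema);
            try {
                Constructor<?> ctor =
                        field.getConstructor(String.class, schema, String.class, Object.class);
                return "found " + ctor + " in " + where;
            } catch (NoSuchMethodException e) {
                return "MISSING Schema.Field(String, Schema, String, Object) in " + where;
            }
        } catch (ClassNotFoundException e) {
            return "Avro is not on the classpath";
        } catch (LinkageError e) {
            // e.g. a type referenced by Schema.Field's constructors could not be resolved
            return "Avro could not be linked: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(check());
    }
}
```

If the output starts with MISSING and names a JAR outside the shadow JAR, the runtime classpath ordering is a more likely culprit than the job configuration itself.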



--
This message was sent by Atlassian Jira
(v8.3.4#803005)