Posted to dev@avro.apache.org by "Dave Beech (JIRA)" <ji...@apache.org> on 2012/10/29 18:40:12 UTC

[jira] [Created] (AVRO-1185) AvroJob.setInputSchema can have undesired side effects for map-only jobs

Dave Beech created AVRO-1185:
--------------------------------

             Summary: AvroJob.setInputSchema can have undesired side effects for map-only jobs
                 Key: AVRO-1185
                 URL: https://issues.apache.org/jira/browse/AVRO-1185
             Project: Avro
          Issue Type: Bug
          Components: java
            Reporter: Dave Beech
            Priority: Minor


I have a map-only MapReduce job which takes Avro input and writes non-Avro output (Hadoop Writables). 

The mapper is implemented as a standard Hadoop mapper with <AvroWrapper<IN>,NullWritable,Text,Text> type parameters. 

In the job setup, I assumed it would be safe to call AvroJob.setInputSchema(MySchema.SCHEMA$), but this call makes assumptions about what the map output will be. Internally, AvroJob.setInputSchema calls configureAvroInput, which in turn calls configureAvroShuffle, so my map output key/value types and serialization settings all end up set incorrectly for my use case.

This is confusing behaviour: what appears to be a simple setter has several undocumented side effects.
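
To make the call chain above concrete, here is a minimal, self-contained sketch of the pattern. It is a toy model, not the Avro source: a plain Map stands in for the Hadoop job configuration, the method names mirror the ones mentioned in this report, and every configuration key and value string is a hypothetical placeholder chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the call chain described in this report. This is NOT the Avro
// source: the Map stands in for a Hadoop job configuration, and every key and
// value string below is a hypothetical placeholder.
final class SetterSideEffectSketch {

    // Stand-in for AvroJob.setInputSchema: stores the schema, then configures input.
    static void setInputSchema(Map<String, String> conf, String schemaJson) {
        conf.put("input.schema", schemaJson);
        configureAvroInput(conf);
    }

    // Stand-in for configureAvroInput: sets input defaults, then also configures the shuffle.
    static void configureAvroInput(Map<String, String> conf) {
        conf.putIfAbsent("input.format", "AvroInputFormat");
        configureAvroShuffle(conf);
    }

    // Stand-in for configureAvroShuffle: unconditionally overwrites the map output settings.
    static void configureAvroShuffle(Map<String, String> conf) {
        conf.put("map.output.key.class", "AvroKey");
        conf.put("map.output.value.class", "AvroValue");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // The map-only job wants plain Writable output types.
        conf.put("map.output.key.class", "Text");
        conf.put("map.output.value.class", "Text");

        setInputSchema(conf, "{\"type\":\"string\"}");
        // The "simple setter" has replaced the Text/Text map output types.
        System.out.println("after setter:   " + conf.get("map.output.key.class"));

        // Possible workaround: re-apply the intended types after the call.
        conf.put("map.output.key.class", "Text");
        conf.put("map.output.value.class", "Text");
        System.out.println("after re-apply: " + conf.get("map.output.key.class"));
    }
}
```

With the real API, a comparable workaround would presumably be to call JobConf.setMapOutputKeyClass and JobConf.setMapOutputValueClass after AvroJob.setInputSchema. Whether the serialization settings also need to be reset is not confirmed here.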
