Posted to dev@avro.apache.org by "kobefeng (JIRA)" <ji...@apache.org> on 2014/04/03 09:13:15 UTC

[jira] [Commented] (AVRO-1356) AvroMultipleOutputs map only jobs do not use NamedOutput schemas

    [ https://issues.apache.org/jira/browse/AVRO-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958587#comment-13958587 ] 

kobefeng commented on AVRO-1356:
--------------------------------

AvroKeyValueOutputFormat still uses context.getOutputKeyClass() and context.getOutputValueClass() without considering isMapOnly.
Also, in AvroDatumConverterFactory.create(Class<IN> inputClass) the value schema never picks up mapOutputValueSchema, because the result of AvroJob.getMapOutputValueSchema(getConf()) is not assigned to schema:
    Schema schema = null;
    if (isMapOnly) {
      AvroJob.getMapOutputValueSchema(getConf());
      if (null == schema) {
        schema = AvroJob.getOutputValueSchema(getConf());
      }
    }
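
For illustration, here is a minimal, self-contained sketch of what the snippet presumably intends: assign the map-output value schema first, then fall back to the job-level output value schema. It is not the actual Avro code. A plain Map stands in for the Hadoop Configuration, schemas are plain strings, and the class name, method name, and configuration keys are all hypothetical. Applying the fallback to non-map-only jobs as well is also an assumption, since the quoted lines do not show that branch.

```java
import java.util.HashMap;
import java.util.Map;

final class ValueSchemaLookup {
    // Hypothetical keys standing in for whatever AvroJob stores in the configuration.
    static final String MAP_OUTPUT_VALUE_SCHEMA = "example.map.output.value.schema";
    static final String OUTPUT_VALUE_SCHEMA = "example.output.value.schema";

    // Returns the schema to use for values, or null if none is configured.
    static String resolveValueSchema(Map<String, String> conf, boolean isMapOnly) {
        String schema = null;
        if (isMapOnly) {
            // The assignment below is what the quoted snippet is missing.
            schema = conf.get(MAP_OUTPUT_VALUE_SCHEMA);
        }
        if (schema == null) {
            // Fall back to the job-level output value schema.
            schema = conf.get(OUTPUT_VALUE_SCHEMA);
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_OUTPUT_VALUE_SCHEMA, "map-value-schema");
        conf.put(OUTPUT_VALUE_SCHEMA, "output-value-schema");
        System.out.println(resolveValueSchema(conf, true));
        System.out.println(resolveValueSchema(conf, false));
    }
}
```

With both entries set, a map-only lookup yields the map-output schema and a regular lookup yields the output schema. If only the output schema is set, both lookups fall back to it.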

> AvroMultipleOutputs map only jobs do not use NamedOutput schemas
> ----------------------------------------------------------------
>
>                 Key: AVRO-1356
>                 URL: https://issues.apache.org/jira/browse/AVRO-1356
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Alan Paulsen
>            Assignee: Alan Paulsen
>             Fix For: 1.7.5
>
>         Attachments: AVRO-1356.patch
>
>
> AvroMultipleOutputs sets the MapOutputKeySchema when running a map only job, as follows:
> {code:java}
> boolean isMaponly = job.getNumReduceTasks() == 0;
> if (keySchema != null) {
>   if (isMaponly)
>     AvroJob.setMapOutputKeySchema(job, keySchema);
>   else
>     AvroJob.setOutputKeySchema(job, keySchema);
> }
> if (valSchema != null) {
>   if (isMaponly)
>     AvroJob.setMapOutputValueSchema(job, valSchema);
>   else
>     AvroJob.setOutputValueSchema(job, valSchema);
> }
> {code}
> Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check whether the job is map only, and use the OutputKeySchema and OutputValueSchema regardless.
> We can fix this by either:
> * Changing AvroKeyOutputFormat and AvroKeyValueOutputFormat to check if the job is map only and use the appropriate schema.  (Seems right)
> * Changing AvroMultipleOutputs to always use the OutputKeySchema and OutputValueSchema.
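
The mismatch described in the issue, and the first proposed fix, can be modelled in a few lines. This is only a sketch under stated assumptions: a plain Map replaces the job configuration, schemas are plain strings, and the class name, method names, and configuration keys are hypothetical rather than taken from Avro.

```java
import java.util.HashMap;
import java.util.Map;

final class MapOnlySchemaDemo {
    // Hypothetical configuration keys; the real AvroJob keys are not shown in the issue.
    static final String MAP_OUTPUT_KEY_SCHEMA = "example.map.output.key.schema";
    static final String OUTPUT_KEY_SCHEMA = "example.output.key.schema";

    // Mirrors the quoted AvroMultipleOutputs logic: a map-only job stores
    // the schema under the map-output entry, any other job under the output entry.
    static void setKeySchema(Map<String, String> conf, int numReduceTasks, String keySchema) {
        boolean isMapOnly = numReduceTasks == 0;
        conf.put(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA, keySchema);
    }

    // Behaviour described in the issue: the reader never checks for map-only.
    static String readIgnoringMapOnly(Map<String, String> conf) {
        return conf.get(OUTPUT_KEY_SCHEMA);
    }

    // First proposed fix: the reader checks for map-only and reads the matching entry.
    static String readCheckingMapOnly(Map<String, String> conf, int numReduceTasks) {
        boolean isMapOnly = numReduceTasks == 0;
        return conf.get(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setKeySchema(conf, 0, "named-output-key-schema"); // map-only job
        System.out.println(readIgnoringMapOnly(conf));    // prints "null"
        System.out.println(readCheckingMapOnly(conf, 0)); // prints "named-output-key-schema"
    }
}
```

The second proposed fix would instead change setKeySchema to always write the output entry, so that the unchanged reader finds the schema.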



--
This message was sent by Atlassian JIRA
(v6.2#6252)