Posted to dev@avro.apache.org by "Alan Paulsen (JIRA)" <ji...@apache.org> on 2013/07/21 16:40:49 UTC
[jira] [Updated] (AVRO-1356) AvroMultipleOutputs map only jobs do not use NamedOutput schemas
[ https://issues.apache.org/jira/browse/AVRO-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Paulsen updated AVRO-1356:
-------------------------------
Description:
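When a job is map-only, AvroMultipleOutputs stores the named output's schemas as the map output schemas (MapOutputKeySchema and MapOutputValueSchema) rather than as the job output schemas: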
{code:java}
// A job with zero reduce tasks is treated as map-only.
boolean isMaponly = job.getNumReduceTasks() == 0;
if (keySchema != null) {
  if (isMaponly) {
    AvroJob.setMapOutputKeySchema(job, keySchema);
  } else {
    AvroJob.setOutputKeySchema(job, keySchema);
  }
}
if (valSchema != null) {
  if (isMaponly) {
    AvroJob.setMapOutputValueSchema(job, valSchema);
  } else {
    AvroJob.setOutputValueSchema(job, valSchema);
  }
}
{code}
Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check whether the job is map-only; they always read the OutputKeySchema and OutputValueSchema. As a result, in a map-only job the schemas registered for a named output are ignored.
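To make the mismatch concrete, here is a minimal, self-contained model of it. It does not use Avro or Hadoop: a plain Map stands in for the job Configuration, and the property names and the class NamedOutputSchemaBug are hypothetical. Each loop iteration uses a fresh map, on the assumption that each named output works on its own copy of the job configuration. The writer side follows the branch quoted above; the reader side follows the reported behaviour of AvroKeyOutputFormat (key schema only; the value schema behaves the same way).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of AVRO-1356. A plain Map stands in for the Hadoop job
// Configuration; the key names are illustrative, not Avro's real property names.
class NamedOutputSchemaBug {
    static final String MAP_OUTPUT_KEY_SCHEMA = "example.map.output.key.schema"; // hypothetical
    static final String OUTPUT_KEY_SCHEMA = "example.output.key.schema";         // hypothetical

    // Mirrors the quoted AvroMultipleOutputs branch: for a map-only job the
    // named output's key schema is stored as the map output key schema.
    static void setNamedOutputKeySchema(Map<String, String> conf, int numReduceTasks, String keySchema) {
        boolean isMapOnly = numReduceTasks == 0;
        if (keySchema != null) {
            conf.put(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA, keySchema);
        }
    }

    // Mirrors the reported output format behaviour: only the job output key
    // schema is ever read, whether or not the job is map-only.
    static String keySchemaUsedByOutputFormat(Map<String, String> conf) {
        return conf.get(OUTPUT_KEY_SCHEMA);
    }

    public static void main(String[] args) {
        for (int reducers : new int[] {0, 1}) {
            // Fresh configuration copy for this named output (assumed).
            Map<String, String> conf = new HashMap<>();
            conf.put(OUTPUT_KEY_SCHEMA, "\"int\"");                // job-level schema
            setNamedOutputKeySchema(conf, reducers, "\"string\""); // named output's schema
            System.out.println("reducers=" + reducers
                + " -> output format writes with " + keySchemaUsedByOutputFormat(conf));
        }
    }
}
```

With zero reducers the output format still sees the job-level `"int"` schema; with one reducer it sees the named output's `"string"` schema.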
We can fix this in one of two ways:
* Change AvroKeyOutputFormat and AvroKeyValueOutputFormat to check whether the job is map-only and read the corresponding schema. (This seems like the right fix.)
* Change AvroMultipleOutputs to always set the OutputKeySchema and OutputValueSchema.
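Both options can be sketched against a toy configuration. This is a hypothetical, self-contained illustration, not the actual patch: a plain Map stands in for the job Configuration, and the property names and the class MapOnlyAwareOutputFormat are invented for the example. In Avro itself the two lookups would presumably go through `AvroJob.getMapOutputKeySchema(conf)` and `AvroJob.getOutputKeySchema(conf)`, with the map-only test based on the context's `getNumReduceTasks()`. Only the key schema is shown; the value schema would follow the same pattern.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two proposed fixes. The key names are
// illustrative, not Avro's real property names.
class MapOnlyAwareOutputFormat {
    static final String MAP_OUTPUT_KEY_SCHEMA = "example.map.output.key.schema"; // hypothetical
    static final String OUTPUT_KEY_SCHEMA = "example.output.key.schema";         // hypothetical

    // Option 1: the output format checks whether the job is map-only and, if so,
    // prefers the map output key schema. Falling back to the job output schema
    // keeps map-only jobs that only set the output schema working as before.
    static String resolveKeySchema(Map<String, String> conf, int numReduceTasks) {
        if (numReduceTasks == 0) {
            String mapOutputSchema = conf.get(MAP_OUTPUT_KEY_SCHEMA);
            if (mapOutputSchema != null) {
                return mapOutputSchema;
            }
        }
        return conf.get(OUTPUT_KEY_SCHEMA);
    }

    // Option 2: AvroMultipleOutputs ignores the map-only flag and always records
    // the named output's schema as the job output key schema.
    static void setNamedOutputKeySchemaAlways(Map<String, String> conf, String keySchema) {
        if (keySchema != null) {
            conf.put(OUTPUT_KEY_SCHEMA, keySchema);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OUTPUT_KEY_SCHEMA, "\"int\"");        // job-level schema
        conf.put(MAP_OUTPUT_KEY_SCHEMA, "\"string\""); // written for a map-only named output
        System.out.println(resolveKeySchema(conf, 0)); // the named output's schema
        System.out.println(resolveKeySchema(conf, 1)); // the job output schema
    }
}
```

The fallback in option 1 is a choice made for this sketch; without it, map-only jobs that configure only the job output schema would lose their schema.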
> AvroMultipleOutputs map only jobs do not use NamedOutput schemas
> ----------------------------------------------------------------
>
> Key: AVRO-1356
> URL: https://issues.apache.org/jira/browse/AVRO-1356
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.4
> Reporter: Alan Paulsen
> Fix For: 1.7.5
>