Posted to dev@avro.apache.org by "Matthew Hayes (JIRA)" <ji...@apache.org> on 2012/12/07 05:45:20 UTC

[jira] [Created] (AVRO-1215) AvroMultipleOutputs not working when specifying baseOutputPath

Matthew Hayes created AVRO-1215:
-----------------------------------

             Summary: AvroMultipleOutputs not working when specifying baseOutputPath
                 Key: AVRO-1215
                 URL: https://issues.apache.org/jira/browse/AVRO-1215
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.2
            Reporter: Matthew Hayes


I'm calling the write() method of AvroMultipleOutputs that takes a baseOutputPath. The reducer appears to hang as soon as it tries to write to a baseOutputPath value it has not already encountered. It then fails with:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file ... because current leaseholder is trying to recreate file.

I think the problem has to do with this line in AvroMultipleOutputs:

{code}
      // get the record writer from context output format
      //FileOutputFormat.setOutputName(taskContext, baseFileName);
{code}

This line is not commented out in the equivalent Hadoop code, so I think the baseOutputPath is ignored. As a result, each record writer is created with the same path, which leads to the exception.
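
To illustrate the suspected failure mode, here is a toy model, not Hadoop or Avro code. It assumes a file system that allows only one open writer per path and compares a resolver that ignores baseOutputPath with one that uses it. The class, the method names, and the "part-r-00000.avro" file name are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only: a file system that allows one open writer per path. */
public class SamePathCollision {
    private final Set<String> openPaths = new HashSet<>();

    /** Opens a path for writing; fails if a writer already holds that path. */
    public void create(String path) {
        if (!openPaths.add(path)) {
            throw new IllegalStateException("already being created: " + path);
        }
    }

    /** Hypothetical resolver that ignores baseOutputPath, as suspected in this report. */
    public static String resolveIgnoringBase(String baseOutputPath) {
        return "part-r-00000.avro";
    }

    /** Hypothetical resolver that derives the file name from baseOutputPath. */
    public static String resolveUsingBase(String baseOutputPath) {
        return baseOutputPath + "-r-00000.avro";
    }

    public static void main(String[] args) {
        SamePathCollision fs = new SamePathCollision();
        fs.create(resolveIgnoringBase("a/out"));
        try {
            // A second, different baseOutputPath still resolves to the same file.
            fs.create(resolveIgnoringBase("b/out"));
        } catch (IllegalStateException e) {
            System.out.println("second writer failed: " + e.getMessage());
        }
    }
}
```

In this model the second create() fails only because both calls resolve to the same path. With resolveUsingBase() the two writers get distinct files and no collision occurs.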

Uncommenting this line does not work because the method is not visible from AvroMultipleOutputs. All the method does is set "mapreduce.output.basename", but setting that property directly doesn't work either.

After digging through the Avro code, I found that AvroOutputFormatBase uses "avro.mo.config.namedOutput" to create the path. If I replace the commented-out line with the following, it seems to work:

{code}
taskContext.getConfiguration().set("avro.mo.config.namedOutput", baseFileName);  
{code}
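
Why one property has an effect and the other does not can be sketched with a small stand-alone model. This is an illustration under assumptions, not the actual AvroOutputFormatBase logic: a plain Map stands in for the job configuration, and outputFileName(), the "part" default, and the "-r-00000.avro" suffix are hypothetical. Only the two property names come from this report.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: a Map stands in for the Hadoop job configuration. */
public class NamedOutputKeySketch {
    public static final String HADOOP_KEY = "mapreduce.output.basename";
    public static final String AVRO_KEY = "avro.mo.config.namedOutput";

    /** Hypothetical stand-in for an output format that reads only AVRO_KEY. */
    public static String outputFileName(Map<String, String> conf) {
        return conf.getOrDefault(AVRO_KEY, "part") + "-r-00000.avro";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Setting the Hadoop property changes nothing, because it is never read here.
        conf.put(HADOOP_KEY, "a/out");
        System.out.println(outputFileName(conf)); // part-r-00000.avro

        // Setting the property that is actually read changes the resolved name.
        conf.put(AVRO_KEY, "a/out");
        System.out.println(outputFileName(conf)); // a/out-r-00000.avro
    }
}
```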

