Posted to dev@avro.apache.org by "Viji (JIRA)" <ji...@apache.org> on 2012/10/17 09:48:06 UTC

[jira] [Created] (AVRO-1179) AvroMultipleOutputs does not seem to be generating different base output paths

Viji created AVRO-1179:
--------------------------

             Summary: AvroMultipleOutputs does not seem to be generating different base output paths
                 Key: AVRO-1179
                 URL: https://issues.apache.org/jira/browse/AVRO-1179
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.2
            Reporter: Viji


In the implementation at http://svn.apache.org/repos/asf/avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroMultipleOutputs.java, the following line in {{getRecordWriter(TaskAttemptContext taskContext, String baseFileName)}} has been commented out:

{code}//FileOutputFormat.setOutputName(taskContext, baseFileName);{code}

Hence, when we call {{mo.write(outKey, NullWritable.get(), "subdir/samp");}}, the output still goes to the default output directory rather than under the path we specify.
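
To illustrate the effect, here is a minimal sketch, assuming the task output file name is built from the base output name in the usual Hadoop style (base name, task type, zero-padded partition number, extension). The class {{OutputNameSketch}}, its method, and the {{.avro}} extension are hypothetical names used only to model the naming; this is not the Avro or Hadoop implementation.

```java
import java.util.Locale;

/** Toy model of task output file naming; not Hadoop or Avro source code. */
final class OutputNameSketch {
    /** Default base name, used when no base output name has been set. */
    static final String DEFAULT_BASE_NAME = "part";

    /**
     * Builds a name of the form <base>-<m|r>-<5-digit partition><extension>.
     * A null base name falls back to the default, which mimics what happens
     * when the supplied base path is never applied.
     */
    static String uniqueFile(String baseName, char taskType, int partition, String extension) {
        String base = (baseName == null) ? DEFAULT_BASE_NAME : baseName;
        return String.format(Locale.ROOT, "%s-%c-%05d%s", base, taskType, partition, extension);
    }

    public static void main(String[] args) {
        // Base path ignored: the file lands directly in the output directory.
        System.out.println(uniqueFile(null, 'r', 0, ".avro"));
        // Base path applied: the file lands under subdir/.
        System.out.println(uniqueFile("subdir/samp", 'r', 0, ".avro"));
    }
}
```

Under these assumptions, the first call models the behavior reported here (a file named after the default base name), while the second models the expected behavior (a file under {{subdir/}}).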


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-1179) AvroMultipleOutputs does not seem to be generating different base output paths

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478325#comment-13478325 ] 

Harsh J commented on AVRO-1179:
-------------------------------

Hi Ashish,

I think the error occurs specifically with the MO.write(KEY, VALUE, STRING) signature, which lets MO users write outputs without defining a named output first. In this case, the STRING argument is passed into the record writer fetch call ({{getRecordWriter}}), but is ignored there.
                

[jira] [Commented] (AVRO-1179) AvroMultipleOutputs does not seem to be generating different base output paths

Posted by "Ashish Nagavaram (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478401#comment-13478401 ] 

Ashish Nagavaram commented on AVRO-1179:
----------------------------------------

I think we would still need to add it as a named output first, since we need the output schema for it. Another option would be to write using the default schema that has been declared.

Let me know if the second option sounds good, and I will make the changes accordingly.
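
The two options can be sketched roughly as follows. This is only an illustration under stated assumptions: {{SchemaChoiceSketch}} and its methods are hypothetical, and schemas are represented as plain JSON strings rather than Avro {{Schema}} objects. It is not a proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of choosing a schema for a write; not the Avro implementation. */
final class SchemaChoiceSketch {
    // Schemas registered per named output (hypothetical registry).
    private final Map<String, String> namedOutputSchemas = new HashMap<>();
    // Schema declared for the job's regular output.
    private final String defaultSchema;

    SchemaChoiceSketch(String defaultSchema) {
        this.defaultSchema = defaultSchema;
    }

    /** Option 1: the caller declares a named output together with its schema. */
    void addNamedOutput(String name, String schema) {
        namedOutputSchemas.put(name, schema);
    }

    /** Option 2: fall back to the default schema when nothing was declared. */
    String schemaFor(String name) {
        return namedOutputSchemas.getOrDefault(name, defaultSchema);
    }

    public static void main(String[] args) {
        SchemaChoiceSketch sketch = new SchemaChoiceSketch("\"string\"");
        sketch.addNamedOutput("stats", "\"long\"");
        // Declared named output: its own schema is used.
        System.out.println(sketch.schemaFor("stats"));
        // Undeclared base path: the default schema is used.
        System.out.println(sketch.schemaFor("subdir/samp"));
    }
}
```

With the fallback in place, a write to an undeclared base path such as {{"subdir/samp"}} would not need a prior named-output declaration.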


                

[jira] [Commented] (AVRO-1179) AvroMultipleOutputs does not seem to be generating different base output paths

Posted by "Ashish Nagavaram (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478228#comment-13478228 ] 

Ashish Nagavaram commented on AVRO-1179:
----------------------------------------

Hi Viji,

The code for deciding the file name is at http://svn.apache.org/repos/asf/avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapreduce/AvroOutputFormatBase.java

When you specify the output format class, this function is invoked; it looks up whether any named outputs have been declared, and otherwise sets it to null.

Sample use cases are in
http://svn.apache.org/repos/asf/avro/trunk/lang/java/mapred/src/test/java/org/apache/avro/mapreduce/TestAvroMultipleOutputs.java

Can you please paste the code where it fails?

--Ashish
                