Posted to mapreduce-issues@hadoop.apache.org by "Robbie Strickland (JIRA)" <ji...@apache.org> on 2012/05/01 22:42:56 UTC

[jira] [Created] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

Robbie Strickland created MAPREDUCE-4216:
--------------------------------------------

             Summary: Make MultipleOutputs generic to support non-file output formats
                 Key: MAPREDUCE-4216
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 1.0.2
            Reporter: Robbie Strickland


The current MultipleOutputs implementation is tied to FileOutputFormat in such a way that it is not extensible to other types of output. It should be made more generic, such as with an interface that can be implemented for different outputs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

Posted by "Robbie Strickland (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robbie Strickland updated MAPREDUCE-4216:
-----------------------------------------

    Status: Patch Available  (was: Open)
    
> Make MultipleOutputs generic to support non-file output formats
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4216
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 1.0.2
>            Reporter: Robbie Strickland
>              Labels: Output
>         Attachments: MAPREDUCE-4216.patch
>
>
> The current MultipleOutputs implementation is tied to FileOutputFormat in such a way that it is not extensible to other types of output. It should be made more generic, such as with an interface that can be implemented for different outputs.


        

[jira] [Commented] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267542#comment-13267542 ] 

Hadoop QA commented on MAPREDUCE-4216:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525460/MAPREDUCE-4216.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated 3 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2353//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2353//console

This message is automatically generated.
                

        

[jira] [Commented] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285331#comment-13285331 ] 

Harsh J commented on MAPREDUCE-4216:
------------------------------------

Hey Robbie, thanks for the patch and the report!

Do we _have_ to move the static methods into the abstract OutputFormat class? Why can't they remain in the FileOutputFormat class itself?

I agree MultipleOutputs should allow classes that do not extend FileOutputFormat (FOF), though it was primarily written for file classes. However, why doesn't your first change hunk alone suffice for that?
                

        

[jira] [Commented] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

Posted by "Robbie Strickland (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288539#comment-13288539 ] 

Robbie Strickland commented on MAPREDUCE-4216:
----------------------------------------------

I guess we don't have to, but if it isn't located somewhere generic there's no guarantee (to implementers of other output formats) that the version in FOF won't change.  And if the FOF version is the one being called in MultipleOutputs, then it doesn't really matter what other classes implement.  My motivation here is that I have submitted a patch against Cassandra to support MultipleOutputs (see the related issue), and in order to do this I have to duplicate the config key string.  I think it would be cleaner to have a way to access this property through a method call rather than duplicating the key.
                

        

[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

Posted by "Robbie Strickland (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robbie Strickland updated MAPREDUCE-4216:
-----------------------------------------

    Attachment: MAPREDUCE-4216.patch

I am submitting a patch that puts a default getOutputName() on OutputFormat to make MultipleOutputs more generic, and to allow access to the config key constant from other OutputFormats besides FileOutputFormat.
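To illustrate the idea discussed in this thread, here is a rough, self-contained sketch. It is not the attached patch and does not use Hadoop's real classes: SketchOutputFormat, SketchFileOutputFormat, SketchColumnFamilyOutputFormat, the key "sketch.output.basename", and the plain Map standing in for the job configuration are all hypothetical names made up for the example. It only shows how a getOutputName() on a shared base class would let a non-file output format read the output name without copying the key string.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputNameSketch {

    /** Hypothetical stand-in for a generic base output format (not Hadoop's class). */
    static abstract class SketchOutputFormat {
        // Assumed config key and default value, defined once on the base class.
        static final String BASE_OUTPUT_NAME = "sketch.output.basename";
        static final String DEFAULT_OUTPUT_NAME = "part";

        /** Reads the output name, falling back to the default when unset. */
        static String getOutputName(Map<String, String> conf) {
            return conf.getOrDefault(BASE_OUTPUT_NAME, DEFAULT_OUTPUT_NAME);
        }

        /** Sets the output name; the role a MultipleOutputs-like helper would play. */
        static void setOutputName(Map<String, String> conf, String name) {
            conf.put(BASE_OUTPUT_NAME, name);
        }

        /** Each format decides what the output name means for its own target. */
        abstract String describeTarget(Map<String, String> conf);
    }

    /** File-style format: the output name becomes a file name prefix. */
    static class SketchFileOutputFormat extends SketchOutputFormat {
        @Override
        String describeTarget(Map<String, String> conf) {
            return "file:" + getOutputName(conf);
        }
    }

    /** Non-file format: the output name selects a column family instead. */
    static class SketchColumnFamilyOutputFormat extends SketchOutputFormat {
        @Override
        String describeTarget(Map<String, String> conf) {
            // No duplicated key string here; the base class accessor is reused.
            return "columnfamily:" + getOutputName(conf);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(new SketchFileOutputFormat().describeTarget(conf));

        SketchOutputFormat.setOutputName(conf, "users");
        System.out.println(new SketchColumnFamilyOutputFormat().describeTarget(conf));
    }
}
```

Under these assumptions, a format outside the file hierarchy depends only on the base class accessor, so a later change to the key in one place would not silently break it.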
                