Posted to dev@mrunit.apache.org by "E. Sammer (JIRA)" <ji...@apache.org> on 2011/05/27 00:18:47 UTC

[jira] [Created] (MRUNIT-13) Add support for MultipleOutputs

Add support for MultipleOutputs
-------------------------------

                 Key: MRUNIT-13
                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
             Project: MRUnit
          Issue Type: Sub-task
    Affects Versions: 0.5.0
            Reporter: E. Sammer
            Assignee: E. Sammer


Add support to mrunit for Hadoop's MultipleOutputs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Donofrio updated MRUNIT-13:
-------------------------------

    Fix Version/s: 1.0.0
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>             Fix For: 1.0.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Issue Comment Edited] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jon Grasmeder (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227701#comment-13227701 ] 

Jon Grasmeder edited comment on MRUNIT-13 at 3/12/12 5:17 PM:
--------------------------------------------------------------

Jim - thanks for looking into this!  

If it helps, here is the use case that I am working on: the reducer "bins" input records into outputFileA, outputFileB or both.  MultipleOutputs works great for this, but I couldn't test using MRUnit.  You probably already know this, but the first issue is a ClassCastException in setup() when the MockOutputCommitter is being cast as a FileOutputCommitter.  The second issue is a NullPointerException in reduce() when trying to perform the write(Text, Text, String) using the MultipleOutputs instance.

As for output, I was planning to open a DataInputStream to read the result files written by MultipleOutputs.  As you mentioned, it would be easier for the user if you can return Strings.   

The challenge is that each call to reduce() could 'write' multiple records to several 'files'.  (In my case, I only write a single record to each file, but one can envision scenarios that require multiple writes per reduce() call.)  One solution may be to store (in JobConf) a pointer to a HashMap<String, List>, where the String is the baseOutputPath (modified if needed by the namedOutput parameter) and the List is the set of key/value Pairs emitted by write().

- Jon 
                
      was (Author: gras):
    Jim - thanks for looking into this!  

If it helps, here is the use case that I am working on: the reducer "bins" input records into outputFileA, outputFileB or both.  MultipleOutputs works great for this, but I couldn't test using MRUnit.  You probably already know this, but the first issue is a ClassCastException in setup() when the MockOutputCommitter is being cast as a FileOutputCommitter.  The second issue is a NullPointerException in reduce() when trying to perform the write(Text, Text, String) using the MultipleOutputs instance.

As for output, I was planning to open a DataInputStream to read the result files written by MultipleOutputs.  As you mentioned, it would be easier for the user if you can return Strings.   

The challenge is that each call to reduce() could 'write' multiple records to several 'files'.  (In my case, I only write a single record to each file, but one can envision scenarios that require multiple writes per reduce() call.)  One solution may be to store (in JobConf) a pointer to a HashMap<String, List>, where the String is the baseOutputPath (modified if needed by the namedOutput parameter) and the List is the set of key/value Pairs emitted by write().

- Jon Grasmeder
                  
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Donofrio updated MRUNIT-13:
-------------------------------

    Fix Version/s:     (was: 1.0.0)
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Brock Noland (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brock Noland updated MRUNIT-13:
-------------------------------

    Fix Version/s:     (was: 0.8.0)
                   1.0.0
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: E. Sammer
>             Fix For: 1.0.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "E. Sammer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

E. Sammer updated MRUNIT-13:
----------------------------

    Fix Version/s:     (was: 0.5.0)
                   0.8.0

> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: E. Sammer
>             Fix For: 0.8.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Assigned] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Donofrio reassigned MRUNIT-13:
----------------------------------

    Assignee: Jim Donofrio  (was: E. Sammer)
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>             Fix For: 1.0.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Commented] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226955#comment-13226955 ] 

Jim Donofrio commented on MRUNIT-13:
------------------------------------

To do this, I guess we need a MockRecordWriter returned from a MockOutputFormat. Users will specify everything that they usually specify for MultipleOutputs except for the OutputFormat. The MockRecordWriter will store all the results in a list, as we do now with the MockOutputCollector. The problem then is how to pass this list back to the MapDriver or ReduceDriver. The only shared object we would have access to is the JobConf. We can use the Serialization classes to serialize the objects out to bytes, then convert the bytes into a String using their hex representation so that String decoding cannot corrupt anything. Then we can store the hex strings in the conf and deserialize them in the MapDriver or ReduceDriver to verify that the actual output matches the expected output for each MultipleOutputs type.

I recognize this isn't the cleanest solution, but it will be easier for the user, and I am not sure there is another way of doing this. Any thoughts?
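
The hex round trip proposed in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not MRUnit code: java.util.Properties stands in for the JobConf, DataOutputStream stands in for Hadoop's Serialization classes, and the class name HexConfRoundTrip and the key "mrunit.mock.output.example" are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Properties;

public class HexConfRoundTrip {

    // Hypothetical property name; a real implementation would pick its own.
    public static final String KEY = "mrunit.mock.output.example";

    // Encode raw bytes as lowercase hex so the value is a plain ASCII string.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode a hex string produced by toHex back into bytes.
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // "Writer" side: serialize one key/value pair and store it in the conf.
    public static void store(Properties conf, String key, String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(key);
            out.writeUTF(value);
        }
        conf.setProperty(KEY, toHex(buf.toByteArray()));
    }

    // "Driver" side: read the hex string back and deserialize the pair.
    public static String[] load(Properties conf) throws IOException {
        byte[] bytes = fromHex(conf.getProperty(KEY));
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new String[] { in.readUTF(), in.readUTF() };
        }
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        store(conf, "k1", "v1");
        String[] pair = load(conf);
        System.out.println(pair[0] + "\t" + pair[1]);
    }
}
```

Hex doubles the size of the stored value, but it keeps arbitrary serialized bytes intact in a string-valued configuration, which is the concern raised above.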
                
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>             Fix For: 1.0.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Donofrio updated MRUNIT-13:
-------------------------------

    Issue Type: New Feature  (was: Sub-task)
        Parent:     (was: MRUNIT-69)
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Donofrio updated MRUNIT-13:
-------------------------------

    Issue Type: New Feature  (was: Sub-task)
        Parent:     (was: MRUNIT-12)
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>             Fix For: 1.0.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Commented] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jon Grasmeder (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227701#comment-13227701 ] 

Jon Grasmeder commented on MRUNIT-13:
-------------------------------------

Jim - thanks for looking into this!  

If it helps, here is the use case that I am working on: the reducer "bins" input records into outputFileA, outputFileB or both.  MultipleOutputs works great for this, but I couldn't test using MRUnit.  You probably already know this, but the first issue is a ClassCastException in setup() when the MockOutputCommitter is being cast as a FileOutputCommitter.  The second issue is a NullPointerException in reduce() when trying to perform the write(Text, Text, String) using the MultipleOutputs instance.

As for output, I was planning to open a DataInputStream to read the result files written by MultipleOutputs.  As you mentioned, it would be easier for the user if you can return Strings.   

The challenge is that each call to reduce() could 'write' multiple records to several 'files'.  (In my case, I only write a single record to each file, but one can envision scenarios that require multiple writes per reduce() call.)  One solution may be to store (in JobConf) a pointer to a HashMap<String, List>, where the String is the baseOutputPath (modified if needed by the namedOutput parameter) and the List is the set of key/value Pairs emitted by write().

- Jon Grasmeder
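
The map-of-lists idea in this comment might look something like the sketch below. It is a plain-Java illustration only: the class NamedOutputCollector and its methods are hypothetical, String keys and values stand in for Hadoop's Text, and Map.Entry stands in for MRUnit's Pair.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamedOutputCollector {

    // baseOutputPath -> every key/value pair written to that output, in order.
    private final Map<String, List<Map.Entry<String, String>>> outputs =
            new LinkedHashMap<>();

    // Mirrors the shape of write(key, value, baseOutputPath) mentioned above.
    public void write(String key, String value, String baseOutputPath) {
        outputs.computeIfAbsent(baseOutputPath, p -> new ArrayList<>())
               .add(new SimpleEntry<>(key, value));
    }

    // Pairs recorded for one output; empty if nothing was written to it.
    public List<Map.Entry<String, String>> getOutputs(String baseOutputPath) {
        List<Map.Entry<String, String>> pairs = outputs.get(baseOutputPath);
        return pairs == null ? Collections.emptyList() : pairs;
    }

    public static void main(String[] args) {
        NamedOutputCollector collector = new NamedOutputCollector();
        // One reduce() call may write to one output, the other, or both.
        collector.write("id1", "record", "outputFileA");
        collector.write("id1", "record", "outputFileB");
        collector.write("id2", "record", "outputFileA");
        System.out.println(collector.getOutputs("outputFileA").size());
        System.out.println(collector.getOutputs("outputFileB").size());
    }
}
```

Keeping one list per output path lets a test assert on each 'file' separately, regardless of how many writes a single reduce() call makes.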
                
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Updated] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Donofrio updated MRUNIT-13:
-------------------------------

    Issue Type: Sub-task  (was: New Feature)
        Parent: MRUNIT-69
    
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>             Fix For: 1.0.0
>
>
> Add support to mrunit for Hadoop's MultipleOutputs.

[jira] [Commented] (MRUNIT-13) Add support for MultipleOutputs

Posted by "Jim Donofrio (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255306#comment-13255306 ] 

Jim Donofrio commented on MRUNIT-13:
------------------------------------

This really isn't dependent on the new API. I think it makes sense to get as many of these new features as possible into the API we have now, so that when we finally make the new API with the annotations we won't have to patch the features in afterwards.
                
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.
