Posted to dev@mrunit.apache.org by "Jon Grasmeder (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/12 18:17:39 UTC

[jira] [Issue Comment Edited] (MRUNIT-13) Add support for MultipleOutputs

    [ https://issues.apache.org/jira/browse/MRUNIT-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227701#comment-13227701 ] 

Jon Grasmeder edited comment on MRUNIT-13 at 3/12/12 5:17 PM:
--------------------------------------------------------------

Jim - thanks for looking into this!  

If it helps, here is the use case that I am working on: the reducer "bins" input records into outputFileA, outputFileB or both.  MultipleOutputs works great for this, but I couldn't test using MRUnit.  You probably already know this, but the first issue is a ClassCastException in setup() when the MockOutputCommitter is being cast as a FileOutputCommitter.  The second issue is a NullPointerException in reduce() when trying to perform the write(Text, Text, String) using the MultipleOutputs instance.

As for output, I was planning to open a DataInputStream to read the result files written by MultipleOutputs.  As you mentioned, it would be easier for the user if you can return Strings.   

The challenge is that each call to reduce() could 'write' multiple records to several 'files'.  (In my case, I only write a single record to each file, but one can envision scenarios that require multiple writes per reduce() call.)  One solution may be to store (in JobConf) a pointer to a HashMap<String, List>, where the String is the baseOutputPath (modified if needed by the namedOutput parameter) and the List holds the key/value Pairs emitted by write().
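
To make the idea concrete, here is a rough sketch of the map-of-lists structure described above. It is only an illustration, not MRUnit code: NamedOutputCollector, Pair and getOutput are hypothetical names, plain Java generics stand in for Hadoop's Text, and the question of how the pointer would be stored in JobConf is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the files MultipleOutputs would write. */
final class NamedOutputCollector<K, V> {

    /** One key/value pair as emitted by a single write() call. */
    static final class Pair<K, V> {
        final K key;
        final V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // baseOutputPath -> pairs written to that path, in emission order.
    private final Map<String, List<Pair<K, V>>> outputs = new HashMap<>();

    /** Same argument shape as write(key, value, baseOutputPath). */
    void write(K key, V value, String baseOutputPath) {
        outputs.computeIfAbsent(baseOutputPath, p -> new ArrayList<>())
               .add(new Pair<>(key, value));
    }

    /** Returns the pairs recorded for a path, or an empty list if none. */
    List<Pair<K, V>> getOutput(String baseOutputPath) {
        List<Pair<K, V>> pairs = outputs.get(baseOutputPath);
        return pairs == null
                ? Collections.<Pair<K, V>>emptyList()
                : Collections.unmodifiableList(pairs);
    }
}
```

With something like this, a reduce() call that bins one record into both outputFileA and outputFileB would simply call write() twice, and a test could then check getOutput("outputFileA") and getOutput("outputFileB") separately, without reading any result files.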

- Jon 
                
> Add support for MultipleOutputs
> -------------------------------
>
>                 Key: MRUNIT-13
>                 URL: https://issues.apache.org/jira/browse/MRUNIT-13
>             Project: MRUnit
>          Issue Type: Sub-task
>    Affects Versions: 0.5.0
>            Reporter: E. Sammer
>            Assignee: Jim Donofrio
>
> Add support to mrunit for Hadoop's MultipleOutputs.
