You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "TezQA (JIRA)" <ji...@apache.org> on 2016/09/13 00:06:20 UTC

[jira] [Commented] (TEZ-3215) Support for MultipleOutputs

    [ https://issues.apache.org/jira/browse/TEZ-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485714#comment-15485714 ] 

TezQA commented on TEZ-3215:
----------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12828126/TEZ-3215-3.patch
  against master revision a23de49.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1966//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1966//console

This message is automatically generated.

> Support for MultipleOutputs
> ---------------------------
>
>                 Key: TEZ-3215
>                 URL: https://issues.apache.org/jira/browse/TEZ-3215
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: TEZ-3215-2.patch, TEZ-3215-3.patch, TEZ-3215.patch
>
>
> Here is the use case. A reducer might write its output to more than one file. The file name will be based on the mapper key. We don't know all possible keys ahead of time. In MR, MultipleOutputs provides such support. I couldn't find anything readily available in Tez.
> * Set up one DataSink per file ahead of time won't work as we don't know all possible keys.
> * Use MR MultipleOutputs directly from the Tez application processor. It isn't clear how to pass TaskInputOutputContext to MultipleOutputs.
> * Tez MROutput can create a DataSink based on the specified outputFormat. But it can't take MR MultipleOutputs.
> I end up modifying Tez MROutput with HashMap {{recordWriters}} to achieve this. If this is a solved problem, can anyone explain how to do it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)