Posted to issues@tez.apache.org by "TezQA (JIRA)" <ji...@apache.org> on 2017/08/15 05:08:01 UTC

[jira] [Commented] (TEZ-3818) Support a new data routing policy for small partitions

    [ https://issues.apache.org/jira/browse/TEZ-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126805#comment-16126805 ] 

TezQA commented on TEZ-3818:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12881863/TEZ-3818.patch
  against master revision 823b1bb.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in :
                   org.apache.tez.test.TestExceptionPropagation

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2616//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2616//console

This message is automatically generated.

> Support a new data routing policy for small partitions 
> -------------------------------------------------------
>
>                 Key: TEZ-3818
>                 URL: https://issues.apache.org/jira/browse/TEZ-3818
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: TEZ-3818.patch
>
>
> Under the existing fair shuffle manager data routing policies of reduce_parallelism and fair_parallelism, small partitions (total size up to the max desirable limit) are processed together by a single destination task.
> We have a use case that prefers having one destination task process one small partition while still having multiple destination tasks process one large partition. When the destination vertex is connected to MultiMROutput and the output format is Parquet, each instance of the Parquet output stream consumes extra memory. So if a destination task ends up processing lots of small partitions, it ends up exceeding the task memory limit.
> With the new policy added, here is a summary of what each data routing policy does.
> * reduce_parallelism. The parallelism is decreased to a desired level by having one destination task process multiple consecutive partitions.
> * fair_parallelism. The parallelism is adjusted to a desired level by having one destination task process multiple consecutive small partitions and multiple destination tasks process one large partition.
> * The new increase_parallelism. The parallelism is increased to a desired level by having one destination task process each small partition and multiple destination tasks process one large partition.
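
The three policies summarized above can be compared with a small sketch. This is not the Tez implementation: the function name `route`, the task tuple layout `(partition_indices, split_index, num_splits)`, the greedy grouping rule, and the even split of a large partition into ceil(size / limit) pieces are all assumptions made for illustration. "Large" here means a partition bigger than the desired per-task input.

```python
def route(partition_sizes, desired_task_input, policy):
    """Hypothetical sketch: map partitions to destination tasks.

    Each task is (partition_indices, split_index, num_splits).
    """
    tasks = []
    group, group_size = [], 0

    def flush():
        # Emit the pending group of consecutive partitions as one task.
        nonlocal group, group_size
        if group:
            tasks.append((tuple(group), 0, 1))
            group, group_size = [], 0

    for index, size in enumerate(partition_sizes):
        large = size > desired_task_input
        if large and policy != "reduce_parallelism":
            # fair_parallelism / increase_parallelism: several tasks
            # share one large partition.
            flush()
            num_splits = -(-size // desired_task_input)  # integer ceil
            for split in range(num_splits):
                tasks.append(((index,), split, num_splits))
        elif policy == "increase_parallelism":
            # One destination task per small partition.
            tasks.append(((index,), 0, 1))
        else:
            # Group consecutive partitions up to the desired limit.
            if group and group_size + size > desired_task_input:
                flush()
            group.append(index)
            group_size += size
    flush()
    return tasks


sizes = [10, 20, 250, 30, 5]
for policy in ("reduce_parallelism", "fair_parallelism", "increase_parallelism"):
    print(policy, len(route(sizes, 100, policy)))
```

With these example sizes and a limit of 100, the sketch yields 3, 5, and 7 destination tasks respectively. Under increase_parallelism no task handles more than one partition, which is the property the Parquet use case depends on.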



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)