Posted to mapreduce-issues@hadoop.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/06/18 13:17:03 UTC

[jira] [Commented] (MAPREDUCE-5932) Provide an option to use a dedicated reduce-side shuffle log

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035589#comment-14035589 ] 

Hadoop QA commented on MAPREDUCE-5932:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12651062/MAPREDUCE-5932.v01.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

                  org.apache.hadoop.mapreduce.v2.app.job.impl.TestMapReduceChildJVM

                        The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

                  org.apache.hadoop.mapred.pipes.TestPipeApplication

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4670//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4670//console

This message is automatically generated.

> Provide an option to use a dedicated reduce-side shuffle log
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5932
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 2.4.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5932.v01.patch
>
>
> For reducers in large jobs, our users cannot easily spot the portions of the log associated with problems in their own code. An example reducer with INFO-level logging generates ~3500 lines (~700KiB) per second. 95% of the log comes from the client side of the shuffle, {{org.apache.hadoop.mapreduce.task.reduce.*}}:
> {code}
> $ wc syslog 
>     3642   48192  691013 syslog
> $ grep task.reduce syslog | wc 
>     3424   46534  659038
> $ grep task.reduce.ShuffleScheduler syslog | wc 
>     1521   17745  251458
> $ grep task.reduce.Fetcher syslog | wc 
>     1045   15340  223683
> $ grep task.reduce.InMemoryMapOutput syslog | wc 
>      400    4800   72060
> $ grep task.reduce.MergeManagerImpl syslog | wc 
>      432    8200  106555
> {code}
> Byte percentage breakdown:
> {code}
> Shuffle total:           95%
> ShuffleScheduler:        36%
> Fetcher:                 32%
> InMemoryMapOutput:       10%
> MergeManagerImpl:        15%
> {code}
> While this information is often useful for devops debugging shuffle performance issues, job users are often lost in it.
> We propose to have a dedicated syslog.shuffle file.
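
The byte percentage breakdown quoted in the description can be rechecked from the wc figures it lists. The following is a minimal Python sketch of that arithmetic only; the names SYSLOG_TOTAL, SHUFFLE_COUNTS and byte_share are made up for this illustration and are not part of the patch or of Hadoop. The numbers are copied from the quoted wc output (lines, words, bytes).

```python
# Figures copied from the `wc` output quoted in the issue description,
# as (lines, words, bytes). The names below are hypothetical helpers
# for this illustration only; nothing here comes from the patch.
SYSLOG_TOTAL = (3642, 48192, 691013)

SHUFFLE_COUNTS = {
    "Shuffle total": (3424, 46534, 659038),      # grep task.reduce
    "ShuffleScheduler": (1521, 17745, 251458),
    "Fetcher": (1045, 15340, 223683),
    "InMemoryMapOutput": (400, 4800, 72060),
    "MergeManagerImpl": (432, 8200, 106555),
}


def byte_share(counts, total):
    """Return each entry's share of the total byte count, in percent."""
    total_bytes = total[2]
    return {name: 100.0 * c[2] / total_bytes for name, c in counts.items()}


if __name__ == "__main__":
    for name, pct in byte_share(SHUFFLE_COUNTS, SYSLOG_TOTAL).items():
        print(f"{name + ':':<25}{pct:.1f}%")
```

Rounded to whole percents, the shares agree with the table in the description. The four per-class shares add up to slightly less than the shuffle total because other loggers under task.reduce also contribute.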


