You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/10/14 16:14:33 UTC

[jira] [Comment Edited] (TEZ-1566) Reduce log verbosity

    [ https://issues.apache.org/jira/browse/TEZ-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170976#comment-14170976 ] 

Rajesh Balamohan edited comment on TEZ-1566 at 10/14/14 2:14 PM:
-----------------------------------------------------------------

Post some analysis done on local cluster

Package, Size of logs, number of INFO log lines
======================================
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleInputEventHandler:,1300035426,3252238)
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler:,858108834,2962423)
(org.apache.tez.runtime.library.common.sort.impl.IFile:,305599064,1376503)
(org.apache.tez.runtime.library.common.shuffle.impl.Fetcher:,270034851,736752)
(org.apache.tez.runtime.library.common.shuffle.impl.MergeManager:,112706088,395205)
(org.apache.tez.runtime.library.shuffle.common.HttpConnection:,104086919,293698)
(org.apache.tez.runtime.LogicalIOProcessorRuntimeTask:,17632363,78517)
(org.apache.tez.dag.history.HistoryEventHandler:,10456251,8040)


analyzed a job log which ran at 10 TB scale (application_1411511669099_0978):
======================================
Time taken: 2799.673 seconds
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler:,4220211264,12618536)
(org.apache.tez.runtime.library.common.shuffle.impl.Fetcher:,2708017450,7370119)
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleInputEventHandler:,1714279150,3975012)
(org.apache.tez.runtime.library.common.shuffle.impl.MergeManager:,1096099941,3733091)
(org.apache.tez.runtime.library.common.sort.impl.IFile:,838530582,3631986)
(org.apache.tez.runtime.library.shuffle.common.HttpConnection:,646022486,1537542)
(org.apache.tez.runtime.LogicalIOProcessorRuntimeTask:,32617293,141620)
(org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter:,31282080,209802)
(org.apache.tez.dag.history.HistoryEventHandler:,21952366,18959)


Post fix (same job @ 10 TB scale): application_1411511669099_0980
=====================================================
Time taken: 2245.886 seconds
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler:,3185060504,9118344)
(org.apache.tez.runtime.library.common.shuffle.impl.Fetcher:,2697984459,7443895)
(org.apache.tez.runtime.library.common.shuffle.impl.MergeManager:,1107250913,3770807)
(org.apache.tez.runtime.library.shuffle.common.HttpConnection:,505282991,1046448)
(org.apache.tez.runtime.LogicalIOProcessorRuntimeTask:,32956738,143419)
(org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter:,26897012,172731)
(org.apache.tez.dag.history.HistoryEventHandler:,21827554,18819)

Job runtime difference is around 25% and overall log size is also reduced by around 30-33%.  Job runtime difference could be mainly attributed by moving logs in ShuffleEventInputHandler, ShuffleScheduler to debug level (attributing to faster processing of events instead of waiting for syncs of log4j). 

Attaching first version of the patch.  [~sseth] - Can you please review? import changes were mainly due to auto-import rearrange of intellj.


was (Author: rajesh.balamohan):
Post some analysis done on local cluster

Package, Size of logs, number of INFO log lines
======================================
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleInputEventHandler:,1300035426,3252238)
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler:,858108834,2962423)
(org.apache.tez.runtime.library.common.sort.impl.IFile:,305599064,1376503)
(org.apache.tez.runtime.library.common.shuffle.impl.Fetcher:,270034851,736752)
(org.apache.tez.runtime.library.common.shuffle.impl.MergeManager:,112706088,395205)
(org.apache.tez.runtime.library.shuffle.common.HttpConnection:,104086919,293698)
(org.apache.tez.runtime.LogicalIOProcessorRuntimeTask:,17632363,78517)
(org.apache.tez.dag.history.HistoryEventHandler:,10456251,8040)


analyzed a job log which ran at 10 TB scale (application_1411511669099_0978):
Time taken: 2799.673 seconds
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler:,4220211264,12618536)
(org.apache.tez.runtime.library.common.shuffle.impl.Fetcher:,2708017450,7370119)
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleInputEventHandler:,1714279150,3975012)
(org.apache.tez.runtime.library.common.shuffle.impl.MergeManager:,1096099941,3733091)
(org.apache.tez.runtime.library.common.sort.impl.IFile:,838530582,3631986)
(org.apache.tez.runtime.library.shuffle.common.HttpConnection:,646022486,1537542)
(org.apache.tez.runtime.LogicalIOProcessorRuntimeTask:,32617293,141620)
(org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter:,31282080,209802)
(org.apache.tez.dag.history.HistoryEventHandler:,21952366,18959)


Post fix (same job @ 10 TB scale): application_1411511669099_0980
Time taken: 2245.886 seconds
(org.apache.tez.runtime.library.common.shuffle.impl.ShuffleScheduler:,3185060504,9118344)
(org.apache.tez.runtime.library.common.shuffle.impl.Fetcher:,2697984459,7443895)
(org.apache.tez.runtime.library.common.shuffle.impl.MergeManager:,1107250913,3770807)
(org.apache.tez.runtime.library.shuffle.common.HttpConnection:,505282991,1046448)
(org.apache.tez.runtime.LogicalIOProcessorRuntimeTask:,32956738,143419)
(org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter:,26897012,172731)
(org.apache.tez.dag.history.HistoryEventHandler:,21827554,18819)

Job runtime difference is around 25% and overall log size is also reduced by around 30-33%.  Job runtime difference could be mainly attributed by moving logs in ShuffleEventInputHandler, ShuffleScheduler to debug level (attributing to faster processing of events instead of waiting for syncs of log4j). 

Attaching first version of the patch.  [~sseth] - Can you please review? import changes were mainly due to auto-import rearrange of intellj.

> Reduce log verbosity
> --------------------
>
>                 Key: TEZ-1566
>                 URL: https://issues.apache.org/jira/browse/TEZ-1566
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1566.1.patch
>
>
> We generate 40+GB of logs for a simple 20K X 1K sleep job, at INFO level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)