Posted to issues@tez.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/05/27 19:08:19 UTC

[jira] [Commented] (TEZ-2491) Optimize storage and exchange of Counters for better scaling

    [ https://issues.apache.org/jira/browse/TEZ-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14561283#comment-14561283 ] 

Rohini Palaniswamy commented on TEZ-2491:
-----------------------------------------

[~jlowe] is working on storing Tez events to disk and parsing them the way the MapReduce JHS does, because ATS does not scale for us when AMs publish to it directly. Below are his thoughts on some possible fixes, from the discussion we had about the size analysis he did after logging the events to a file.

Analysis:
     - A job with 40K+ tasks produced a 517.8MB file. We suspected the configuration was taking up a lot of space, but it was only 6MB. Task events took up 499MB.
     - 426MB of the 499MB are finished events, half of which are attempt finished events. So counters being logged twice account for most of it.
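
To put those figures in proportion, here is a quick back-of-the-envelope calculation in Python. The inputs are the numbers reported above. The task count of 40,000 is an assumption taken from "40K+", so it is a lower bound and the per-task figure is only an approximate upper bound.

```python
# Sizes in MB, as reported in the analysis above.
total_mb = 517.8
task_events_mb = 499.0
finished_events_mb = 426.0
tasks = 40_000  # assumption: "40K+" treated as a lower bound


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of the whole file taken by task events.
task_share = share(task_events_mb, total_mb)
# Share of the task events taken by finished events.
finished_share = share(finished_events_mb, task_events_mb)
# "Half of which" are attempt finished events.
attempt_finished_mb = finished_events_mb / 2
# Rough task-event footprint per task, in KB.
kb_per_task = round(task_events_mb * 1024 / tasks, 1)

print(task_share, finished_share, attempt_finished_mb, kb_per_task)
```

In other words, task events are nearly the whole file and finished events are the bulk of the task events, which is why the fixes below concentrate on counters.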

Possible fixes:
   - There are some odd counters. "WRONG_REDUCE", "WRONG_MAP", etc. seem like counters that should never be non-zero in practice, so it is a waste to emit them over and over. They _could_ occur, but that is too rare to justify dedicating a counter to those cases. It would be nice to omit zero counters: looking up a non-existent counter returns zero, so why bother storing it explicitly? This could break code that iterates over counter groups instead of doing direct counter lookups. For example, Pig iterates over counter groups for its RANK implementation in mapreduce. But applications should already handle missing counters, since empty maps and reducers do not produce counters in mapreduce, so that should not be an issue. Alternatively, we can stop sending counters while a task is running and send them only on completion (succeeded, failed, killed), in case they are still required for something like counter drill-down in the UI or later analytics of the jobs themselves.
  -  Counter display names take a lot of space
{code}
{'counterDisplayName': 'BAD_ID',
                'counterName': 'BAD_ID',
                'counterValue': 0},
{code}
     If the name and display name are the same, the display name can be omitted. This will require UI changes. Better still would be to store the display names only once for all counters in the app.
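
A minimal sketch of both display-name ideas, written in Python for brevity. The record layout mirrors the snippet above. The function names and the second sample counter are hypothetical and are not part of Tez or ATS.

```python
def strip_redundant_display_name(counter):
    """Drop counterDisplayName when it merely repeats counterName."""
    if counter.get("counterDisplayName") == counter["counterName"]:
        return {k: v for k, v in counter.items() if k != "counterDisplayName"}
    return dict(counter)


def split_display_names(counters):
    """Return (display-name table, counters without display names).

    The table would be stored once per application. Only counters whose
    display name differs from their name need an entry.
    """
    table = {}
    compact = []
    for c in counters:
        name = c["counterName"]
        display = c.get("counterDisplayName", name)
        if display != name:
            table[name] = display
        compact.append({"counterName": name, "counterValue": c["counterValue"]})
    return table, compact


def display_name(table, name):
    """Reader side (e.g. a UI): fall back to the counter name if no entry exists."""
    return table.get(name, name)


counters = [
    {"counterDisplayName": "BAD_ID", "counterName": "BAD_ID", "counterValue": 0},
    # Hypothetical counter whose display name differs from its name.
    {"counterDisplayName": "Example display name",
     "counterName": "EXAMPLE_COUNTER", "counterValue": 7},
]
table, compact = split_display_names(counters)
```

The fallback in `display_name` is the kind of UI change mentioned above: a reader has to treat a missing display name as "same as the counter name".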

  Reducing the counter size will also reduce memory usage on the AM and allow it to process task events faster.
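
The zero-counter idea from the first fix can be sketched the same way. This is an illustration under assumptions, not the Tez counter API: counters are modelled as a plain name-to-value mapping, the helper names are hypothetical, and the sample values are made up.

```python
import json


def omit_zero_counters(counters):
    """Keep only non-zero counters. Input maps counter name -> value."""
    return {name: value for name, value in counters.items() if value != 0}


def lookup(counters, name):
    """Direct lookup: a counter that was never stored reads as zero."""
    return counters.get(name, 0)


# Hypothetical counter values for a single task attempt.
attempt = {"WRONG_REDUCE": 0, "WRONG_MAP": 0, "BAD_ID": 0, "EXAMPLE_RECORDS": 1234}
compact = omit_zero_counters(attempt)

# Compare the serialized size before and after dropping zero counters.
full_bytes = len(json.dumps(attempt))
compact_bytes = len(json.dumps(compact))
print(full_bytes, compact_bytes)
```

Direct lookups such as `lookup(compact, "WRONG_MAP")` still return zero, but code that iterates over `compact` no longer sees the zero-valued counters, which is the compatibility risk noted for group iteration.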

> Optimize storage and exchange of Counters for better scaling
> ------------------------------------------------------------
>
>                 Key: TEZ-2491
>                 URL: https://issues.apache.org/jira/browse/TEZ-2491
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Rohini Palaniswamy
>
>      Counters take up a lot of space in the generated task events and are a major bottleneck for scaling ATS. [~jlowe] found a lot of potential optimizations. Using this as an umbrella jira for documentation. Sub-tasks can be created later after discussions.


