Posted to issues@tez.apache.org by "Sreenath Somarajapuram (JIRA)" <ji...@apache.org> on 2015/06/10 20:54:01 UTC

[jira] [Comment Edited] (TEZ-2485) Reduce the Resource Load on the Timeline Server

    [ https://issues.apache.org/jira/browse/TEZ-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580948#comment-14580948 ] 

Sreenath Somarajapuram edited comment on TEZ-2485 at 6/10/15 6:53 PM:
----------------------------------------------------------------------

[~jlowe] ats-omit-dup-display-names-and-zero-counters_v2 looks good. Can we have it as part of a sub-task so that it can be added to the codebase?



> Reduce the Resource Load on the Timeline Server
> -----------------------------------------------
>
>                 Key: TEZ-2485
>                 URL: https://issues.apache.org/jira/browse/TEZ-2485
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>         Attachments: TEZ-2485.REMOVE_TEZ_CONTAINER_ID.1.patch, TEZ-2485.SHORTER_ENTITIES.1.patch, ats-omit-dup-display-names-and-zero-counters.patch, ats-omit-dup-display-names-and-zero-counters_v2.patch
>
>
> The disk, network, and memory resources needed by the timeline server are many times higher than those needed for the equivalent MapReduce job.
> Based on the storage improvements in YARN-3448, the timeline server may support up to 30,000 jobs / 10,000,000 tasks a day.
> While I understand there is a community effort on timeline server v2, it
> would be good if Tez could reduce its pressure on the timeline server by
> auditing both the number of events and the size of events.
> Here are some observations based on my understanding of the design of
> timeline stores:
> Each timeline entity pushed explodes into many records in the database:
> 1 marker record
> 1 domain record
> 1 record per event
> 2 records per related entity
> 2 records per primary filter (2 records per primary filter in
> RollingLevelDBTimelineStore; in leveldb it rewrites the entire entity
> record per primary filter)
> 1 record per other info
> For example:
> Task Attempt Start
> 1 marker
> 1 domain
> 1 task attempt start event
> 1 related entity X 2
> 7 other info entries
> 4 primary filters X 2
> 20 records written in the database for task attempt start
> Task Attempt Finish
> 1 marker
> 1 domain
> 1 task attempt finish event
> 1 related entity X 2
> 5 other info entries
> 5 primary filters X 2
> 20 records written in the database for task attempt finish
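
The two totals above follow directly from the per-entity multipliers listed in the description. A minimal sketch of that arithmetic, assuming the RollingLevelDBTimelineStore multipliers (2 records per related entity and per primary filter); the function name `records_per_entity` is hypothetical and not part of Tez or the timeline server:

```python
def records_per_entity(events, related_entities, primary_filters, other_info):
    """Estimate the database records written for one timeline entity put.

    Multipliers are the ones quoted in the issue description for
    RollingLevelDBTimelineStore; they are an assumption for other stores.
    """
    marker = 1  # one marker record per entity
    domain = 1  # one domain record per entity
    return (marker + domain
            + events                 # 1 record per event
            + 2 * related_entities   # 2 records per related entity
            + 2 * primary_filters    # 2 records per primary filter
            + other_info)            # 1 record per other info entry


# Inputs taken from the two examples in the description.
task_attempt_start = records_per_entity(
    events=1, related_entities=1, primary_filters=4, other_info=7)
task_attempt_finish = records_per_entity(
    events=1, related_entities=1, primary_filters=5, other_info=5)

print(task_attempt_start, task_attempt_finish)  # 20 20
```

Dropping a primary filter or related entity saves two records per put in this model, while dropping an other info entry saves one.
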
> =====================================================
> QUESTION:
> =====================================================
> Is there any data we are publishing to the timeline server that is not
> in the UI?
> Do we use all the entities (TEZ_CONTAINER_ID for example)?
> Do we use all the primary filters?
> Do we use all the related entities specified?
> Are there any fields we don't use?
> Are there other approaches to consider to reduce entity count/size?
> Is there a way to store the same information in less space?
> ===================
> Key Value Breakdown
> ||Count||Key Size||Value Size||
> |5642512|533690380|745454867|
> Entity Type Breakdown
> ||Type||Count||Key Size||Value Size||
> |TEZ_CONTAINER_ID|843850|86244392|5654341|
> |applicationAttemptId|544|53248|6174|
> |applicationId|544|44412|6174|
> |TEZ_TASK_ATTEMPT_ID|2471393|239523553|373637209|
> |TEZ_APPLICATION|1048|84312|13057630|
> |containerId|362443|37013813|4135845|
> |TEZ_VERTEX_ID|99239|10387114|1559948|
> |TEZ_DAG_ID|5402|387705|2910830|
> |TEZ_TASK_ID|1762211|146210017|344478400|
> |TEZ_APPLICATION_ATTEMPT|95838|13741814|8316|
> Column Breakdown
> ||Column||Count||Key Size||Value Size||
> |primarykeys|1092413|118768299|0|
> |marker|373515|25740507|2988120|
> |events|578196|55148482|1156392|
> |domain|373515|26114022|15314115|
> |reverserelated|587815|73721347|0|
> |otherinfo|2143751|170983893|725996240|
> |related|493307|63213830|0|
> Other Info Key Breakdown
> ||Key||Count||Key Size||Value Size||
> |appSubmitTime|126|11466|1638|
> |vertexName|349|23732|3081|
> |stats|349|21987|142938|
> |applicationId|163|10106|5705|
> |exitStatus|84337|7337319|84559|
> |endTime|288538|22354866|3750994|
> |counters|204201|15474759|646685059|
> |startTime|204201|15678960|2654613|
> |nodeId|106761|8540880|3950157|
> |initTime|512|32325|6656|
> |numKilledTasks|512|35397|517|
> |timeTaken|204201|15678960|1061085|
> |inProgressLogsURL|106761|9715251|11741572|
> |config|126|8820|13037092|
> |scheduledTime|96928|7172672|1260064|
> |dagPlan|163|9128|2074899|
> |completedLogsURL|106761|9608490|22703699|
> |taskAttemptErrorEnum|15808|1485952|331784|
> |initRequestedTime|349|26175|4537|
> |startRequestedTime|349|26524|4537|
> |numFailedTasks|512|35397|512|
> |vertexNameIdMapping|163|11084|16157|
> |numSucceededTasks|512|36933|1054|
> |numKilledTaskAttempts|512|38981|521|
> |status|204201|15066357|2198349|
> |processorClassName|349|26524|18690|
> |numFailedTaskAttempts|512|38981|512|
> |tezVersion|126|9324|14364|
> |numTasks|349|23034|665|
> |successfulAttemptId|96785|7742800|4355325|
> |nodeHttpAddress|106761|9501729|3950157|
> |numCompletedTasks|512|36933|1056|
> |diagnostics|204201|16087362|915925|
> |containerId|106761|9074685|5017767|
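
To see which other info keys dominate storage, the table figures can be turned into shares of the total value size. This is only a sketch over numbers copied from the tables above (the four largest keys by value size); the helper name `value_share` is hypothetical:

```python
# Total value size in bytes, from the "Key Value Breakdown" table.
TOTAL_VALUE_BYTES = 745454867

# key -> (record count, value size in bytes), copied from the
# "Other Info Key Breakdown" table; only the four largest keys are listed.
other_info = {
    "counters": (204201, 646685059),
    "completedLogsURL": (106761, 22703699),
    "config": (126, 13037092),
    "inProgressLogsURL": (106761, 11741572),
}


def value_share(value_bytes, total=TOTAL_VALUE_BYTES):
    """Fraction of all stored value bytes taken up by value_bytes."""
    return value_bytes / total


# Print the keys from largest to smallest value size.
for key, (count, value_bytes) in sorted(
        other_info.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{key}: {value_share(value_bytes):.1%} of value bytes, "
          f"{value_bytes / count:.0f} bytes per record on average")
```

On these figures the counters entries alone account for well over four fifths of all value bytes, which is consistent with the attached patches targeting duplicate display names and zero-valued counters.
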


