Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2015/05/27 01:07:18 UTC

[jira] [Comment Edited] (TEZ-2485) Reduce the Resource Load on the Timeline Server

    [ https://issues.apache.org/jira/browse/TEZ-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560070#comment-14560070 ] 

Jonathan Eagles edited comment on TEZ-2485 at 5/26/15 11:07 PM:
----------------------------------------------------------------

Posted the data storage breakdown by entity type and by column type. The database in this instance was approximately 315MB on disk. Leveldb uses snappy compression, so the expanded key/value breakdown is 508MB/710MB respectively. Another thing to consider is the key overhead per record. Keys are of the form |Entity Type|8 bytes for timestamp|Entity Id|column specific data|. To calculate the amount of space utilized by a type, multiply the type length by the count. The majority of this data was generated using Pig.
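
As a quick illustration of that calculation, here is a minimal sketch (the class name is made up; the count is taken from the TEZ_TASK_ATTEMPT_ID row of the entity type table below):

{code:java}
// Rough estimate of the key space consumed by one entity type's name prefix:
// every key for that type begins with the type string, so
// prefix bytes = type name length * record count.
public class KeyOverheadEstimate {
  public static void main(String[] args) {
    String entityType = "TEZ_TASK_ATTEMPT_ID"; // 19 characters
    long recordCount = 2471393L;               // records with this type prefix

    long prefixBytes = (long) entityType.length() * recordCount;

    System.out.printf("%s prefix overhead: %d bytes (~%d MB)%n",
        entityType, prefixBytes, prefixBytes / (1024 * 1024));
  }
}
{code}

That works out to roughly 45MB of key data for the TEZ_TASK_ATTEMPT_ID prefix alone.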


was (Author: jeagles):
Posted the data storage breakdown by entity type and by column type. The database in this instance was approximately 315MB on disk. Leveldb uses snappy compression, so the expanded key/value breakdown is 508MB/710MB respectively. Another thing to consider is the key overhead per record. Keys are of the form |Entity Type|8 bytes for timestamp|Entity Id|column specific data|. To calculate the amount of space utilized by a type, multiply the type length by the count.

> Reduce the Resource Load on the Timeline Server
> -----------------------------------------------
>
>                 Key: TEZ-2485
>                 URL: https://issues.apache.org/jira/browse/TEZ-2485
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>
> The disk, network, and memory resources needed by the timeline server are many times higher than those needed for the equivalent MapReduce job.
> Based on the storage improvements in YARN-3448, the timeline server may
> support up to 30,000 jobs / 10,000,000 tasks a day.
> While I understand there is community effort on timeline server v2, it
> would be good if Tez can reduce its pressure on the timeline server by
> auditing both the number and the size of the events it publishes.
> Here are some observations based on my understanding of the design of
> timeline stores:
> Each timeline entity pushed explodes into many records in the database:
> 1 marker record
> 1 domain record
> 1 record per event
> 2 records per related entity
> 2 records per primary filter (2 records per primary filter in
> RollingLevelDBTimelineStore; the plain leveldb store instead rewrites the
> entire entity record per primary filter)
> 1 record per other info entry
> For example:
> Task Attempt Start
> 1 marker
> 1 domain
> 1 task attempt start event
> 1 related entity X 2
> 7 other info entries
> 4 primary filters X 2
> 20 records written to the database for task attempt start
> Task Attempt Finish
> 1 marker
> 1 domain
> 1 task attempt finish event
> 1 related entity X 2
> 5 other info entries
> 5 primary filters X 2
> 20 records written to the database for task attempt finish
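> As a sanity check on the arithmetic in the two examples above, a minimal sketch (the class and method are illustrative only, not a Tez or YARN API):
> {code:java}
> public class TimelineRecordCount {
>   // 1 marker + 1 domain + 1 record per event + 2 per related entity
>   // + 2 per primary filter + 1 per other info entry
>   static long recordsPerEntity(int events, int relatedEntities,
>                                int primaryFilters, int otherInfoEntries) {
>     return 1 + 1 + events + 2L * relatedEntities + 2L * primaryFilters + otherInfoEntries;
>   }
>
>   public static void main(String[] args) {
>     System.out.println(recordsPerEntity(1, 1, 4, 7)); // task attempt start  -> 20
>     System.out.println(recordsPerEntity(1, 1, 5, 5)); // task attempt finish -> 20
>   }
> }
> {code}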
> =====================================================
> QUESTION:
> =====================================================
> Is there any data we are publishing to the timeline server that is not
> in the UI?
> Do we use all the entities (TEZ_CONTAINER_ID, for example)?
> Do we use all the primary filters?
> Do we use all the related entities specified?
> Are there any fields we don't use?
> Are there other approaches to consider to reduce entity count/size?
> Is there a way to store the same information in less space?
> ===================
> Key Value Breakdown
> ||Count||Key Size (bytes)||Value Size (bytes)||
> |5642512|533690380|745454867|
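> (As a rough per-record average from the totals above: 533690380 / 5642512 ≈ 95 bytes of key and 745454867 / 5642512 ≈ 132 bytes of value per record.)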
> Entity Type Breakdown
> ||Type||Count||Key Size (bytes)||Value Size (bytes)||
> |TEZ_CONTAINER_ID|843850|86244392|5654341|
> |applicationAttemptId|544|53248|6174|
> |applicationId|544|44412|6174|
> |TEZ_TASK_ATTEMPT_ID|2471393|239523553|373637209|
> |TEZ_APPLICATION|1048|84312|13057630|
> |containerId|362443|37013813|4135845|
> |TEZ_VERTEX_ID|99239|10387114|1559948|
> |TEZ_DAG_ID|5402|387705|2910830|
> |TEZ_TASK_ID|1762211|146210017|344478400|
> |TEZ_APPLICATION_ATTEMPT|95838|13741814|8316|
> Column Breakdown
> ||Column||Count||Key Size (bytes)||Value Size (bytes)||
> |primarykeys|1092413|118768299|0|
> |marker|373515|25740507|2988120|
> |events|578196|55148482|1156392|
> |domain|373515|26114022|15314115|
> |reverserelated|587815|73721347|0|
> |otherinfo|2143751|170983893|725996240|
> |related|493307|63213830|0|
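> The breakdowns above could be reproduced by scanning the leveldb directory directly. A minimal sketch using the org.iq80.leveldb API (the path argument and the way the type prefix is parsed out of the key are assumptions, not the actual tooling used to produce these tables):
> {code:java}
> import java.io.File;
> import java.nio.charset.StandardCharsets;
> import java.util.HashMap;
> import java.util.Map;
> import org.iq80.leveldb.DB;
> import org.iq80.leveldb.DBIterator;
> import org.iq80.leveldb.Options;
> import static org.iq80.leveldb.impl.Iq80DBFactory.factory;
>
> public class TimelineStoreSizeByType {
>   public static void main(String[] args) throws Exception {
>     // args[0]: path to the timeline store's leveldb directory (assumption)
>     DB db = factory.open(new File(args[0]), new Options().createIfMissing(false));
>     Map<String, long[]> sizes = new HashMap<>(); // prefix -> {count, keyBytes, valueBytes}
>     try (DBIterator it = db.iterator()) {
>       for (it.seekToFirst(); it.hasNext(); ) {
>         Map.Entry<byte[], byte[]> e = it.next();
>         // Assumption: treat the leading run of letters/underscores as the type prefix.
>         String key = new String(e.getKey(), StandardCharsets.UTF_8);
>         String prefix = key.replaceAll("^[^A-Za-z]*", "").split("[^A-Za-z_]")[0];
>         long[] s = sizes.computeIfAbsent(prefix, p -> new long[3]);
>         s[0]++;
>         s[1] += e.getKey().length;
>         s[2] += e.getValue().length;
>       }
>     }
>     db.close();
>     sizes.forEach((prefix, s) ->
>         System.out.printf("|%s|%d|%d|%d|%n", prefix, s[0], s[1], s[2]));
>   }
> }
> {code}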



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)