Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2015/05/27 00:58:17 UTC
[jira] [Updated] (TEZ-2485) Reduce the Resource Load on the Timeline Server

     [ https://issues.apache.org/jira/browse/TEZ-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated TEZ-2485:
---------------------------------
    Description: 
The disk, network, and memory resources needed by the timeline server are many times higher than those needed for the equivalent MapReduce job.

Based on the storage improvements in YARN-3448, the timeline server may support up to 30,000 jobs / 10,000,000 tasks a day.

While I understand there is a community effort on timeline server v2, it
would be good if Tez could reduce its pressure on the timeline server by
auditing both the number of events and the size of events.

Here are some observations based on my understanding of the design of
timeline stores:

Each timeline entity pushed explodes into many records in the database:
1 marker record
1 domain record
1 record per event
2 records per related entity
2 records per primary filter (in RollingLevelDBTimelineStore; the leveldb
store instead rewrites the entire entity record for each primary filter)
1 record per other info entry

For example

Task Attempt Start
1 marker
1 domain
1 task attempt start event
1 related entity X 2
7 other info entries
4 primary filters X 2

20 records written in the database for task attempt start

Task Attempt Finish
1 marker
1 domain
1 task attempt finish event
1 related entity X 2
5 other info entries
5 primary filters X 2

20 records written in the database for task attempt finish
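
The per-entity accounting above can be checked with a few lines of Python. This is only a sketch of the arithmetic as I understand it for RollingLevelDBTimelineStore; the function name and parameters are made up for illustration and are not part of any Tez or YARN API.

```python
def records_per_entity(events, related_entities, primary_filters, other_info):
    """Estimate the records written for one timeline entity put.

    Follows the breakdown above: 1 marker, 1 domain, 1 per event,
    2 per related entity, 2 per primary filter, 1 per other info entry.
    """
    marker = 1
    domain = 1
    return (marker + domain + events
            + 2 * related_entities
            + 2 * primary_filters
            + other_info)


# Counts taken from the two examples above.
task_attempt_start = records_per_entity(
    events=1, related_entities=1, primary_filters=4, other_info=7)
task_attempt_finish = records_per_entity(
    events=1, related_entities=1, primary_filters=5, other_info=5)

print(task_attempt_start, task_attempt_finish)
```

Both calls come out to 20, matching the totals listed above. Note that each primary filter costs twice as much as an other info entry, so dropping unused primary filters is the cheaper win per field.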

=====================================================
QUESTION:
=====================================================

Is there any data we are publishing to the timeline server that is not
in the UI?

Do we use all the entities (TEZ_CONTAINER_ID for example)?
Do we use all the primary filters?
Do we use all the related entities specified?
Are there any fields we don't use?
Are there other approaches to consider to reduce entity count/size?
Is there a way to store the same information in less space?

===================
Key Value Breakdown
||Count||Key Size||Value Size||
|5642512|533690380|745454867|

Entity Type Breakdown
||Type||Count||Key Size||Value Size||
|TEZ_CONTAINER_ID|843850|86244392|5654341|
|applicationAttemptId|544|53248|6174|
|applicationId|544|44412|6174|
|TEZ_TASK_ATTEMPT_ID|2471393|239523553|373637209|
|TEZ_APPLICATION|1048|84312|13057630|
|containerId|362443|37013813|4135845|
|TEZ_VERTEX_ID|99239|10387114|1559948|
|TEZ_DAG_ID|5402|387705|2910830|
|TEZ_TASK_ID|1762211|146210017|344478400|
|TEZ_APPLICATION_ATTEMPT|95838|13741814|8316|
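
To see which entity types dominate, the table above can be summarized as a share of stored bytes. A rough sketch follows; the rows are copied from the table, and the helper name is hypothetical.

```python
# (entity type, record count, key bytes, value bytes), copied from the table above
ENTITY_ROWS = [
    ("TEZ_CONTAINER_ID", 843850, 86244392, 5654341),
    ("applicationAttemptId", 544, 53248, 6174),
    ("applicationId", 544, 44412, 6174),
    ("TEZ_TASK_ATTEMPT_ID", 2471393, 239523553, 373637209),
    ("TEZ_APPLICATION", 1048, 84312, 13057630),
    ("containerId", 362443, 37013813, 4135845),
    ("TEZ_VERTEX_ID", 99239, 10387114, 1559948),
    ("TEZ_DAG_ID", 5402, 387705, 2910830),
    ("TEZ_TASK_ID", 1762211, 146210017, 344478400),
    ("TEZ_APPLICATION_ATTEMPT", 95838, 13741814, 8316),
]


def byte_share(rows, wanted_types):
    """Fraction of all stored bytes (keys + values) used by the given entity types."""
    total = sum(key + value for _, _, key, value in rows)
    used = sum(key + value for name, _, key, value in rows if name in wanted_types)
    return used / total


task_share = byte_share(ENTITY_ROWS, {"TEZ_TASK_ID", "TEZ_TASK_ATTEMPT_ID"})
print(f"task-level entities: {task_share:.1%} of stored bytes")
```

On these numbers, TEZ_TASK_ID and TEZ_TASK_ATTEMPT_ID together account for roughly 86% of the stored bytes, so any reduction in task-level entities should have the largest effect.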

Column Breakdown
||Column||Count||Key Size||Value Size||
|primarykeys|1092413|118768299|0|
|marker|373515|25740507|2988120|
|events|578196|55148482|1156392|
|domain|373515|26114022|15314115|
|reverserelated|587815|73721347|0|
|otherinfo|2143751|170983893|725996240|
|related|493307|63213830|0|
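
The same kind of summary for the column breakdown shows where the value bytes go. Again this is just a sketch over the numbers in the table above, with hypothetical helper names.

```python
# (column, record count, key bytes, value bytes), copied from the table above
COLUMN_ROWS = [
    ("primarykeys", 1092413, 118768299, 0),
    ("marker", 373515, 25740507, 2988120),
    ("events", 578196, 55148482, 1156392),
    ("domain", 373515, 26114022, 15314115),
    ("reverserelated", 587815, 73721347, 0),
    ("otherinfo", 2143751, 170983893, 725996240),
    ("related", 493307, 63213830, 0),
]


def value_share(rows, column):
    """Fraction of all value bytes stored under the given column."""
    total = sum(value for _, _, _, value in rows)
    used = sum(value for name, _, _, value in rows if name == column)
    return used / total


def avg_record_bytes(rows):
    """Average bytes (key + value) per record for each column."""
    return {name: (key + value) / count for name, count, key, value in rows}


otherinfo_share = value_share(COLUMN_ROWS, "otherinfo")
print(f"otherinfo: {otherinfo_share:.1%} of value bytes")
for name, avg in sorted(avg_record_bytes(COLUMN_ROWS).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {avg:.0f} bytes per record")
```

On these numbers, otherinfo holds about 97% of all value bytes, while primarykeys, reverserelated, and related carry no value at all and cost only key space. That suggests auditing the other info fields first for size, and the primary filters and related entities for record count.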


> Reduce the Resource Load on the Timeline Server
> -----------------------------------------------
>
>                 Key: TEZ-2485
>                 URL: https://issues.apache.org/jira/browse/TEZ-2485
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>


