Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/05/01 04:44:06 UTC

[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

    [ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522686#comment-14522686 ] 

Bikas Saha commented on TEZ-776:
--------------------------------

Thanks for the numbers and for trying it out on a cluster. However, the comparison is not apples to apples, for the following reasons:
1) The patch in TEZ-2255 does end-to-end (e2e) composite event routing (design 1 in the original design document), so it is not creating new DataMovement event objects in the AM. My profiling shows that new object creation is the biggest CPU culprit in this code path.
2) The patch in TEZ-2255 is a POC patch, while the patch here handles all cases. A quick look at TEZ-2255 shows potential shortcuts. Even though the design in TEZ-2255 envisages creating a RoutedEvent, the patch currently just modifies the CompositeEvent in place with the target index (which may not be theoretically correct). New object creation eats CPU. Similarly, the target index is set in the task from the task's id, which is not a real solution (among other things, it breaks auto-reduce). A full implementation will likely use more CPU than the patch currently attached to TEZ-2255.
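To make the object-creation point in (1) and (2) concrete, here is a minimal Java sketch (Java 16+ for records) contrasting eager expansion of one composite event into per-consumer event objects with routing the single composite event and computing the target index on demand. Everything in it is hypothetical: CompositeEvent and PerTargetEvent are simplified stand-ins, not Tez classes, and expand/targetIndexFor are illustrative helpers. It assumes a scatter-gather rule where partition p of producer task i goes to consumer task p at input index i.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeRoutingSketch {
    // Hypothetical composite event: producer task `producerTask` has output
    // for partitions [startPartition, startPartition + count), one per consumer.
    record CompositeEvent(int producerTask, int startPartition, int count, byte[] payload) {}

    // Hypothetical per-consumer event created by eager expansion (not a Tez class).
    record PerTargetEvent(int sourcePartition, int consumerTask, int targetIndex, byte[] payload) {}

    // Eager expansion: allocates one new object per partition, each of which
    // stays on the AM heap until its consumer picks it up.
    static List<PerTargetEvent> expand(CompositeEvent ce) {
        List<PerTargetEvent> out = new ArrayList<>(ce.count());
        for (int p = ce.startPartition(); p < ce.startPartition() + ce.count(); p++) {
            // Assumed scatter-gather rule: partition p goes to consumer p,
            // arriving at input index = producer task index.
            out.add(new PerTargetEvent(p, p, ce.producerTask(), ce.payload()));
        }
        return out;
    }

    // End-to-end routing: keep only the composite event and compute the
    // target index when a consumer asks; nothing is allocated per consumer.
    // Returns -1 if the event does not route to this consumer.
    static int targetIndexFor(CompositeEvent ce, int consumerTask) {
        boolean inRange = consumerTask >= ce.startPartition()
                && consumerTask < ce.startPartition() + ce.count();
        return inRange ? ce.producerTask() : -1;
    }

    public static void main(String[] args) {
        CompositeEvent ce = new CompositeEvent(7, 0, 1000, new byte[16]);
        System.out.println("eagerly created objects: " + expand(ce).size());
        System.out.println("consumer 42 -> target index " + targetIndexFor(ce, 42));
        System.out.println("consumer 5000 -> target index " + targetIndexFor(ce, 5000));
    }
}
```

Running main prints 1000 eagerly created objects, target index 7 for consumer 42, and -1 for consumer 5000. Writing the target index back into the shared CompositeEvent (the shortcut described in (2)) would avoid the allocation too, but only one consumer's index can be stored at a time, which is why a separate routed object is the cleaner design.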

However, the numbers are useful because they show how much gain to expect from e2e composite event routing. I have not done that in this patch since it increases the scope of work, but I will do it as a follow-up since the API allows for it.

Pragmatically, for the 1-1 case, it cannot be denied that the ODR is doing unnecessary iterations. Clearly, the difference will grow with job size, but so will the real work done by a real job of that size, as opposed to an empty job running 100K tasks.
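The cost of those unnecessary iterations can be illustrated with a simplified Java model; this is not the Tez ODR code. It assumes a generic on-demand path that, for each consumer, walks every stored event and applies the edge's routing rule, versus a one-to-one shortcut that fetches the single matching event by index. StoredEvent, genericScan and directLookup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class OneToOneRoutingSketch {
    // Hypothetical stored event: records which producer task emitted it.
    record StoredEvent(int sourceTask, String payload) {}

    // Generic on-demand scan: visit every stored event and ask whether it
    // routes to this consumer. On a one-to-one edge exactly one event matches,
    // so the rest of the walk is wasted work. Returns the number visited.
    static int genericScan(List<StoredEvent> events, int consumerTask, List<StoredEvent> out) {
        int visited = 0;
        for (StoredEvent e : events) {
            visited++;
            if (e.sourceTask() == consumerTask) { // one-to-one routing rule
                out.add(e);
            }
        }
        return visited;
    }

    // One-to-one shortcut: assuming events are stored indexed by source task,
    // the consumer's single event can be fetched directly.
    static StoredEvent directLookup(List<StoredEvent> eventsBySource, int consumerTask) {
        return eventsBySource.get(consumerTask);
    }

    public static void main(String[] args) {
        int n = 1000; // hypothetical number of tasks on each side of the edge
        List<StoredEvent> events = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            events.add(new StoredEvent(i, "event-" + i));
        }
        long totalVisited = 0;
        for (int t = 0; t < n; t++) {
            totalVisited += genericScan(events, t, new ArrayList<>());
        }
        System.out.println("generic scan visits: " + totalVisited); // n * n
        System.out.println("direct lookup for task 42: " + directLookup(events, 42).payload());
    }
}
```

With 1,000 tasks the generic walk visits 1,000,000 events in total while the direct lookup touches 1,000, so the gap grows quadratically with task count, in line with the observation that the difference increases with job size.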

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>         Attachments: TEZ-776.1.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks that can be processed.
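The quoted description ties heap use to the connection pattern. A back-of-the-envelope Java sketch using its 64-bytes-per-event figure shows why: it assumes, hypothetically, that on a scatter-gather edge every producer emits one stored event per consumer, while a one-to-one edge stores one event per task. The task counts and method names are illustrative only.

```java
public class EventHeapEstimate {
    // Figure quoted in the issue description: ~64 bytes per DataMovementEvent.
    static final long BYTES_PER_EVENT = 64;

    // Hypothetical scatter-gather edge: one stored event per producer/consumer pair.
    static long scatterGatherBytes(long producers, long consumers) {
        return producers * consumers * BYTES_PER_EVENT;
    }

    // Hypothetical one-to-one edge: one stored event per task.
    static long oneToOneBytes(long tasks) {
        return tasks * BYTES_PER_EVENT;
    }

    public static void main(String[] args) {
        long sg = scatterGatherBytes(10_000, 1_000);
        System.out.printf("scatter-gather 10,000 x 1,000: %d bytes (~%d MiB)%n", sg, sg / (1024 * 1024));
        long oo = oneToOneBytes(10_000);
        System.out.printf("one-to-one 10,000: %d bytes (~%d KiB)%n", oo, oo / 1024);
    }
}
```

Under these assumptions, 10,000 producers feeding 1,000 consumers need about 640,000,000 bytes (~610 MiB) of event objects, while a 10,000-task one-to-one edge needs about 640,000 bytes (~625 KiB).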



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)