Posted to dev@slider.apache.org by "Gour Saha (JIRA)" <ji...@apache.org> on 2017/03/06 20:52:33 UTC

[jira] [Updated] (SLIDER-1190) Provide solution to possible memory issues with storing app diagnostics for large no of containers

     [ https://issues.apache.org/jira/browse/SLIDER-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated SLIDER-1190:
------------------------------
    Parent Issue: SLIDER-1216  (was: SLIDER-1185)

> Provide solution to possible memory issues with storing app diagnostics for large no of containers
> --------------------------------------------------------------------------------------------------
>
>                 Key: SLIDER-1190
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1190
>             Project: Slider
>          Issue Type: Sub-task
>          Components: appmaster, client
>    Affects Versions: Slider 0.91
>            Reporter: Gour Saha
>             Fix For: Slider 1.0.0
>
>
> [~billie.rinaldi] raised a very important point on a potential memory issue in SLIDER-1187.
> I wanted to capture her point and my initial thoughts on it. Let's use this JIRA to discuss the topic further and find the best solution.
> Billie's question: Do you think this will cause memory issues for long-lived AMs?
> Gour's initial thoughts: I agree with you that any list which only grows over time is a concern for possible memory issues. However, I checked the size of a single container diagnostics payload and it hovers between 4 and 5 KB, so about 100,000 containers would end up consuming up to ~500 MB. This is at the borderline of acceptability for a 1 GB AM container. However, in most production clusters I have seen, the minimum size of a container is set to 4 GB or higher. Either way, 100K containers for a single app (even if running for years) is very unlikely but not impossible. We can do a couple of things here:
> 1) Provide an API which can be triggered to drop all container diagnostics of the old/dead containers except the n most recent ones (n can be passed as a parameter to the API).
> 2) Add logic where the AM will cap the number of old/dead containers at a limit of, say, 10,000 (which will be configurable per application).
> Nevertheless, if an app is created with 100K+ containers we can still be hosed, but here we are stretching our imaginations too much. Anyway, I don't think we should use this patch to solve this. I am going to create a new sub-task for this possible memory issue.
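
The two options in the description could be combined in one small structure. The following is only a rough sketch under stated assumptions, not Slider code: the class name BoundedDiagnosticsStore and all of its methods are hypothetical, diagnostics are treated as plain strings, and "most recent" is taken to mean the order in which containers were recorded. It keeps a configurable cap on retained entries (option 2) and exposes a trim call that keeps only the n most recent ones (option 1). The estimateBytes helper just reproduces the back-of-the-envelope arithmetic from the description.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical sketch; none of these names exist in the Slider code base. */
final class BoundedDiagnosticsStore {
    // Insertion-ordered map: the oldest recorded container is iterated first.
    private final LinkedHashMap<String, String> completed = new LinkedHashMap<>();
    private final int maxEntries;

    BoundedDiagnosticsStore(int maxEntries) {
        if (maxEntries < 0) {
            throw new IllegalArgumentException("maxEntries must be >= 0");
        }
        this.maxEntries = maxEntries;
    }

    /** Option 2: record diagnostics and evict the oldest entries beyond the cap. */
    synchronized void record(String containerId, String diagnostics) {
        // Remove first so a re-recorded id moves to the "most recent" end.
        completed.remove(containerId);
        completed.put(containerId, diagnostics);
        evictDownTo(maxEntries);
    }

    /** Option 1: on demand, keep only the n most recent entries. Returns the number dropped. */
    synchronized int trimToMostRecent(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        return evictDownTo(n);
    }

    synchronized int size() {
        return completed.size();
    }

    /** Snapshot of retained container ids, oldest first. */
    synchronized List<String> containerIds() {
        return new ArrayList<>(completed.keySet());
    }

    // Drops entries from the oldest end until at most 'limit' remain.
    private int evictDownTo(int limit) {
        int removed = 0;
        Iterator<String> it = completed.keySet().iterator();
        while (completed.size() > limit && it.hasNext()) {
            it.next();
            it.remove();
            removed++;
        }
        return removed;
    }

    /** Rough heap estimate: container count times payload size, ignoring object overhead. */
    static long estimateBytes(long containers, long bytesPerEntry) {
        return Math.multiplyExact(containers, bytesPerEntry);
    }
}
```

With a cap of 10,000 and payloads of about 5 KB, such a store would hold on the order of 50 MB of diagnostics regardless of how long the AM runs. Whether eviction should be by recording order or by container completion time is a design choice this sketch leaves open.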


