Posted to dev@mesos.apache.org by Venkat Morampudi <ve...@gmail.com> on 2017/11/03 14:01:31 UTC
Re: Sandbox life cycle /age

Hi Benjamin,

Apologies for the delay. GC seems to be working fine: folders older than 2 hours are being deleted. After the config change and restart, the Mesos agent took some time to delete the old folders that were around before the restart. I may have jumped the gun.

Thanks,
Venkat

> On Oct 30, 2017, at 1:01 PM, Benjamin Mahler <bm...@apache.org> wrote:
> 
> Hi Venkat,
> 
> You're seeing that files last modified longer ago than your gc delay of
> 2 hours are *not* getting deleted? Can you show a full listing of
> /var/lib/mesos/slave/slaves/? Is there more than 1 entry there?
> 
> On Fri, Oct 27, 2017 at 8:43 AM, Venkat Morampudi <venkatmorampudi@gmail.com> wrote:
> 
>> Hi Tomek,
>> 
>> After changing the GC delay to 2hrs, the existing sandbox folders that are
>> older than the “Max allowed age” are not deleted.
>> 
>> Log entries before and after the change:
>> 
>> I1027 15:00:48.055465 12861 slave.cpp:4615] Current disk usage 21.63%. Max allowed age: 1.367499658088657days
>> I1027 15:02:37.720451 30693 slave.cpp:4615] Current disk usage 21.60%. Max allowed age: 1.368035520611667hrs
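
As a rough cross-check, the second log line can be plugged back into the formula quoted further down the thread, `gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))`, to see which headroom it implies. This is only a sketch: it assumes the agent was already running with a gc_delay of 2 hours when that line was logged, and the variable names are illustrative.

```python
# Values copied from the second log line above.
logged_disk_usage = 0.2160              # "Current disk usage 21.60%"
logged_max_age_hrs = 1.368035520611667  # "Max allowed age: ...hrs"

# Assumption: gc_delay was already 2hrs when this line was logged.
gc_delay_hrs = 2.0

# Rearranging age = gc_delay * (1 - headroom - usage) for the headroom.
# This rearrangement only holds while the max(0.0, ...) term is positive.
implied_headroom = 1.0 - logged_disk_usage - logged_max_age_hrs / gc_delay_hrs

print(f"implied gc_disk_headroom: {implied_headroom:.3f}")
```

Under that assumption the implied headroom comes out at roughly 0.1, so the logged threshold looks consistent with the formula. The small residual is within what rounding the usage figure to two decimals in the log would produce.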
>> 
>> Executor info from the node:
>> 
>> 
>> [techops@kaiju-dcos-privateslave27 ~]$ date
>> Fri Oct 27 15:41:59 UTC 2017
>> [techops@kaiju-dcos-privateslave27 ~]$ ls -l /var/lib/mesos/slave/slaves/3fd7ffe9-945b-4ae1-968e-2c397585960c-S201/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/
>> total 452
>> drwxr-xr-x. 3 root root 4096 Oct 26 09:43 74861.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 09:56 74861.10.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 10:31 74867.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 11:07 74871.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 11:45 74875.10.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:43 74886.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:45 74886.2.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:51 74886.8.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:56 74886.9.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 13:59 74887.1.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 14:42 74895.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 14:58 74899.7.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 14:59 74900.8.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 15:12 74902.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:05 74938.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:10 74940.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:11 74942.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:30 74944.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 17:45 74951.1.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 18:07 74959.7.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 18:47 74971.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 19:18 74981.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 20:14 74992.0.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 20:28 74995.3.0
>> drwxr-xr-x. 3 root root 4096 Oct 26 20:59 75001.1.0
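
To make the comparison concrete, here is a small sketch that turns timestamps from the listing into ages relative to the `date` output and compares them with the "Max allowed age" from the second log line. Only the oldest and newest entries are included; the year 2017 is taken from the `date` output, and treating the `ls` timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta

# Output of `date` on the node.
now = datetime(2017, 10, 27, 15, 41, 59)

# Oldest and newest entries from the listing (timezone assumed to be UTC).
mtimes = {
    "74861.0.0": datetime(2017, 10, 26, 9, 43),
    "75001.1.0": datetime(2017, 10, 26, 20, 59),
}

# "Max allowed age" from the log line written after the change.
max_allowed_age = timedelta(hours=1.368035520611667)

for name, mtime in mtimes.items():
    age = now - mtime
    print(name, age, "older than threshold:", age > max_allowed_age)
```

Both directories are well past the threshold, which is the behaviour being questioned here.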
>> 
>> 
>> Thanks,
>> Venkat
>> 
>>> On Oct 27, 2017, at 6:42 AM, Tomek Janiszewski <ja...@gmail.com>
>> wrote:
>>> 
>>> A low GC delay means files will be deleted sooner. I don't think there
>>> will be any performance problem, but a low GC delay means you will lose
>>> your sandboxes earlier, and they are useful for debugging.
>>> 
>>> On Fri, Oct 27, 2017 at 04:40, Venkat Morampudi <venkatmorampudi@gmail.com> wrote:
>>> 
>>>> Hi Tomek,
>>>> 
>>>> Thanks for the quick reply. After digging a bit into the Mesos code, we
>>>> were able to understand that “age” actually means a threshold age:
>>>> anything older than the “age” would be GCed. We are going to try
>>>> different settings, starting with
>>>> "--gc_disk_headroom=.2 --gc_delay=2hrs". Is there any downside to going
>>>> with a very low GC delay?
>>>> 
>>>> Thanks,
>>>> Venkat
>>>> 
>>>> 
>>>>> On Oct 26, 2017, at 4:28 PM, Tomek Janiszewski <ja...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>>> 
>>>>> *Example:*
>>>>> gc_delay = 7days
>>>>> gc_disk_headroom = 0.1
>>>>> disk_usage = 0.8
>>>>> 7 * max(0.0, 1 - 0.1 - 0.8) = 7 * max(0.0, 0.1) = 0.7 days = 16 h 48 min
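
The worked example above can be reproduced with a few lines of Python. This is only a sketch of the formula as quoted in this thread, not the agent's actual code; the function name `max_allowed_age` and its parameter names are made up for illustration.

```python
from datetime import timedelta


def max_allowed_age(gc_delay: timedelta,
                    gc_disk_headroom: float,
                    disk_usage: float) -> timedelta:
    """Threshold age per the formula quoted above (hypothetical helper).

    gc_disk_headroom and disk_usage are fractions between 0 and 1.
    """
    factor = max(0.0, 1.0 - gc_disk_headroom - disk_usage)
    return gc_delay * factor


# The example from this message: 7 days, headroom 0.1, disk 80% full.
example = max_allowed_age(timedelta(days=7), 0.1, 0.8)
print(example)

# With the disk 95% full the factor is clamped to 0, so the age is 0.
nearly_full = max_allowed_age(timedelta(days=7), 0.1, 0.95)
print(nearly_full)
```

By this formula the result only drops to zero once disk usage reaches 1 - gc_disk_headroom, which is 90% with a headroom of 0.1. Below that, the threshold shrinks linearly as the disk fills up.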
>>>>> 
>>>>> Can you show some logs containing information about GC?
>>>>> 
>>>>> On Fri, Oct 27, 2017 at 00:43, Venkat Morampudi <venkatmorampudi@gmail.com> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> In our production env, we noticed that our disk filled up because one
>>>>>> framework had a lot of failed/completed executor folders lying around.
>>>>>> 
>>>>>> 
>>>>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.1.0
>>>>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52334.2.0
>>>>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.1.0
>>>>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52335.2.0
>>>>>> 228M  /mnt/resource/slaves/c8674097-6e67-4609-b022-3e11de380fe5-S2/frameworks/35e600c2-6f43-402c-856f-9084c0040187-002/executors/52336.1.0
>>>>>> 
>>>>>> http://mesos.apache.org/documentation/latest/sandbox/#sandbox-lifecycle
>>>>>> 
>>>>>> We have our lifecycle cleanup set to the default, which is 7 days, I
>>>>>> believe.
>>>>>> 
>>>>>> We wanted to know whether this is the proper way to clean up the
>>>>>> failed/completed executor folders for a running framework.
>>>>>> Or does the framework need to be Inactive or Completed for the garbage
>>>>>> collection to work?
>>>>>> Or does the framework itself need to deal with cleaning up its own
>>>>>> executors?
>>>>>> 
>>>>>> Bonus question: How does “gc_disk_headroom” actually work? It seems
>>>>>> this equation will always return 0:
>>>>>> gc_delay * max(0.0, (1.0 - gc_disk_headroom - disk usage))
>>>>>> 
>>>>>> Thanks,
>>>>>> Venkat