Posted to user@mesos.apache.org by Srikant Kalani <sr...@gmail.com> on 2016/10/04 13:00:46 UTC

Re: Resource Isolation in Mesos

We have upgraded Linux from RHEL 6 to RHEL 7 and Mesos from 0.27 to 1.0.1.
After the upgrade we can no longer see the memory used by the task, which
worked fine in the previous version. Because of this, the cgroup limits are
not effective.

Answers to your questions are below:

There is only 1 task running, an app server, which is consuming approx. 20 GB
of memory, but this information does not show up in the Mesos UI.
Swap is enabled in the agent start command.
These flags are used for the agent: cgroups_limits_swap=true
--isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
In the agent logs I can see the memory limit updated to 33 MB for the container.

The web UI shows the total memory allocated to the framework, but it does not
show the memory used by the task. It always shows 0B/33MB.

I am not sure whether this is a RHEL 7 issue or a Mesos 1.0.1 issue.

Any suggestions?
On 26 Sep 2016 21:55, "haosdent" <ha...@gmail.com> wrote:

> Hi @Srikant, could you elaborate on this?
>
> >We have verified using top command that framework was using 2gB memory
> while allocated was just 50 mb.
>
> * How many tasks are running in your framework?
> * Is swap enabled or disabled on the agents?
> * Which flags do you use to launch the agents?
> * Have you seen something like `Updated 'memory.limit_in_bytes' to ` in
> the agent log?
>
> On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <
> srikant.blackrock@gmail.com> wrote:
>
>> Hi Greg,
>>
>> Previously we were running Mesos 0.27 on RHEL 6. Since we already have
>> one cgroup hierarchy for CPU and memory for our production process ID,
>> we were not able to merge the two cgroup hierarchies on RHEL 6, and the
>> slave process would not come up.
>> Now we have moved to RHEL 7, and both the Mesos master and slave are
>> running on RHEL 7 with cgroups in place. But we are seeing that the Mesos
>> UI does not show the actual memory used by the framework.
>>
>> Any idea why the framework's CPU and memory usage does not show up in the
>> UI? Because of this, the OS is still not killing tasks that consume more
>> memory than they were allocated.
>> We have verified using the top command that the framework was using 2 GB
>> of memory while only 50 MB was allocated.
>>
>> Please suggest.
>> On 8 Sep 2016 01:53, "Greg Mann" <gr...@mesosphere.io> wrote:
>>
>>> Hi Srikant,
>>> Without using cgroups, it won't be possible to enforce isolation of
>>> cpu/memory on a Linux agent. Could you elaborate a bit on why you aren't
>>> able to use cgroups currently? Have you tested the existing Mesos cgroup
>>> isolators in your system?
>>>
>>> Cheers,
>>> Greg
>>>
>>> On Tue, Sep 6, 2016 at 9:24 PM, Srikant Kalani <
>>> srikant.blackrock@gmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> We are running a Mesos cluster in our development environment. We are
>>>> seeing cases where a framework uses more resources, such as CPU and
>>>> memory, than it initially requested. When a new framework is
>>>> registered, Mesos calculates the available resources on the basis of the
>>>> resources already offered to the first framework; it doesn't consider
>>>> the resources actually used by the previous framework.
>>>> This results in an incorrect calculation of resources.
>>>> The Mesos website says that we should implement cgroups, but that is not
>>>> possible in our case, as we have already implemented cgroups in other
>>>> projects and, due to Linux restrictions, we can't merge two cgroup
>>>> hierarchies.
>>>>
>>>> Any idea how we can implement resource isolation in Mesos?
>>>>
>>>> We are using Mesos 0.27.1.
>>>>
>>>> Thanks,
>>>> Srikant Kalani
>>>>
>>>
>>>
>
>
> --
> Best Regards,
> Haosdent Huang
>

Re: Resource Isolation in Mesos

Posted by haosdent <ha...@gmail.com>.
I checked with @Srikant via Hangouts. It looks like the Linux cgroup's
memory.stat is incorrect after the cgroup is `chown`ed to a normal user.
I will continue to follow up and verify whether this is a bug in the Mesos
cgroups support once @Srikant has test results from a new machine.
Thanks a lot to @Srikant for the great help!
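
To check which user owns a container's memory cgroup directory, something like the following could be used. This is only a sketch: `cgroup_owner` is a hypothetical helper name, not part of Mesos, and the path in the usage note assumes the `/sys/fs/cgroup/memory/mesos/<container id>` layout mentioned elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the user that owns a directory,
# for example a container's cgroup directory.
cgroup_owner() {
  # The third column of `ls -ld` output is the owning user.
  ls -ld -- "$1" | awk '{ print $3 }'
}
```

For example, `cgroup_owner "/sys/fs/cgroup/memory/mesos/${YOUR_CONTAINER_ID}"` could be run once for the task started as root and once for the task started with the application ID, to compare the owners.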

On Thu, Oct 6, 2016 at 8:17 PM, Srikant Kalani <sr...@gmail.com>
wrote:

> Thanks for the detailed steps.
>
> We are also using the same flags.
>
> Today we ran our task twice. The first time, with the root ID, it worked
> fine: cgroups were enforced and the UI was working as expected.
>
> But the second time, when we ran the same task with the application ID,
> cgroups didn't work. The memory.stat file mentioned in your email doesn't
> have an updated rss value.
>
> Do I need to use any other flags on the agent so that a non-root ID can
> also be subject to cgroups?


-- 
Best Regards,
Haosdent Huang

Re: Resource Isolation in Mesos

Posted by Srikant Kalani <sr...@gmail.com>.
Thanks for the detailed steps.

We are also using the same flags.

Today we ran our task twice. The first time, with the root ID, it worked
fine: cgroups were enforced and the UI was working as expected.

But the second time, when we ran the same task with the application ID,
cgroups didn't work. The memory.stat file mentioned in your email doesn't
have an updated rss value.

Do I need to use any other flags on the agent so that a non-root ID can
also be subject to cgroups?

Re: Resource Isolation in Mesos

Posted by haosdent <ha...@gmail.com>.
> These flags are used in agent - cgroups_limits_swap=true
> --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
> In agent logs I can see updated memory limit to 33MB for container.

I'm not sure whether these are typos, but some of the flag names may be
incorrect. Also, according to

> "mem_limit_bytes": 1107296256,

I think Mesos allocated 1107296256 bytes of memory (about 1 GB) to your task
instead of 33 MB.
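
As a quick check of that figure, the byte count can be converted to MiB with shell arithmetic. This is just a sketch; `bytes_to_mib` is a hypothetical helper name used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole MiB (1 MiB = 1048576 bytes).
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

bytes_to_mib 1107296256   # prints 1056
```

So the reported limit is 1056 MiB, that is, 1 GiB plus 32 MiB, which is nowhere near 33 MB.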

As for `mem_rss_bytes` being zero, let me describe how I tested it on my
machine; it may help you troubleshoot the problem.

```
## Start the master
sudo ./bin/mesos-master.sh --ip=111.223.45.25 --hostname=111.223.45.25 \
  --work_dir=/tmp/mesos
## Start the agent
sudo ./bin/mesos-agent.sh --ip=111.223.45.25 --hostname=111.223.45.25 \
  --work_dir=/tmp/mesos --master=111.223.45.25:5050 \
  --cgroups_hierarchy=/sys/fs/cgroup --isolation=cgroups/cpu,cgroups/mem \
  --cgroups_limit_swap=true
## Start the task
./src/mesos-execute --master=111.223.45.25:5050 --name="test-single-1" \
  --command="sleep 2000"
```

Then query the `/containers` endpoint to get the container ID of the task:

```
$ curl 'http://111.223.45.25:5051/containers' 2>/dev/null |jq .
[
  {
    "container_id": "74fea157-100f-4bf8-b0d0-b65c6e17def1",
    "executor_id": "test-single-1",
    "executor_name": "Command Executor (Task: test-single-1) (Command: sh -c 'sleep 2000')",
    "framework_id": "db9f43ce-0361-4c65-b42f-4dbbefa75ff8-0000",
    "source": "test-single-1",
    "statistics": {
      "cpus_limit": 1.1,
      "cpus_system_time_secs": 3.69,
      "cpus_user_time_secs": 3.1,
      "mem_anon_bytes": 9940992,
      "mem_cache_bytes": 8192,
      "mem_critical_pressure_counter": 0,
      "mem_file_bytes": 8192,
      "mem_limit_bytes": 167772160,
      "mem_low_pressure_counter": 0,
      "mem_mapped_file_bytes": 0,
      "mem_medium_pressure_counter": 0,
      "mem_rss_bytes": 9940992,
      "mem_swap_bytes": 0,
      "mem_total_bytes": 10076160,
      "mem_total_memsw_bytes": 10076160,
      "mem_unevictable_bytes": 0,
      "timestamp": 1475686847.54635
    },
    "status": {
      "executor_pid": 2775
    }
  }
]
```

As you can see above, the container ID is
`74fea157-100f-4bf8-b0d0-b65c6e17def1`, so I ran:

```
$ cat /sys/fs/cgroup/memory/mesos/74fea157-100f-4bf8-b0d0-b65c6e17def1/memory.stat
```

Mesos gets the task's memory statistics from this file. `total_rss`
is parsed into the `"mem_rss_bytes"` field.

```
...
hierarchical_memory_limit 167772160
hierarchical_memsw_limit 167772160
total_rss 9940992
...
```
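
To pull just that value out of the file for comparison with the `/containers` output, a small helper along these lines could be used. It is a sketch, not how Mesos itself reads the file, and `total_rss` here is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: print the total_rss value (in bytes) from a
# cgroup memory.stat file whose path is given as the first argument.
total_rss() {
  # memory.stat holds one "key value" pair per line.
  awk '$1 == "total_rss" { print $2 }' "$1"
}
```

For example, `total_rss "/sys/fs/cgroup/memory/mesos/${YOUR_CONTAINER_ID}/memory.stat"` should print the same number that the agent reports as `mem_rss_bytes`, allowing for the time between the two reads.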

You could check which of the steps above does not match what you see on your
side and reply to this email for further discussion. The problem seems to be
an incorrect configuration or incorrect launch flags.

On Wed, Oct 5, 2016 at 8:46 PM, Srikant Kalani <sr...@gmail.com>
wrote:

> What I can see in the HTTP output is that mem_rss_bytes is not populated
> on RHEL 7.
>
> Here is the HTTP output:
>
> Output for the agent running on RHEL 7:
>
> [{"container_id":"8062e683-204c-40c2-87ae-fcc2c3f71b85",
> "executor_id":"*****",
> "executor_name":"Command Executor (Task: *****) (Command: sh -c '******...')",
> "framework_id":"edbffd6d-b274-4cb1-b386-2362ed2af517-0000",
> "source":"*****",
> "statistics":{"cpus_limit":1.1,"cpus_system_time_secs":0.01,
> "cpus_user_time_secs":0.03,"mem_anon_bytes":0,"mem_cache_bytes":0,
> "mem_critical_pressure_counter":0,"mem_file_bytes":0,
> "mem_limit_bytes":1107296256,"mem_low_pressure_counter":0,
> "mem_mapped_file_bytes":0,"mem_medium_pressure_counter":0,
> "mem_rss_bytes":0,"mem_swap_bytes":0,"mem_total_bytes":0,
> "mem_unevictable_bytes":0,"timestamp":1475668277.62915},
> "status":{"executor_pid":14454}}]
>
> Output for the agent running on RHEL 6:
>
> [{"container_id":"359c0944-c089-4d43-983e-1f97134fe799",
> "executor_id":"*****",
> "executor_name":"Command Executor (Task: *****) (Command: sh -c '******...')",
> "framework_id":"edbffd6d-b274-4cb1-b386-2362ed2af517-0001",
> "source":"*****",
> "statistics":{"cpus_limit":8.1,"cpus_system_time_secs":1.92,
> "cpus_user_time_secs":6.93,"mem_limit_bytes":1107296256,
> "mem_rss_bytes":2329763840,"timestamp":1475670762.73402},
> "status":{"executor_pid":31577}}]
>
> Attached are UI screenshots:
> Wa002.jpg is for RHEL 7 and the other one is for RHEL 6.
> On 5 Oct 2016 4:55 p.m., "haosdent" <ha...@gmail.com> wrote:
>
>> Hi @Srikant, what is the result of http://${YOUR_AGENT_IP}:5051/containers?
>> It is weird that you could see
>>
>> ```
>> Updated 'memory.limit_in_bytes' to xxx
>> ```
>>
>> in the log, as you mentioned, but `limit_in_bytes` is still the initial
>> value, as you showed above.
>>
>> On Wed, Oct 5, 2016 at 2:04 PM, Srikant Kalani <
>> srikant.blackrock@gmail.com> wrote:
>>
>>> Here are the values:
>>> memory.limit_in_bytes = 1107296256
>>> memory.soft_limit_in_bytes = 1107296256
>>> memory.memsw.limit_in_bytes = 9223372036854775807
>>>
>>> I have run the same task on Mesos 1.0.1 running on RHEL 6, and the UI
>>> then shows task memory usage as 2.2G/1.0G, where 2.2G is used and 1.0G is
>>> allocated. But since we don't have cgroups there, tasks are not getting
>>> killed.
>>>
>>> On RHEL 7 the UI is showing 0B/1.0G for task memory details.
>>>
>>> Any idea whether this is a RHEL 7 fault, or do I need to adjust some
>>> configuration?
>>> On 4 Oct 2016 21:33, "haosdent" <ha...@gmail.com> wrote:
>>>
>>>> Hi, @Srikant
>>>>
>>>> Usually, your task should be killed when it goes over the cgroup limit.
>>>> Would you enter the `/sys/fs/cgroup/memory/mesos` folder on the agent?
>>>> Then check the values in `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
>>>> `${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
>>>> `${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes`, and reply to this
>>>> email with them.
>>>>
>>>> ${YOUR_CONTAINER_ID} is the container ID of your task; you can find it
>>>> in the agent log. Or, as you said, you only have this one task, so there
>>>> should be only one directory under `/sys/fs/cgroup/memory/mesos`.
>>>>
>>>> Furthermore, would you show the result of
>>>> http://${YOUR_AGENT_IP}:5051/containers? It contains some task
>>>> statistics as well.
>>>>
>>>> On Tue, Oct 4, 2016 at 9:00 PM, Srikant Kalani <
>>>> srikant.blackrock@gmail.com> wrote:
>>>>
>>>>> We have upgraded Linux from RHEL 6 to RHEL 7 and Mesos from 0.27 to
>>>>> 1.0.1.
>>>>> After the upgrade we are not able to see the memory used by the task,
>>>>> which was fine in the previous version. Because of this, cgroups are
>>>>> not effective.
>>>>>
>>>>> Answers to your questions below:
>>>>>
>>>>> There is only 1 task running, an app server, which is consuming approx
>>>>> 20G mem, but this info is not showing up in the Mesos UI.
>>>>> Swap is enabled in the agent start command.
>>>>> These flags are used in the agent: cgroups_limits_swap=true
>>>>> --isolation=cgroups/cpu,cgroups/mem --cgroups_hierachy=/sys/fs/c group
>>>>> In the agent logs I can see the memory limit updated to 33MB for the
>>>>> container.
>>>>>
>>>>> The web UI shows the total memory allocated to the framework, but it is
>>>>> not showing the memory used by the task. It always shows 0B/33MB.
>>>>>
>>>>> Not sure if this is a RHEL 7 issue or a Mesos 1.0.1 issue.
>>>>>
>>>>> Any suggestions?
>>>>> On 26 Sep 2016 21:55, "haosdent" <ha...@gmail.com> wrote:
>>>>>
>>>>>> Hi, @Srikant Could you elaborate on
>>>>>>
>>>>>> >We have verified using top command that framework was using 2gB
>>>>>> memory while allocated was just 50 mb.
>>>>>>
>>>>>> * How many tasks are running in your framework?
>>>>>> * Is swap enabled or disabled on the agents?
>>>>>> * Which flags do you launch the agents with?
>>>>>> * Have you seen something like `Updated 'memory.limit_in_bytes' to `
>>>>>> in the agent log?
>>>>>>
>>>>>> On Tue, Sep 27, 2016 at 12:14 AM, Srikant Kalani <
>>>>>> srikant.blackrock@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> Previously we were running Mesos 0.27 on RHEL 6, and since we already
>>>>>>> had one cgroup hierarchy for CPU and memory for our production
>>>>>>> processes, we were not able to merge the two cgroup hierarchies on
>>>>>>> RHEL 6. The slave process was not coming up.
>>>>>>> Now we have moved to RHEL 7, and both the Mesos master and slave are
>>>>>>> running on RHEL 7 with cgroups implemented. But we are seeing that the
>>>>>>> Mesos UI is not showing the actual memory used by the framework.
>>>>>>>
>>>>>>> Any idea why the framework's CPU and memory usage is not showing up in
>>>>>>> the UI? Because of this, the OS is still not killing tasks that
>>>>>>> consume more memory than was allocated.
>>>>>>> We have verified using the top command that the framework was using
>>>>>>> 2 GB of memory while only 50 MB was allocated.
>>>>>>>
>>>>>>> Please suggest.
>>>>>>> On 8 Sep 2016 01:53, "Greg Mann" <gr...@mesosphere.io> wrote:
>>>>>>>
>>>>>>>> Hi Srikant,
>>>>>>>> Without using cgroups, it won't be possible to enforce isolation of
>>>>>>>> cpu/memory on a Linux agent. Could you elaborate a bit on why you aren't
>>>>>>>> able to use cgroups currently? Have you tested the existing Mesos cgroup
>>>>>>>> isolators in your system?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Greg
>>>>>>>>
>>>>>>>> On Tue, Sep 6, 2016 at 9:24 PM, Srikant Kalani <
>>>>>>>> srikant.blackrock@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Guys,
>>>>>>>>>
>>>>>>>>> We are running a Mesos cluster in our development environment. We
>>>>>>>>> are seeing cases where a framework uses more resources, such as CPU
>>>>>>>>> and memory, than it initially requested. When a new framework is
>>>>>>>>> registered, Mesos calculates the available resources on the basis of
>>>>>>>>> the resources already offered to the first framework; it doesn't
>>>>>>>>> consider the resources actually utilised by the previous framework.
>>>>>>>>> This results in an incorrect calculation of resources.
>>>>>>>>> The Mesos website says that we should implement cgroups, but that is
>>>>>>>>> not possible in our case, as we have already implemented cgroups in
>>>>>>>>> other projects and, due to Linux restrictions, we can't merge two
>>>>>>>>> cgroup hierarchies.
>>>>>>>>>
>>>>>>>>> Any idea how we can implement resource isolation in Mesos?
>>>>>>>>>
>>>>>>>>> We are using Mesos 0.27.1.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Srikant Kalani
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>


-- 
Best Regards,
Haosdent Huang

Re: Resource Isolation in Mesos

Posted by Srikant Kalani <sr...@gmail.com>.
What I can see in the HTTP output is that mem_rss_bytes is not being
populated on RHEL 7.

Here is the HTTP output:

Output for the agent running on RHEL 7:

[{"container_id":"8062e683-204c-40c2-87ae-fcc2c3f71b85",
  "executor_id":"*****",
  "executor_name":"Command Executor (Task: *****) (Command: sh -c '\******...')",
  "framework_id":"edbffd6d-b274-4cb1-b386-2362ed2af517-0000",
  "source":"*****",
  "statistics":{"cpus_limit":1.1,"cpus_system_time_secs":0.01,
    "cpus_user_time_secs":0.03,"mem_anon_bytes":0,"mem_cache_bytes":0,
    "mem_critical_pressure_counter":0,"mem_file_bytes":0,
    "mem_limit_bytes":1107296256,"mem_low_pressure_counter":0,
    "mem_mapped_file_bytes":0,"mem_medium_pressure_counter":0,
    "mem_rss_bytes":0,"mem_swap_bytes":0,"mem_total_bytes":0,
    "mem_unevictable_bytes":0,"timestamp":1475668277.62915},
  "status":{"executor_pid":14454}}]

Output for the agent running on RHEL 6:

[{"container_id":"359c0944-c089-4d43-983e-1f97134fe799",
  "executor_id":"*****",
  "executor_name":"Command Executor (Task: *****) (Command: sh -c '******...')",
  "framework_id":"edbffd6d-b274-4cb1-b386-2362ed2af517-0001",
  "source":"*****",
  "statistics":{"cpus_limit":8.1,"cpus_system_time_secs":1.92,
    "cpus_user_time_secs":6.93,"mem_limit_bytes":1107296256,
    "mem_rss_bytes":2329763840,"timestamp":1475670762.73402},
  "status":{"executor_pid":31577}}]

Attached are UI screenshots:
Wa002.jpg is for RHEL 7 and the other one is for RHEL 6.
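
The comparison between the two agents can be automated by reading the `/containers` endpoint and flagging containers whose statistics carry no resident memory figure. This is a minimal sketch: `fetch_containers` and `containers_without_rss` are hypothetical helpers, and `sample` is a trimmed copy of the two outputs above.

```python
import json
from urllib.request import urlopen


def fetch_containers(agent_ip, port=5051):
    """Fetch the agent's /containers endpoint (same URL as in the thread)."""
    with urlopen(f"http://{agent_ip}:{port}/containers") as resp:
        return json.load(resp)


def containers_without_rss(containers):
    """Return the IDs of containers that report zero or missing mem_rss_bytes."""
    return [
        c["container_id"]
        for c in containers
        if c.get("statistics", {}).get("mem_rss_bytes", 0) == 0
    ]


# Trimmed records taken from the RHEL 7 and RHEL 6 outputs in this message.
sample = [
    {
        "container_id": "8062e683-204c-40c2-87ae-fcc2c3f71b85",
        "statistics": {"mem_limit_bytes": 1107296256, "mem_rss_bytes": 0},
    },
    {
        "container_id": "359c0944-c089-4d43-983e-1f97134fe799",
        "statistics": {"mem_limit_bytes": 1107296256, "mem_rss_bytes": 2329763840},
    },
]

print(containers_without_rss(sample))
```

Against a live agent, the same check would be `containers_without_rss(fetch_containers(agent_ip))`, with `agent_ip` supplied by the reader.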

Re: Resource Isolation in Mesos

Posted by haosdent <ha...@gmail.com>.
Hi, @Srikant How about the result of http://${YOUR_AGENT_IP}:5051/containers?
It is weird that you could see

```
Updated 'memory.limit_in_bytes' to xxx
```

in the log as you mentioned, but `limit_in_bytes` is still the initial value,
as you showed above.
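
To collect every such update from the agent log, the line can be matched with a regular expression. This is a minimal sketch: only the `Updated 'memory.limit_in_bytes' to ` prefix comes from this thread, while the rest of the log line format and the sample lines below are assumptions made for illustration.

```python
import re

# Capture whatever token follows the prefix quoted in this thread.
PATTERN = re.compile(r"Updated 'memory\.limit_in_bytes' to (\S+)")


def limit_updates(log_lines):
    """Return the values the agent logged for memory.limit_in_bytes updates."""
    values = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            values.append(match.group(1))
    return values


# Hypothetical log lines; the real agent log format may differ.
sample_log = [
    "I1005 12:00:00.000000 1234 example.cpp:1] Started isolator",
    "I1005 12:00:01.000000 1234 example.cpp:2] "
    "Updated 'memory.limit_in_bytes' to 1056MB for container 8062e683",
]

print(limit_updates(sample_log))
```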



-- 
Best Regards,
Haosdent Huang

Re: Resource Isolation in Mesos

Posted by Srikant Kalani <sr...@gmail.com>.
Here are the values:
memory.limit_in_bytes = 1107296256
memory.soft_limit_in_bytes = 1107296256
memory.memsw.limit_in_bytes = 9223372036854775807

I have run the same task on Mesos 1.0.1 running on RHEL 6, and the UI then
shows task memory usage as 2.2G/1.0G, where 2.2G is used and 1.0G is
allocated. But since we don't have cgroups there, tasks are not getting
killed.

On RHEL 7 the UI is showing 0B/1.0G for task memory details.

Any idea whether this is a RHEL 7 fault, or do I need to adjust some
configuration?
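
For reference, the raw byte values in this thread convert as follows. This is a small arithmetic sketch; the variable names are illustrative, and `rss_rhel6` is the `mem_rss_bytes` figure reported for the RHEL 6 agent elsewhere in the thread.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
INT64_MAX = 2 ** 63 - 1  # largest signed 64-bit value

limit = 1107296256            # memory.limit_in_bytes
memsw = 9223372036854775807   # memory.memsw.limit_in_bytes
rss_rhel6 = 2329763840        # mem_rss_bytes from the RHEL 6 agent

limit_mib = limit / MIB                  # hard limit in MiB
rss_gib = round(rss_rhel6 / GIB, 1)      # resident memory in GiB
memsw_is_int64_max = memsw == INT64_MAX  # True means no effective memsw limit

print(limit_mib, rss_gib, memsw_is_int64_max)
```

The hard limit works out to 1056 MiB, which the UI rounds to 1.0G, and the RHEL 6 resident size to about 2.2 GiB, matching the 2.2G/1.0G display. The memory+swap limit equals the largest signed 64-bit integer, which suggests that no effective swap limit has been applied.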

Re: Resource Isolation in Mesos

Posted by haosdent <ha...@gmail.com>.
Hi, @Srikant

Usually, your task should be killed when it goes over the cgroup limit. Would
you enter the `/sys/fs/cgroup/memory/mesos` folder on the agent?
Then check the values in `${YOUR_CONTAINER_ID}/memory.limit_in_bytes`,
`${YOUR_CONTAINER_ID}/memory.soft_limit_in_bytes` and
`${YOUR_CONTAINER_ID}/memory.memsw.limit_in_bytes`, and reply to this email
with them.

${YOUR_CONTAINER_ID} is the container ID of your task; you can find it in
the agent log. Or, as you said, you only have this one task, so there should
be only one directory under `/sys/fs/cgroup/memory/mesos`.

Furthermore, would you show the result of
http://${YOUR_AGENT_IP}:5051/containers?
It contains some task statistics as well.
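
The three files can also be read in one go. This is a minimal sketch: `read_limits` is a hypothetical helper, and the default `root` is the path mentioned above, which may differ if the agent uses another cgroup hierarchy or root.

```python
from pathlib import Path

# The three control files named in this message.
LIMIT_FILES = (
    "memory.limit_in_bytes",
    "memory.soft_limit_in_bytes",
    "memory.memsw.limit_in_bytes",
)


def read_limits(container_id, root="/sys/fs/cgroup/memory/mesos"):
    """Return {file name: value in bytes, or None if the file is absent}."""
    cgroup_dir = Path(root) / container_id
    values = {}
    for name in LIMIT_FILES:
        path = cgroup_dir / name
        values[name] = int(path.read_text().strip()) if path.exists() else None
    return values
```

Calling `read_limits(container_id)` on the agent host returns the same three numbers as reading the files by hand; a `None` for `memory.memsw.limit_in_bytes` would mean the file is not present for that container.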



-- 
Best Regards,
Haosdent Huang