Posted to reviews@aurora.apache.org by Stephan Erb <se...@apache.org> on 2018/06/18 08:57:11 UTC
Review Request 67627: Add observer flag to disable resource metric
collection
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67627/
-----------------------------------------------------------
Review request for Aurora, Renan DelValle, Reza Motamedi, and Santhosh Kumar Shanmugham.
Repository: aurora
Description
-------
Add observer command line option `--disable_task_resource_collection` to
disable the collection of CPU, memory, and disk metrics for observed tasks.
This is useful in setups where metrics cannot be gathered reliably (e.g. when
using PID namespaces) or where collection is expensive due to hundreds of active
tasks per host.
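As a rough illustration of what such a flag does, the sketch below gates monitor construction on the option. It is not the code from this patch: `argparse` stands in for the observer's real option handling, and `NullResourceMonitor`, `PollingResourceMonitor`, and `make_monitor` are hypothetical names made up for this example. Only the flag name comes from the description above.

```python
import argparse


class NullResourceMonitor:
    """Hypothetical stand-in: does no work and reports no metrics."""

    def sample(self):
        return None


class PollingResourceMonitor:
    """Hypothetical stand-in for a monitor that inspects processes and disk."""

    def __init__(self, task_id):
        self.task_id = task_id

    def sample(self):
        # A real monitor would gather CPU, memory, and disk usage here;
        # the zero values are placeholders for illustration only.
        return {"task_id": self.task_id, "cpu": 0.0, "ram": 0, "disk": 0}


def build_parser():
    # argparse is an assumption; the real observer may parse options differently.
    parser = argparse.ArgumentParser(description="illustrative observer options")
    parser.add_argument(
        "--disable_task_resource_collection",
        action="store_true",
        default=False,
        help="Skip collection of CPU, memory, and disk metrics for observed tasks.",
    )
    return parser


def make_monitor(options, task_id):
    # Choose the no-op monitor when collection is disabled.
    if options.disable_task_resource_collection:
        return NullResourceMonitor()
    return PollingResourceMonitor(task_id)
```

With the flag set, every task gets the no-op monitor, so no per-task sampling work is scheduled at all. That is the kind of saving the description is after.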
Diffs
-----
RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca
docs/reference/observer-configuration.md c791b3480e5bf35e6eb0fbea908ff3242eab315d
src/main/python/apache/aurora/config/BUILD 12e7fe973f456d0847ce63d3b293131a7f4c3bdd
src/main/python/apache/aurora/tools/thermos_observer.py fd9465d2e2b3135f3fdf8230777117adaa89337c
src/main/python/apache/thermos/monitoring/resource.py 72ed4e5a82dfd8a09e0a8262f6da4992ac98542a
src/main/python/apache/thermos/observer/task_observer.py 94cd6c541bb7f8a4c153cc51caa63d2c08888a49
src/test/python/apache/thermos/monitoring/test_resource.py 44450647a180f86903ebd37f2a9f4327496597e9
Diff: https://reviews.apache.org/r/67627/diff/1/
Testing
-------
We run our Mesos agents with PID namespaces enabled (i.e.
`--isolation='namespaces/ipc,namespaces/pid,...'`). Sometimes the hosts are
also tightly packed with many small tasks (e.g. `~130` active tasks and `~1000`
finished tasks). Even with very relaxed scrape settings of
`--task_process_collection_interval_secs=3000` and
`--task_disk_collection_interval_secs=3000`, it can take `150ms-2500ms`
to render the observer landing page `/main`. This patch reduces that to about
`100ms-150ms`. There is no immediate downside, as metrics reporting is already
broken by the PID namespacing.
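For anyone who wants to reproduce this kind of measurement, a minimal timing helper could look like the sketch below. It is an assumption about how one might measure, not the method used for the numbers above. `measure_latency_ms` and its `fetch` argument are hypothetical names.

```python
import time


def measure_latency_ms(fetch, runs=10):
    """Call fetch() `runs` times and return (min_ms, max_ms) wall-clock latency."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), max(samples)
```

`fetch` can be any callable that requests and fully reads the observer's `/main` page, for example a small wrapper around `urllib.request.urlopen`. The observer's host and port depend on the deployment and are not given in this thread.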
Thanks,
Stephan Erb
Re: Review Request 67627: Add observer flag to disable resource metric
collection
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67627/#review204916
-----------------------------------------------------------
Ship it!
Master (4719fa7) is green with this patch.
./build-support/jenkins/build.sh
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
Re: Review Request 67627: Add observer flag to disable resource metric
collection
Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67627/#review204936
-----------------------------------------------------------
Mostly LGTM.
Will the UI show 0s or empty spaces?
Can you expand on why PID namespaces break metrics?
docs/reference/observer-configuration.md
Lines 27 (patched)
<https://reviews.apache.org/r/67627/#comment287754>
also disk metrics
src/main/python/apache/aurora/tools/thermos_observer.py
Lines 68 (patched)
<https://reviews.apache.org/r/67627/#comment287753>
also disk metrics
- Santhosh Kumar Shanmugham