Posted to reviews@aurora.apache.org by Reza Motamedi <re...@gmail.com> on 2017/06/22 05:31:49 UTC
Review Request 60354: Observer task page to load consumption info from
history
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------
Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
Repository: aurora
Description
-------
# Observer task page to load consumption info from history
The resource consumption of Thermos processes is calculated periodically by TaskResourceMonitor threads (one thread per Thermos task). The Observer host page (the landing page) uses this information to display a semi-fresh state of the tasks running on a host. Only an aggregate history of consumption is kept, at the task level, even though TaskResourceMonitor must first collect resource usage at the process level and then aggregate it.
On the other hand, when an Observer _task page_ is visited, the resource consumption of each Thermos process within that task is calculated again and displayed without being aggregated. This can become very slow, because the time needed to complete the resource calculation depends on the load on the host.
This patch takes advantage of the periodic work: the resource information requested by the Observer task page is served from the resource consumption that has already been collected.
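The idea can be sketched as follows. This is only an illustration of the approach described above, not the code in `resource.py`: the names `ProcessSample`, `HistorySketch`, `MonitorSketch`, and `collect_fn` are hypothetical, and the real monitor runs `sample_once` from its own thread on a timer.

```python
import threading
import time
from collections import deque, namedtuple

# Hypothetical per-process sample; field names are assumptions for illustration.
ProcessSample = namedtuple('ProcessSample', ['cpu', 'ram', 'num_procs'])


class HistorySketch(object):
  """Bounded history of (timestamp, {process_name: ProcessSample}) entries."""

  def __init__(self, max_len=10):
    self._entries = deque(maxlen=max_len)
    self._lock = threading.Lock()

  def add(self, timestamp, per_process):
    with self._lock:
      self._entries.append((timestamp, dict(per_process)))

  def latest(self):
    with self._lock:
      return self._entries[-1] if self._entries else (None, {})


class MonitorSketch(object):
  def __init__(self, collect_fn):
    self._collect = collect_fn  # the expensive per-process collection
    self.history = HistorySketch()

  def sample_once(self, now=None):
    """One tick of the periodic work: collect per process and record it."""
    per_process = self._collect()
    self.history.add(time.time() if now is None else now, per_process)

  def sample_by_process(self, name):
    """Task page lookup: read the latest entry instead of recomputing."""
    _, per_process = self.history.latest()
    return per_process.get(name, ProcessSample(0.0, 0, 0))

  def task_aggregate(self):
    """Host page lookup: aggregate the same entry at the task level."""
    _, per_process = self.history.latest()
    samples = list(per_process.values())
    return ProcessSample(
        cpu=sum(s.cpu for s in samples),
        ram=sum(s.ram for s in samples),
        num_procs=sum(s.num_procs for s in samples))
```

With this shape, a task page request costs a dictionary lookup regardless of host load. The trade-off is that the numbers shown are as old as the last periodic sample.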
Diffs
-----
src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
Diff: https://reviews.apache.org/r/60354/diff/1/
Testing
-------
I stress tested this patch on a host that had a slow Observer page. Interestingly, it did not take much to make the Observer slow. A few points need to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The Observer is allowed to use only 20% of a CPU core. The attached screenshots are from such a setup.
- With 20% of a CPU core assigned to the Observer, starting only 8 `task`s, each with 3 `process`es, is enough to make the Observer slow: the `task page` takes 11 seconds to load.
File Attachments
----------------
Page load timing stats with the patch (Chrome waterfall view)
https://reviews.apache.org/media/uploaded/files/2017/06/22/6cec6645-6a2d-46bb-997f-fef53bb15c19__with_patch_-_Screen_Shot_2017-06-21_at_10.17.24_PM.png
Page load timing stats without the patch (Chrome waterfall view)
https://reviews.apache.org/media/uploaded/files/2017/06/22/0916cd47-07ec-48da-bf52-9560c21c1f60__without_patch_-_Screen_Shot_2017-06-21_at_10.16.28_PM.png
Thanks,
Reza Motamedi
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178628
-----------------------------------------------------------
Master (aae39a8) is red with this patch.
./build-support/jenkins/build.sh
Executing tasks in goals: compile
05:48:38 00:01 [compile]
05:48:38 00:01 [compile-prep-command]
05:48:38 00:01 [prep_command]
05:48:40 00:03 [compile]
05:48:40 00:03 [python-eval]
05:48:40 00:03 [pythonstyle]
05:48:40 00:03 [cache]
No cached artifacts for 42 targets.
Invalidated 42 targets.
T302:ERROR src/main/python/apache/thermos/monitoring/resource.py:084 Expected 2 blank lines, found 1
|class ResourceHistory(object):
F401:ERROR src/main/python/apache/thermos/monitoring/resource.py:035 'attrgetter' imported but unused
|from operator import attrgetter
E302:ERROR PythonFile(src/main/python/apache/thermos/monitoring/resource.py):084 expected 2 blank lines, found 1
|class ResourceHistory(object):
E272:ERROR PythonFile(src/main/python/apache/thermos/monitoring/resource.py):168 multiple spaces before keyword
| procs = (el.num_procs for el in _sample[1].proc_usage.values())
E221:ERROR PythonFile(src/main/python/apache/thermos/monitoring/resource.py):168 multiple spaces before operator
| procs = (el.num_procs for el in _sample[1].proc_usage.values())
T301:ERROR src/test/python/apache/thermos/monitoring/test_resource.py:111-112 Expected 1 blank lines, found 2
| @mock.patch('apache.thermos.monitoring.monitor.TaskMonitor.get_active_processes',
| autospec=True, spec_set=True)
E303:ERROR PythonFile(src/test/python/apache/thermos/monitoring/test_resource.py):111-112 too many blank lines (2)
| @mock.patch('apache.thermos.monitoring.monitor.TaskMonitor.get_active_processes',
| autospec=True, spec_set=True)
FAILURE: 7 Python Style issues found. For import order related issues, please try `./pants fmt.isort <targets>`
05:48:56 00:19 [complete]
FAILURE
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
On June 22, 2017, 5:31 a.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60354/
> -----------------------------------------------------------
>
> (Updated June 22, 2017, 5:31 a.m.)
>
>
> Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Diffs
> -----
>
> src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
> src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
>
>
> Diff: https://reviews.apache.org/r/60354/diff/1/
>
>
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178696
-----------------------------------------------------------
Ship it!
Master (13055df) is green with this patch.
./build-support/jenkins/build.sh
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
On June 22, 2017, 5:28 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60354/
> -----------------------------------------------------------
>
> (Updated June 22, 2017, 5:28 p.m.)
>
>
> Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Diffs
> -----
>
> src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
> src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
>
>
> Diff: https://reviews.apache.org/r/60354/diff/2/
>
>
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------
(Updated June 22, 2017, 7:46 p.m.)
Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
Repository: aurora
Diffs
-----
src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
Diff: https://reviews.apache.org/r/60354/diff/3/
Thanks,
Reza Motamedi
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178725
-----------------------------------------------------------
Ship it!
Master (a922b05) is green with this patch.
./build-support/jenkins/build.sh
I will refresh this build result if you post a review containing "@ReviewBot retry"
- Aurora ReviewBot
On June 22, 2017, 6:02 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60354/
> -----------------------------------------------------------
>
> (Updated June 22, 2017, 6:02 p.m.)
>
>
> Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Diffs
> -----
>
> src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
> src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
>
>
> Diff: https://reviews.apache.org/r/60354/diff/3/
>
>
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------
(Updated June 22, 2017, 6:02 p.m.)
Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
Repository: aurora
Diffs (updated)
-----
src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
Diff: https://reviews.apache.org/r/60354/diff/3/
Changes: https://reviews.apache.org/r/60354/diff/2-3/
Thanks,
Reza Motamedi
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Reza Motamedi <re...@gmail.com>.
> On June 22, 2017, 5:07 p.m., David McLaughlin wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 160 (original), 166 (patched)
> > <https://reviews.apache.org/r/60354/diff/2/?file=1758476#file1758476line167>
> >
> > I think better variable names would help with reading this code.
You're right.
- Reza
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178711
-----------------------------------------------------------
On June 22, 2017, 3:28 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60354/
> -----------------------------------------------------------
>
> (Updated June 22, 2017, 3:28 p.m.)
>
>
> Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Diffs
> -----
>
> src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
> src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
>
>
> Diff: https://reviews.apache.org/r/60354/diff/2/
>
>
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178711
-----------------------------------------------------------
Overall approach LGTM. Just a little readability nit below.
src/main/python/apache/thermos/monitoring/resource.py
Line 160 (original), 166 (patched)
<https://reviews.apache.org/r/60354/#comment252861>
I think better variable names would help with reading this code.
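As a rough illustration of the kind of renaming this could mean, here is a sketch based on the generator expression flagged in the build log (`el.num_procs for el in _sample[1].proc_usage.values()`). Only `proc_usage` and `num_procs` come from this thread. The `ProcessUsage` and `TaskSample` stand-ins, the function name, and the assumption that a history entry is a `(timestamp, sample)` pair are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-ins so the sketch runs on its own.
ProcessUsage = namedtuple('ProcessUsage', ['num_procs'])
TaskSample = namedtuple('TaskSample', ['proc_usage'])


def total_num_procs(history_entry):
  """Sum process counts for one history entry, assumed to be (timestamp, sample)."""
  # Unpacking names both halves of the tuple instead of indexing with [1].
  _timestamp, task_sample = history_entry
  return sum(usage.num_procs for usage in task_sample.proc_usage.values())


entry = (1498150000.0, TaskSample(proc_usage={
    'web': ProcessUsage(num_procs=2),
    'log': ProcessUsage(num_procs=1),
}))
print(total_num_procs(entry))
```

Unpacking the tuple and naming the loop variable after what it holds removes the need to remember what index 1 and `el` refer to.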
- David McLaughlin
On June 22, 2017, 3:28 p.m., Reza Motamedi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60354/
> -----------------------------------------------------------
>
> (Updated June 22, 2017, 3:28 p.m.)
>
>
> Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Diffs
> -----
>
> src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
> src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
>
>
> Diff: https://reviews.apache.org/r/60354/diff/2/
>
>
Re: Review Request 60354: Observer task page to load consumption info
from history
Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------
(Updated June 22, 2017, 3:28 p.m.)
Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.
Repository: aurora
Diffs (updated)
-----
src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2
src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28
Diff: https://reviews.apache.org/r/60354/diff/2/
Changes: https://reviews.apache.org/r/60354/diff/1-2/
Thanks,
Reza Motamedi