Posted to reviews@aurora.apache.org by Reza Motamedi <re...@gmail.com> on 2017/06/22 05:31:49 UTC

Review Request 60354: Observer task page to load consumption info from history

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------

Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumption of Thermos Processes is periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi-)fresh state of the tasks running on a host in the Observer host page, aka the landing page. An aggregate history of consumption is kept at the task level, although TaskResourceMonitor must first collect the resources at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resource consumption of the Thermos Processes within that task is calculated again and displayed without being aggregated. This can become very slow, since the time to complete the resource calculation is affected by the load on the host.

With this patch, we take advantage of the periodic work and serve the resource information requested by the Observer task page from the already collected resource consumption data.
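
The idea described above can be illustrated with a minimal sketch. This is not the code in the patch: the class `HistorySketch`, the namedtuples, and every method name here are hypothetical, and the sketch only assumes that each periodic sample keeps the per-process figures alongside the task-level aggregate so that a page request can read them back without recalculating anything.

```python
import threading
import time
from collections import deque, namedtuple

# Hypothetical record types, used only for this illustration.
ProcessSample = namedtuple('ProcessSample', ['cpu', 'ram', 'num_procs'])
TaskSample = namedtuple('TaskSample', ['timestamp', 'cpu', 'ram', 'proc_usage'])


class HistorySketch(object):
    """Bounded history of task samples that also retains per-process usage."""

    def __init__(self, maxlen=10):
        # A deque with maxlen silently drops the oldest sample when full.
        self._samples = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def record(self, proc_usage, timestamp=None):
        """Called by the periodic collector with {process_name: ProcessSample}."""
        ts = time.time() if timestamp is None else timestamp
        sample = TaskSample(
            timestamp=ts,
            cpu=sum(p.cpu for p in proc_usage.values()),
            ram=sum(p.ram for p in proc_usage.values()),
            proc_usage=dict(proc_usage))
        with self._lock:
            self._samples.append(sample)

    def latest_task_sample(self):
        """Return the most recent aggregate sample, or None if nothing is recorded."""
        with self._lock:
            return self._samples[-1] if self._samples else None

    def latest_process_sample(self, process_name):
        """Serve a per-process lookup from history instead of recomputing it."""
        latest = self.latest_task_sample()
        if latest is None:
            return None
        return latest.proc_usage.get(process_name)
```

In this sketch a task page request costs one lock acquisition and a dictionary lookup, so its latency no longer depends on how long a fresh resource calculation would take on a loaded host. The trade-off is that the figures are only as fresh as the last periodic sample.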


Diffs
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60354/diff/1/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. A few points need to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The Observer is allowed to use only 20% of a CPU core. The attached screenshots are from such a setup.
- With 20% of a CPU core assigned to the Observer, starting only 8 `task`s, each with 3 `process`es, is enough to make the Observer slow: it takes 11 seconds to load the `task page`.


File Attachments
----------------

Page load timing stats with the patch (Chrome waterfall view)
  https://reviews.apache.org/media/uploaded/files/2017/06/22/6cec6645-6a2d-46bb-997f-fef53bb15c19__with_patch_-_Screen_Shot_2017-06-21_at_10.17.24_PM.png
Page load timing stats without the patch (Chrome waterfall view)
  https://reviews.apache.org/media/uploaded/files/2017/06/22/0916cd47-07ec-48da-bf52-9560c21c1f60__without_patch_-_Screen_Shot_2017-06-21_at_10.16.28_PM.png


Thanks,

Reza Motamedi


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178628
-----------------------------------------------------------



Master (aae39a8) is red with this patch.
  ./build-support/jenkins/build.sh

               Executing tasks in goals: compile
05:48:38 00:01   [compile]
05:48:38 00:01     [compile-prep-command]
05:48:38 00:01       [prep_command]
05:48:40 00:03     [compile]
05:48:40 00:03     [python-eval]
05:48:40 00:03     [pythonstyle]
05:48:40 00:03       [cache]                                          
                   No cached artifacts for 42 targets.
                   Invalidated 42 targets.
T302:ERROR   src/main/python/apache/thermos/monitoring/resource.py:084 Expected 2 blank lines, found 1
     |class ResourceHistory(object):

F401:ERROR   src/main/python/apache/thermos/monitoring/resource.py:035 'attrgetter' imported but unused
     |from operator import attrgetter

E302:ERROR   PythonFile(src/main/python/apache/thermos/monitoring/resource.py):084 expected 2 blank lines, found 1
     |class ResourceHistory(object):

E272:ERROR   PythonFile(src/main/python/apache/thermos/monitoring/resource.py):168 multiple spaces before keyword
     |    procs   = (el.num_procs      for el in _sample[1].proc_usage.values())

E221:ERROR   PythonFile(src/main/python/apache/thermos/monitoring/resource.py):168 multiple spaces before operator
     |    procs   = (el.num_procs      for el in _sample[1].proc_usage.values())


T301:ERROR   src/test/python/apache/thermos/monitoring/test_resource.py:111-112 Expected 1 blank lines, found 2
     |  @mock.patch('apache.thermos.monitoring.monitor.TaskMonitor.get_active_processes',
     |      autospec=True, spec_set=True)

E303:ERROR   PythonFile(src/test/python/apache/thermos/monitoring/test_resource.py):111-112 too many blank lines (2)
     |  @mock.patch('apache.thermos.monitoring.monitor.TaskMonitor.get_active_processes',
     |      autospec=True, spec_set=True)


FAILURE: 7 Python Style issues found. For import order related issues, please try `./pants fmt.isort <targets>`


05:48:56 00:19   [complete]
               FAILURE
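
The E221 and E272 errors above both point at the column-aligned spacing on line 168. Below is a minimal sketch of the same expression written with single spaces. It assumes, from the `_sample[1]` indexing in the log, that a sample is a `(timestamp, task_sample)` pair. The namedtuples, the function name `total_procs`, and the final `sum` are hypothetical and are not taken from the patch.

```python
from collections import namedtuple

# Hypothetical stand-ins for the structures referenced in the flagged line.
ProcUsage = namedtuple('ProcUsage', ['num_procs'])
TaskSample = namedtuple('TaskSample', ['proc_usage'])


def total_procs(sample):
    """Sum num_procs over every process in a (timestamp, task_sample) pair."""
    _, task_sample = sample
    # Single spaces around '=' and before 'for' avoid E221 and E272.
    procs = (usage.num_procs for usage in task_sample.proc_usage.values())
    return sum(procs)
```

Unpacking the pair into named parts instead of indexing with `[1]` also makes it clearer which element of the sample is being read.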


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On June 22, 2017, 5:31 a.m., Reza Motamedi wrote:
> (Updated June 22, 2017, 5:31 a.m.)
> 
> Diff: https://reviews.apache.org/r/60354/diff/1/


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178696
-----------------------------------------------------------


Ship it!




Master (13055df) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On June 22, 2017, 5:28 p.m., Reza Motamedi wrote:
> (Updated June 22, 2017, 5:28 p.m.)
> 
> Diff: https://reviews.apache.org/r/60354/diff/2/


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------

(Updated June 22, 2017, 7:46 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.


Repository: aurora


Diffs
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60354/diff/3/


Thanks,

Reza Motamedi


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178725
-----------------------------------------------------------


Ship it!




Master (a922b05) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On June 22, 2017, 6:02 p.m., Reza Motamedi wrote:
> (Updated June 22, 2017, 6:02 p.m.)
> 
> Diff: https://reviews.apache.org/r/60354/diff/3/


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------

(Updated June 22, 2017, 6:02 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.


Repository: aurora


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60354/diff/3/

Changes: https://reviews.apache.org/r/60354/diff/2-3/


Thanks,

Reza Motamedi


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On June 22, 2017, 5:07 p.m., David McLaughlin wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 160 (original), 166 (patched)
> > <https://reviews.apache.org/r/60354/diff/2/?file=1758476#file1758476line167>
> >
> >     I think better variable names would help with reading this code.

You're right.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178711
-----------------------------------------------------------


On June 22, 2017, 3:28 p.m., Reza Motamedi wrote:
> (Updated June 22, 2017, 3:28 p.m.)
> 
> Diff: https://reviews.apache.org/r/60354/diff/2/


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/#review178711
-----------------------------------------------------------



Overall approach LGTM. Just a little readability nit below.


src/main/python/apache/thermos/monitoring/resource.py
Line 160 (original), 166 (patched)
<https://reviews.apache.org/r/60354/#comment252861>

    I think better variable names would help with reading this code.


- David McLaughlin


On June 22, 2017, 3:28 p.m., Reza Motamedi wrote:
> (Updated June 22, 2017, 3:28 p.m.)
> 
> Diff: https://reviews.apache.org/r/60354/diff/2/


Re: Review Request 60354: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60354/
-----------------------------------------------------------

(Updated June 22, 2017, 3:28 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.


Repository: aurora


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60354/diff/2/

Changes: https://reviews.apache.org/r/60354/diff/1-2/


Thanks,

Reza Motamedi