You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Reza Motamedi <re...@gmail.com> on 2017/06/22 20:18:14 UTC

Review Request 60376: Observer task page to load consumption info from history

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/
-----------------------------------------------------------

Review request for Aurora, David McLaughlin, Joshua Cohen, and Santhosh Kumar Shanmugham.


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.

By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.


Diffs
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60376/diff/1/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
- Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.


File Attachments
----------------

without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png


Thanks,

Reza Motamedi


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On June 27, 2017, 9:10 p.m., Stephan Erb wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 54 (original), 53 (patched)
> > <https://reviews.apache.org/r/60376/diff/1/?file=1758879#file1758879line54>
> >
> >     It might be worth to rename this to something like 'AggregateResourceResult' or 'TaskResourceResult'.

Makes sense.


> On June 27, 2017, 9:10 p.m., Stephan Erb wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 160 (original), 166 (patched)
> > <https://reviews.apache.org/r/60376/diff/1/?file=1758879#file1758879line167>
> >
> >     The usages of `ts_value_pair` has made it difficult to understand what is going on. I believe it becomes a little bit clearer to the casual reader if you do something like this:
> >     
> >         timestamp, resources = self._history.get(timestamp)
> >         
> >     This will replace usages of `ts_value_pair[1].proc_usage.values()` with `resources.proc_usage.values()` which makes it slightly clearer.
> >     
> >     Calling this `full_resources` instead of `resources` should work as well.

Cool. The code looks much nicer.


> On June 27, 2017, 9:10 p.m., Stephan Erb wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 190 (patched)
> > <https://reviews.apache.org/r/60376/diff/1/?file=1758879#file1758879line191>
> >
> >     in this case you coud do
> >     
> >         _, resources = self._history.get(time.time())
> >         
> >     
> >     as you don't care about the timestamp.

Done.


> On June 27, 2017, 9:10 p.m., Stephan Erb wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 222 (original), 243-244 (patched)
> > <https://reviews.apache.org/r/60376/diff/1/?file=1758879#file1758879line244>
> >
> >     We normally don't do hanging indentation https://github.com/twitter/commons/blob/master/src/python/twitter/common/styleguide.md#best-practices

Good to know.


> On June 27, 2017, 9:10 p.m., Stephan Erb wrote:
> > src/test/python/apache/thermos/monitoring/test_resource.py
> > Lines 84-103 (patched)
> > <https://reviews.apache.org/r/60376/diff/1/?file=1758880#file1758880line86>
> >
> >     I am not a big fan of the mocking here, even though the sibling testcases are not much better :). Mocking results in high coupling, making the implementation harder to change in the future.
> >     
> >     Please give it a try if you can manage to inject a normal, pre-populated `ResourceHistory` into the constructor of the `TaskResourceMonitor` instead.
> >     
> >     (If you don't have any conclusive results after 10-15 minutes feel free to give up. There might be some further interdependency that I am not spotting right now)

Addressed. Please let me know if this is what you had in mind.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review179043
-----------------------------------------------------------


On June 28, 2017, 7:01 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 28, 2017, 7:01 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/2/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review179043
-----------------------------------------------------------



Let's see how this works out in practice! I am really looking forward to snappy thermos pages again.


src/main/python/apache/thermos/monitoring/resource.py
Line 54 (original), 53 (patched)
<https://reviews.apache.org/r/60376/#comment253446>

    It might be worth to rename this to something like 'AggregateResourceResult' or 'TaskResourceResult'.



src/main/python/apache/thermos/monitoring/resource.py
Line 160 (original), 166 (patched)
<https://reviews.apache.org/r/60376/#comment253448>

    The usages of `ts_value_pair` has made it difficult to understand what is going on. I believe it becomes a little bit clearer to the casual reader if you do something like this:
    
        timestamp, resources = self._history.get(timestamp)
        
    This will replace usages of `ts_value_pair[1].proc_usage.values()` with `resources.proc_usage.values()` which makes it slightly clearer.
    
    Calling this `full_resources` instead of `resources` should work as well.



src/main/python/apache/thermos/monitoring/resource.py
Lines 190 (patched)
<https://reviews.apache.org/r/60376/#comment253449>

    in this case you coud do
    
        _, resources = self._history.get(time.time())
        
    
    as you don't care about the timestamp.



src/main/python/apache/thermos/monitoring/resource.py
Line 222 (original), 243-244 (patched)
<https://reviews.apache.org/r/60376/#comment253450>

    We normally don't do hanging indentation https://github.com/twitter/commons/blob/master/src/python/twitter/common/styleguide.md#best-practices



src/test/python/apache/thermos/monitoring/test_resource.py
Lines 84-103 (patched)
<https://reviews.apache.org/r/60376/#comment253451>

    I am not a big fan of the mocking here, even though the sibling testcases are not much better :). Mocking results in high coupling, making the implementation harder to change in the future.
    
    Please give it a try if you can manage to inject a normal, pre-populated `ResourceHistory` into the constructor of the `TaskResourceMonitor` instead.
    
    (If you don't have any conclusive results after 10-15 minutes feel free to give up. There might be some further interdependency that I am not spotting right now)


- Stephan Erb


On June 22, 2017, 10:18 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 22, 2017, 10:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/1/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review178736
-----------------------------------------------------------


Ship it!




Master (a922b05) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On June 22, 2017, 8:18 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 22, 2017, 8:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/1/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On June 28, 2017, 7:15 p.m., Aurora ReviewBot wrote:
> > Master (a922b05) is red with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> >   Downloading six-1.10.0-py2.py3-none-any.whl
> > Collecting ansicolors==1.0.2 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading ansicolors-1.0.2.tar.gz
> > Collecting packaging==16.7 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading packaging-16.7-py2.py3-none-any.whl
> > Collecting pathspec==0.3.4 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading pathspec-0.3.4.tar.gz
> > Collecting scandir==1.2 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading scandir-1.2.zip
> > Collecting twitter.common.dirutil<0.4,>=0.3.1 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading twitter.common.dirutil-0.3.9.tar.gz
> > Collecting psutil==4.3.0 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading psutil-4.3.0.tar.gz (316kB)
> > Collecting requests<2.6,>=2.5.0 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading requests-2.5.3-py2.py3-none-any.whl (468kB)
> > Collecting pystache==0.5.3 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading pystache-0.5.3.tar.gz (74kB)
> > Collecting pex==1.1.16 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading pex-1.1.16-py2.py3-none-any.whl (105kB)
> > Collecting docutils<0.13,>=0.12 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading docutils-0.12.tar.gz (1.6MB)
> > Collecting Markdown==2.1.1 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading Markdown-2.1.1.tar.gz (242kB)
> > Collecting Pygments==1.4 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading Pygments-1.4.tar.gz (3.5MB)
> > Collecting twitter.common.confluence<0.4,>=0.3.1 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading twitter.common.confluence-0.3.9.tar.gz
> > Collecting fasteners==0.14.1 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading fasteners-0.14.1-py2.py3-none-any.whl
> > Collecting coverage<3.8,>=3.7 (from pantsbuild.pants==1.3.0.dev3)
> >   Downloading coverage-3.7.1.tar.gz (284kB)
> >     Complete output from command python setup.py egg_info:
> >     running egg_info
> >     error: error in 'egg_base' option: 'pip-egg-info' does not exist or is not a directory
> >     
> >     ----------------------------------------
> > Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-asSx_T/coverage/
> > You are using pip version 8.1.2, however version 9.0.1 is available.
> > You should consider upgrading via the 'pip install --upgrade pip' command.
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/../../pants: line 99: /home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pants/setup/bootstrap-Linux-x86_64/1.3.0.dev3/bin/python: No such file or directory
> > 
> > 
> > I will refresh this build result if you post a review containing "@ReviewBot retry"

@ReviewBot retry


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review179153
-----------------------------------------------------------


On June 28, 2017, 7:01 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 28, 2017, 7:01 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/2/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review179153
-----------------------------------------------------------



Master (a922b05) is red with this patch.
  ./build-support/jenkins/build.sh

  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting ansicolors==1.0.2 (from pantsbuild.pants==1.3.0.dev3)
  Downloading ansicolors-1.0.2.tar.gz
Collecting packaging==16.7 (from pantsbuild.pants==1.3.0.dev3)
  Downloading packaging-16.7-py2.py3-none-any.whl
Collecting pathspec==0.3.4 (from pantsbuild.pants==1.3.0.dev3)
  Downloading pathspec-0.3.4.tar.gz
Collecting scandir==1.2 (from pantsbuild.pants==1.3.0.dev3)
  Downloading scandir-1.2.zip
Collecting twitter.common.dirutil<0.4,>=0.3.1 (from pantsbuild.pants==1.3.0.dev3)
  Downloading twitter.common.dirutil-0.3.9.tar.gz
Collecting psutil==4.3.0 (from pantsbuild.pants==1.3.0.dev3)
  Downloading psutil-4.3.0.tar.gz (316kB)
Collecting requests<2.6,>=2.5.0 (from pantsbuild.pants==1.3.0.dev3)
  Downloading requests-2.5.3-py2.py3-none-any.whl (468kB)
Collecting pystache==0.5.3 (from pantsbuild.pants==1.3.0.dev3)
  Downloading pystache-0.5.3.tar.gz (74kB)
Collecting pex==1.1.16 (from pantsbuild.pants==1.3.0.dev3)
  Downloading pex-1.1.16-py2.py3-none-any.whl (105kB)
Collecting docutils<0.13,>=0.12 (from pantsbuild.pants==1.3.0.dev3)
  Downloading docutils-0.12.tar.gz (1.6MB)
Collecting Markdown==2.1.1 (from pantsbuild.pants==1.3.0.dev3)
  Downloading Markdown-2.1.1.tar.gz (242kB)
Collecting Pygments==1.4 (from pantsbuild.pants==1.3.0.dev3)
  Downloading Pygments-1.4.tar.gz (3.5MB)
Collecting twitter.common.confluence<0.4,>=0.3.1 (from pantsbuild.pants==1.3.0.dev3)
  Downloading twitter.common.confluence-0.3.9.tar.gz
Collecting fasteners==0.14.1 (from pantsbuild.pants==1.3.0.dev3)
  Downloading fasteners-0.14.1-py2.py3-none-any.whl
Collecting coverage<3.8,>=3.7 (from pantsbuild.pants==1.3.0.dev3)
  Downloading coverage-3.7.1.tar.gz (284kB)
    Complete output from command python setup.py egg_info:
    running egg_info
    error: error in 'egg_base' option: 'pip-egg-info' does not exist or is not a directory
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-asSx_T/coverage/
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/../../pants: line 99: /home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pants/setup/bootstrap-Linux-x86_64/1.3.0.dev3/bin/python: No such file or directory


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On June 28, 2017, 7:01 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 28, 2017, 7:01 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/2/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review179297
-----------------------------------------------------------


Ship it!




Master (a922b05) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On June 29, 2017, 6:32 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 29, 2017, 6:32 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/3/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 54 (original), 53-60 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line54>
> >
> >     Can you add some docstrings to these classes to explain what they will contain? Particularly the difference between the `AggregateResourceResult` and `(Proc|Full)ResourceResult`.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 61 (original), 67 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line68>
> >
> >     Update docstring.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Line 68 (original), 74 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line75>
> >
> >     Update docstring.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 88 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line89>
> >
> >     snake-case the field names.

Done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 175-179 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766459#file1766459line181>
> >
> >     Can we use `sum` and `zip` to sum the tuples column-wise?

Not sure if that makes it easier to read. I moved to something that matches what this part of the calculation looked like before:

```
    aggregated_procs = sum(map(attrgetter('num_procs'), full_resources.proc_usage.values()))
    aggregated_sample = sum(map(attrgetter('process_sample'), full_resources.proc_usage.values()),
        ProcessSample.empty())
```


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/test/python/apache/thermos/monitoring/test_resource.py
> > Lines 104-105 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766461#file1766461line106>
> >
> >     Feed some random values.

done.


> On July 18, 2017, 12:46 a.m., Santhosh Kumar Shanmugham wrote:
> > src/test/python/apache/thermos/monitoring/test_resource.py
> > Lines 113 (patched)
> > <https://reviews.apache.org/r/60376/diff/3/?file=1766461#file1766461line115>
> >
> >     Assert disk_usage.

done.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180775
-----------------------------------------------------------


On July 18, 2017, 6:26 a.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated July 18, 2017, 6:26 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/4/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180775
-----------------------------------------------------------



LGTM. Some minor comments.


src/main/python/apache/thermos/monitoring/resource.py
Line 54 (original), 53-60 (patched)
<https://reviews.apache.org/r/60376/#comment255999>

    Can you add some docstrings to these classes to explain what they will contain? Particularly the difference between the `AggregateResourceResult` and `(Proc|Full)ResourceResult`.



src/main/python/apache/thermos/monitoring/resource.py
Line 61 (original), 67 (patched)
<https://reviews.apache.org/r/60376/#comment256000>

    Update docstring.



src/main/python/apache/thermos/monitoring/resource.py
Line 68 (original), 74 (patched)
<https://reviews.apache.org/r/60376/#comment256001>

    Update docstring.



src/main/python/apache/thermos/monitoring/resource.py
Lines 88 (patched)
<https://reviews.apache.org/r/60376/#comment256002>

    snake-case the field names.



src/main/python/apache/thermos/monitoring/resource.py
Lines 175-179 (patched)
<https://reviews.apache.org/r/60376/#comment256004>

    Can we use `sum` and `zip` to sum the tuples column-wise?



src/test/python/apache/thermos/monitoring/test_resource.py
Lines 104-105 (patched)
<https://reviews.apache.org/r/60376/#comment256005>

    Feed some random values.



src/test/python/apache/thermos/monitoring/test_resource.py
Lines 113 (patched)
<https://reviews.apache.org/r/60376/#comment256007>

    Assert disk_usage.


- Santhosh Kumar Shanmugham


On June 29, 2017, 11:32 a.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 29, 2017, 11:32 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/3/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review179411
-----------------------------------------------------------


Ship it!




Ship It!

- Stephan Erb


On June 29, 2017, 8:32 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 29, 2017, 8:32 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/3/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180825
-----------------------------------------------------------


Ship it!




Master (243d6fa) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On July 18, 2017, 4:14 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated July 18, 2017, 4:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/5/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/
-----------------------------------------------------------

(Updated July 18, 2017, 4:14 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.

By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60376/diff/5/

Changes: https://reviews.apache.org/r/60376/diff/4-5/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
- Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.


File Attachments
----------------

without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png


Thanks,

Reza Motamedi


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On July 18, 2017, 7:15 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/python/apache/thermos/monitoring/resource.py
> > Lines 101 (patched)
> > <https://reviews.apache.org/r/60376/diff/4/?file=1778101#file1778101line101>
> >
> >     s/ProcessStatus/process_status/

Done.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180798
-----------------------------------------------------------


On July 18, 2017, 4:14 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated July 18, 2017, 4:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/5/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180798
-----------------------------------------------------------


Ship it!





src/main/python/apache/thermos/monitoring/resource.py
Lines 101 (patched)
<https://reviews.apache.org/r/60376/#comment256054>

    s/ProcessStatus/process_status/


- Santhosh Kumar Shanmugham


On July 17, 2017, 11:26 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated July 17, 2017, 11:26 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/4/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Aurora ReviewBot <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review180795
-----------------------------------------------------------



Master (243d6fa) is red with this patch.
  ./build-support/jenkins/build.sh

/home/jenkins/jenkins-slave/workspace/AuroraBot
+ exec /usr/bin/python2.7 /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/virtualenv-15.0.2/virtualenv.py --no-download /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv
New python executable in /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/bin/python2.7
Also creating executable in /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/bin/python
Installing setuptools, pip, wheel...done.
Collecting isort==4.0.0
/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading isort-4.0.0-py2.py3-none-any.whl
Installing collected packages: isort
Successfully installed isort-4.0.0
/home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/python/apache/thermos/monitoring/test_resource.py Imports are incorrectly sorted.
--- /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/python/apache/thermos/monitoring/test_resource.py:before	2017-07-18 06:31:11.809995
+++ /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/python/apache/thermos/monitoring/test_resource.py:after	2017-07-18 06:36:40.261237
@@ -14,8 +14,10 @@
 
 from time import time
 from unittest import TestCase
+
 import mock
 import pytest
+from twitter.common.quantity import Amount, Time
 
 from apache.thermos.monitoring.monitor import TaskMonitor
 from apache.thermos.monitoring.process import ProcessSample
@@ -27,8 +29,6 @@
 )
 
 from gen.apache.thermos.ttypes import ProcessStatus
-
-from twitter.common.quantity import Amount, Time
 
 
 class TestResourceHistoryProvider(TestCase):


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On July 18, 2017, 2:26 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated July 18, 2017, 2:26 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/4/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/
-----------------------------------------------------------

(Updated July 18, 2017, 6:26 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.


Changes
-------

Addressed review feedbacks. Added one more test.


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.

By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60376/diff/4/

Changes: https://reviews.apache.org/r/60376/diff/3-4/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
- Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.


File Attachments
----------------

without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png


Thanks,

Reza Motamedi


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/
-----------------------------------------------------------

(Updated June 29, 2017, 6:32 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.


Changes
-------

@ReviewBot retry


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.

By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60376/diff/3/

Changes: https://reviews.apache.org/r/60376/diff/2-3/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
- Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.


File Attachments
----------------

without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png


Thanks,

Reza Motamedi


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/
-----------------------------------------------------------

(Updated June 28, 2017, 7:01 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, Jordan Ly, and Santhosh Kumar Shanmugham.


Repository: aurora


Description
-------

# Observer task page to load consumption info from history

Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.

On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.

By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.


Diffs (updated)
-----

  src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
  src/test/python/apache/aurora/executor/common/test_resource_manager.py a898e4d81d34d1e30e39db1be1a66bc9e0ab1a35 
  src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 


Diff: https://reviews.apache.org/r/60376/diff/2/

Changes: https://reviews.apache.org/r/60376/diff/1-2/


Testing
-------

I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
- We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
- Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.


File Attachments
----------------

without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
  https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png


Thanks,

Reza Motamedi


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On June 26, 2017, 5:38 p.m., Stephan Erb wrote:
> > What is your main optimization objective? Reducing page load time or reducing steady observer CPU load? 
> > 
> > I have observed that when running many tasks per node (say ~30-100), it can happen that the metric collection threads essentially starve the UI from almost all CPU time (due to the Python GIL). In these cases, it would actually be better to just use fresh metrics all the time and eliminate the regular collection instead. This would result in slower UI rending but should yield more consistent latency.
> 
> Reza Motamedi wrote:
>     I observed the same problem as well. My objective was to reduce page load time and what worked best was to reuse the collected resource consumption data. This lets us keep all the information that we currently provide. 
>     
>     I did a more or less through profiling of what consumes the most CPU and takes the longest and saw that looking up the children of a pid seems to be very CPU intensive. Check the psutil implementation here: [Process.children](https://pythonhosted.org/psutil/_modules/psutil.html#Process.children). Constanly running this in the background does not seem to help :).
>     
>     I agree that the background thread that computes the resource consumption of all processes isn't very useful, and perhaps it might be better to collect all consumption data as users visit pages. However, We need to remember that the thread is actually performing some collections that could easily become slow to compute, for instance running DU on `n` sandboxes. Also, users can easily flood the UI by constantly refreshing the page, and triggering repeated work.
>     
>     An alternative solution would be to keep the disk collection inside an always running thread and collect the CPU and mem as users visit the page. This should only change what we do in showing the Thermos host (landing) page.

Although, I am not sure how that would perform in practice when the `du` is backlogged.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review178907
-----------------------------------------------------------


On June 22, 2017, 8:18 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 22, 2017, 8:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/1/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Reza Motamedi <re...@gmail.com>.

> On June 26, 2017, 5:38 p.m., Stephan Erb wrote:
> > What is your main optimization objective? Reducing page load time or reducing steady observer CPU load? 
> > 
> > I have observed that when running many tasks per node (say ~30-100), it can happen that the metric collection threads essentially starve the UI from almost all CPU time (due to the Python GIL). In these cases, it would actually be better to just use fresh metrics all the time and eliminate the regular collection instead. This would result in slower UI rending but should yield more consistent latency.

I observed the same problem as well. My objective was to reduce page load time and what worked best was to reuse the collected resource consumption data. This lets us keep all the information that we currently provide. 

I did a more or less through profiling of what consumes the most CPU and takes the longest and saw that looking up the children of a pid seems to be very CPU intensive. Check the psutil implementation here: [Process.children](https://pythonhosted.org/psutil/_modules/psutil.html#Process.children). Constanly running this in the background does not seem to help :).

I agree that the background thread that computes the resource consumption of all processes isn't very useful, and perhaps it might be better to collect all consumption data as users visit pages. However, We need to remember that the thread is actually performing some collections that could easily become slow to compute, for instance running DU on `n` sandboxes. Also, users can easily flood the UI by constantly refreshing the page, and triggering repeated work.

An alternative solution would be to keep the disk collection inside an always running thread and collect the CPU and mem as users visit the page. This should only change what we do in showing the Thermos host (landing) page.


- Reza


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review178907
-----------------------------------------------------------


On June 22, 2017, 8:18 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 22, 2017, 8:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/1/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>


Re: Review Request 60376: Observer task page to load consumption info from history

Posted by Stephan Erb <se...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60376/#review178907
-----------------------------------------------------------



What is your main optimization objective? Reducing page load time or reducing steady observer CPU load? 

I have observed that when running many tasks per node (say ~30-100), it can happen that the metric collection threads essentially starve the UI from almost all CPU time (due to the Python GIL). In these cases, it would actually be better to just use fresh metrics all the time and eliminate the regular collection instead. This would result in slower UI rending but should yield more consistent latency.

- Stephan Erb


On June 22, 2017, 10:18 p.m., Reza Motamedi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60376/
> -----------------------------------------------------------
> 
> (Updated June 22, 2017, 10:18 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Observer task page to load consumption info from history
> 
> Resource consumptions of Thermos Processes are periodically calculated by TaskResourceMonitor threads (one thread per Thermos task). This information is used to display a (semi) fresh state of the tasks running on a host in the Observer host page, aka landing page. An aggregate history of the consumptions is kept at the task level, although TaskResourceMonitor needs to first collect the resource at the Process level and then aggregate them.
> 
> On the other hand, when an Observer _task page_ is visited, the resources consumption of Thermos Processes within that task are calculated again and displayed without being aggregated. This can become very slow since time to complete resource calculation is affected by the load on the host.
> 
> By applying this patch we take advantage of the periodic work and fulfill information resource requested in Observer task page from already collected resource consumptions.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/monitoring/resource.py 434666696e600a0e6c19edd986c86575539976f2 
>   src/test/python/apache/thermos/monitoring/test_resource.py d794a998f1d9fc52ba260cd31ac444aee7f8ed28 
> 
> 
> Diff: https://reviews.apache.org/r/60376/diff/1/
> 
> 
> Testing
> -------
> 
> I stress tested this patch on a host that had a slow Observer page. Interestingly, I did not need to do much to make the Observer slow. There are a few points to be made clear first.
> - We at Twitter limit the resources allocated to the Observer using `systemd`. The observer is allowed to use only 20% of a CPU core. The attached screen shots are from such a setup.
> - Having assigned 20% of a cpu core to Observer, starting only 8 `task`s, each with 3 `process`es is enough to make the Observer slow; 11secs to load `task page`.
> 
> 
> File Attachments
> ----------------
> 
> without the patch -- Screen Shot 2017-06-22 at 1.11.12 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/03968028-a2f5-4a99-ba57-b7a41c471436__without_the_patch_--_Screen_Shot_2017-06-22_at_1.11.12_PM.png
> with the patch -- Screen Shot 2017-06-22 at 1.07.41 PM.png
>   https://reviews.apache.org/media/uploaded/files/2017/06/22/5962c018-27d3-4463-a277-f6ad48b7f2d7__with_the_patch_--_Screen_Shot_2017-06-22_at_1.07.41_PM.png
> 
> 
> Thanks,
> 
> Reza Motamedi
> 
>