Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2015/05/14 03:52:59 UTC

[jira] [Commented] (AURORA-1320) when instance is running in docker container, thermos observer reports 0 resources

    [ https://issues.apache.org/jira/browse/AURORA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543061#comment-14543061 ] 

Bill Farner commented on AURORA-1320:
-------------------------------------

{quote}
Or we can just get rid of the observer?
{quote}

I believe there is consensus \[1\] on this point, with the caveat that the observer would likely move/evolve rather than disappear.  I think the utility of resource reporting by the observer is more contentious than the anatomy of the component, however.  I'm of the opinion that resource reporting should be the responsibility of Mesos, but a fair counterpoint could be made that the user shouldn't care.  IMHO the remaining question is whether this is a valuable feature.

\[1\] https://mail-archives.apache.org/mod_mbox/aurora-dev/201501.mbox/%3CCAFTdr0DZvH21tR%3DNLK0qP-Y9-oL9SyULy6GLah%3DCApuW0SVvnw%40mail.gmail.com%3E

cc [~wickman]

> when instance is running in docker container, thermos observer reports 0 resources
> ----------------------------------------------------------------------------------
>
>                 Key: AURORA-1320
>                 URL: https://issues.apache.org/jira/browse/AURORA-1320
>             Project: Aurora
>          Issue Type: Bug
>          Components: Docker, Thermos
>            Reporter: Jay Buffington
>
> To see the problem, start a job inside a docker container and view the task/instance page.  You'll see cpu/ram/disk all at zero regardless of their actual usage.
> I see errors like this in the thermos observer log:
> {noformat}
>     W0513 18:41:39.415406 3564 process_collector_psutil.py:42] Error during process sampling [pid=112]: process no longer exists (pid=112)
>     W0513 18:41:39.415612 3564 process_collector_psutil.py:76] Error during process sampling: process no longer exists (pid=112)
>     W0513 18:41:39.513972 3564 process_collector_psutil.py:76] Error during process sampling: no process found with pid 122
> {noformat}
> This is likely because the observer is running in a different pid namespace than the process.  One solution would be for the runner to write the pid namespace it is running in to the checkpoint, and then have the observer enter that namespace while sampling.
> Or we can just get rid of the observer?
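
To make the pid-namespace hypothesis in the description above concrete, here is a minimal sketch of how a sampler might detect that a checkpointed pid lives in a different pid namespace than the one it is sampling from. It is not Thermos code: the function names and the `proc_root` parameter are hypothetical, and it assumes a Linux host where `/proc/<pid>/ns/pid` is a symlink whose target identifies the namespace.

```python
import os


def pid_namespace(pid="self", proc_root="/proc"):
    """Return the pid namespace identifier of a process.

    On Linux, /proc/<pid>/ns/pid is a symlink whose target looks like
    'pid:[<inode>]'. Two processes share a pid namespace exactly when
    these targets are equal. `proc_root` is a hypothetical knob that
    exists only to make the sketch testable.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "pid"))


def same_pid_namespace(pid_a, pid_b, proc_root="/proc"):
    """Compare the pid namespaces of two processes.

    Returns True or False when both namespaces can be read, and None
    when either process is missing or unreadable (for example, because
    the pid was recorded inside a container and means nothing here).
    """
    try:
        return (pid_namespace(pid_a, proc_root)
                == pid_namespace(pid_b, proc_root))
    except OSError:
        return None
```

Called as `same_pid_namespace("self", 112)` from the sampling process, a result of `None` or `False` would be consistent with the "process no longer exists" warnings in the log above. Actually sampling inside the other namespace would additionally need something like `setns(2)` and the privileges to use it, which this sketch does not attempt.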


