Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2012/07/25 20:38:33 UTC

[jira] [Created] (MESOS-244) Mesos webui process is not doing FLO_EXEC

Vinod Kone created MESOS-244:
--------------------------------

             Summary: Mesos webui process is not doing FLO_EXEC
                 Key: MESOS-244
                 URL: https://issues.apache.org/jira/browse/MESOS-244
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone


This appeared in one of our clusters at Twitter.

Looks like the slave webui process (which is a fork of mesos-slave) is not properly doing FD_CLOEXEC, because we see a bunch of shared file descriptors between slave webui and the executors.

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
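
The overlap between the two listings above can be checked mechanically by comparing the NODE column, since two descriptors with the same pipe inode refer to the same pipe. The following is only a rough Python sketch for illustration; the helper names (pipe_inodes, shared_pipes) are made up here, and the parsing assumes the 8-column layout shown in this report (COMMAND PID USER FD TYPE DEVICE NODE NAME).

```python
# lsof lines copied from the report above (slave webui pid 9273, executor pid 11298).
SLAVE_LSOF = """\
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe
"""

EXECUTOR_LSOF = """\
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
"""


def pipe_inodes(lsof_output):
    """Map each pipe inode to the list of fd entries (e.g. '11r') that use it."""
    inodes = {}
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE NODE NAME
        if len(fields) != 8 or fields[4] != "FIFO":
            continue
        inodes.setdefault(fields[6], []).append(fields[3])
    return inodes


def shared_pipes(lsof_a, lsof_b):
    """Return {inode: (fds_in_a, fds_in_b)} for pipes open in both processes."""
    a = pipe_inodes(lsof_a)
    b = pipe_inodes(lsof_b)
    return {inode: (a[inode], b[inode]) for inode in a if inode in b}


shared = shared_pipes(SLAVE_LSOF, EXECUTOR_LSOF)
for inode in sorted(shared):
    slave_fds, executor_fds = shared[inode]
    print(inode, "slave:", slave_fds, "executor:", executor_fds)
```

On the data in this report, pipes 72770782, 72770807 and 72770808 are open in both processes.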
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MESOS-244) Mesos slave process is not shutting down cleanly

Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone resolved MESOS-244.
------------------------------

    Resolution: Fixed

https://reviews.apache.org/r/6146/
https://reviews.apache.org/r/6263/
                
> Mesos slave process is not shutting down cleanly
> ------------------------------------------------
>
>                 Key: MESOS-244
>                 URL: https://issues.apache.org/jira/browse/MESOS-244
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Vinod Kone
>
> This appeared in one of our clusters at Twitter.
> Looks like the slave webui process (which is a fork of mesos-slave) is not properly shutting down. 
> Couple of things that need to happen to fix this
> 1) Set FD_CLOEXEC on any opened pipes, because we see a bunch of shared file descriptors between slave webui and the executors.
> 2) Explicitly call executor shutdown, to give it a chance to clean up
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
> mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
> mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
> mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
> mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
> python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
> python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
>  


        

[jira] [Assigned] (MESOS-244) Mesos slave process is not shutting down cleanly

Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone reassigned MESOS-244:
--------------------------------

    Assignee: Vinod Kone
    


        

[jira] [Updated] (MESOS-244) Mesos slave process is not shutting down cleanly

Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-244:
-----------------------------

    Description: 
This appeared in one of our clusters at Twitter.

It looks like the slave webui process (which is a fork of mesos-slave) is not shutting down properly.
A couple of things need to happen to fix this:

1) Set FD_CLOEXEC on any opened pipes, because we see a number of file descriptors shared between the slave webui and the executors.
2) Explicitly call executor shutdown, to give the executor a chance to clean up.
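
For reference, item 1 amounts to setting the close-on-exec flag on each pipe descriptor with fcntl(F_SETFD). The following is only a minimal Python sketch of that technique, not the actual Mesos change (which is in the reviews linked from this issue); the helper names set_cloexec, is_cloexec and make_cloexec_pipe are hypothetical.

```python
import fcntl
import os


def set_cloexec(fd):
    """Set FD_CLOEXEC on fd so the kernel closes it when the process calls exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Return True if FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def make_cloexec_pipe():
    """Create a pipe whose two ends are not inherited across exec()."""
    read_fd, write_fd = os.pipe()
    # Python 3.4+ already creates non-inheritable descriptors; setting the
    # flag explicitly shows the step that C/C++ code would have to perform.
    set_cloexec(read_fd)
    set_cloexec(write_fd)
    return read_fd, write_fd


if __name__ == "__main__":
    r, w = make_cloexec_pipe()
    print("read end close-on-exec:", is_cloexec(r))
    print("write end close-on-exec:", is_cloexec(w))
    os.close(r)
    os.close(w)
```

Note that setting the flag in a separate fcntl() call leaves a short window in which another thread could fork and exec with the descriptor still inheritable. On Linux, pipe2() with O_CLOEXEC sets the flag atomically at creation time.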


[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
 

  was:
This appeared in one of our clusters at Twitter.

Looks like the slave webui process (which is a fork of mesos-slave) is not properly doing FD_CLOEXEC, because we see a bunch of shared file descriptors between slave webui and the executors.

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe

[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
 

        Summary: Mesos slave process is not shutting down cleanly  (was: Mesos webui process is not doing FD_CLOEXEC)
    
> Mesos slave process is not shutting down cleanly
> ------------------------------------------------
>
>                 Key: MESOS-244
>                 URL: https://issues.apache.org/jira/browse/MESOS-244
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>
> This appeared in one of our clusters at Twitter.
> Looks like the slave webui process (which is a fork of mesos-slave) is not properly shutting down. 
> Couple of things that need to happen to fix this
> 1) Set FD_CLOEXEC on any opened pipes, because we see a bunch of shared file descriptors between slave webui and the executors.
> 2) Explicitly call executor shutdown, to give it a chance to clean up
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
> mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
> mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
> mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
> mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
> python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
> python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
>  


        

[jira] [Updated] (MESOS-244) Mesos webui process is not doing FD_CLOEXEC

Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-244:
-----------------------------

    Summary: Mesos webui process is not doing FD_CLOEXEC  (was: Mesos webui process is not doing FLO_EXEC)
    
> Mesos webui process is not doing FD_CLOEXEC
> -------------------------------------------
>
>                 Key: MESOS-244
>                 URL: https://issues.apache.org/jira/browse/MESOS-244
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>
> This appeared in one of our clusters at Twitter.
> Looks like the slave webui process (which is a fork of mesos-slave) is not properly doing FD_CLOEXEC, because we see a bunch of shared file descriptors between slave webui and the executors.
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
> mesos-sla 9273 root    0r  FIFO      0,6          72770782 pipe
> mesos-sla 9273 root    1w  FIFO      0,6          72770783 pipe
> mesos-sla 9273 root    2w  FIFO      0,6          72770784 pipe
> mesos-sla 9273 root    8r  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root    9w  FIFO      0,6          72770790 pipe
> mesos-sla 9273 root   11r  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   12w  FIFO      0,6          72770807 pipe
> mesos-sla 9273 root   13r  FIFO      0,6          72770808 pipe
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
> python2.6 11298 graphservice    0r  FIFO      0,6           72770782 pipe
> python2.6 11298 graphservice   11r  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   12w  FIFO      0,6           72770807 pipe
> python2.6 11298 graphservice   14w  FIFO      0,6           72770808 pipe
>  
