Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2012/07/25 20:38:33 UTC
[jira] [Created] (MESOS-244) Mesos webui process is not doing FLO_EXEC
Vinod Kone created MESOS-244:
--------------------------------
Summary: Mesos webui process is not doing FLO_EXEC
Key: MESOS-244
URL: https://issues.apache.org/jira/browse/MESOS-244
Project: Mesos
Issue Type: Bug
Reporter: Vinod Kone
This appeared in one of our clusters at Twitter.
It looks like the slave webui process (which is a fork of mesos-slave) is not setting FD_CLOEXEC on its file descriptors, because lsof shows several pipes shared between the slave webui and the executors.
[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root 0r FIFO 0,6 72770782 pipe
mesos-sla 9273 root 1w FIFO 0,6 72770783 pipe
mesos-sla 9273 root 2w FIFO 0,6 72770784 pipe
mesos-sla 9273 root 8r FIFO 0,6 72770790 pipe
mesos-sla 9273 root 9w FIFO 0,6 72770790 pipe
mesos-sla 9273 root 11r FIFO 0,6 72770807 pipe
mesos-sla 9273 root 12w FIFO 0,6 72770807 pipe
mesos-sla 9273 root 13r FIFO 0,6 72770808 pipe
[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice 0r FIFO 0,6 72770782 pipe
python2.6 11298 graphservice 11r FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 12w FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 14w FIFO 0,6 72770808 pipe
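To make the overlap between the two listings explicit, the sketch below groups the FIFO lines by pipe inode and reports the inodes that show up under both PIDs. It is illustrative only: the helper names (pipe_fds_by_inode, shared_pipes) are hypothetical, and the parser assumes the eight-column layout shown above, which other lsof versions may not produce.

```python
from collections import defaultdict

# lsof output copied from the two listings above.
SLAVE_LSOF = """\
mesos-sla 9273 root 0r FIFO 0,6 72770782 pipe
mesos-sla 9273 root 1w FIFO 0,6 72770783 pipe
mesos-sla 9273 root 2w FIFO 0,6 72770784 pipe
mesos-sla 9273 root 8r FIFO 0,6 72770790 pipe
mesos-sla 9273 root 9w FIFO 0,6 72770790 pipe
mesos-sla 9273 root 11r FIFO 0,6 72770807 pipe
mesos-sla 9273 root 12w FIFO 0,6 72770807 pipe
mesos-sla 9273 root 13r FIFO 0,6 72770808 pipe
"""

EXECUTOR_LSOF = """\
python2.6 11298 graphservice 0r FIFO 0,6 72770782 pipe
python2.6 11298 graphservice 11r FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 12w FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 14w FIFO 0,6 72770808 pipe
"""


def pipe_fds_by_inode(lsof_output):
    """Map each pipe inode to the list of descriptors (e.g. '11r') that refer to it."""
    fds = defaultdict(list)
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE NODE NAME
        if len(fields) == 8 and fields[4] == "FIFO":
            fds[fields[6]].append(fields[3])
    return dict(fds)


def shared_pipes(lsof_a, lsof_b):
    """Return {inode: (fds in a, fds in b)} for pipes open in both processes."""
    a_fds = pipe_fds_by_inode(lsof_a)
    b_fds = pipe_fds_by_inode(lsof_b)
    return {
        inode: (a_fds[inode], b_fds[inode])
        for inode in sorted(a_fds.keys() & b_fds.keys())
    }


if __name__ == "__main__":
    for inode, (slave, executor) in shared_pipes(SLAVE_LSOF, EXECUTOR_LSOF).items():
        print(inode, "slave:", slave, "executor:", executor)
```

For the listings above this reports three shared pipes: 72770782 (fd 0 in both processes), 72770807 (fds 11 and 12 in both), and 72770808 (fd 13 in the slave, fd 14 in the executor).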
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MESOS-244) Mesos slave process is not shutting down cleanly
Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone resolved MESOS-244.
------------------------------
Resolution: Fixed
https://reviews.apache.org/r/6146/
https://reviews.apache.org/r/6263/
> Mesos slave process is not shutting down cleanly
> ------------------------------------------------
>
> Key: MESOS-244
> URL: https://issues.apache.org/jira/browse/MESOS-244
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Assignee: Vinod Kone
>
> This appeared in one of our clusters at Twitter.
> Looks like the slave webui process (which is a fork of mesos-slave) is not properly shutting down.
> A couple of things need to happen to fix this:
> 1) Set FD_CLOEXEC on any opened pipes, because we see a bunch of shared file descriptors between the slave webui and the executors.
> 2) Explicitly call executor shutdown, to give it a chance to clean up.
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
> mesos-sla 9273 root 0r FIFO 0,6 72770782 pipe
> mesos-sla 9273 root 1w FIFO 0,6 72770783 pipe
> mesos-sla 9273 root 2w FIFO 0,6 72770784 pipe
> mesos-sla 9273 root 8r FIFO 0,6 72770790 pipe
> mesos-sla 9273 root 9w FIFO 0,6 72770790 pipe
> mesos-sla 9273 root 11r FIFO 0,6 72770807 pipe
> mesos-sla 9273 root 12w FIFO 0,6 72770807 pipe
> mesos-sla 9273 root 13r FIFO 0,6 72770808 pipe
> [wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
> python2.6 11298 graphservice 0r FIFO 0,6 72770782 pipe
> python2.6 11298 graphservice 11r FIFO 0,6 72770807 pipe
> python2.6 11298 graphservice 12w FIFO 0,6 72770807 pipe
> python2.6 11298 graphservice 14w FIFO 0,6 72770808 pipe
>
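For reference, item 1 above comes down to setting the close-on-exec flag with fcntl on each pipe descriptor. The following is a minimal Python sketch of that mechanism, not the change made in the linked reviews (Mesos itself is C++). The helper names are hypothetical, and because Python 3.4+ already creates descriptors as non-inheritable, the sketch first makes the pipe inheritable to mimic the default behaviour of pipe(2) in C.

```python
import fcntl
import os


def set_cloexec(fd):
    """Set FD_CLOEXEC on fd so the kernel closes it when the process calls exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd):
    """Return True if FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def make_inheritable_pipe():
    """Create a pipe whose ends would survive exec(), like a plain pipe(2) call in C."""
    read_fd, write_fd = os.pipe()
    # Assumption: Python 3.4+ sets close-on-exec by default, so clear it here
    # to reproduce the situation described in the report.
    os.set_inheritable(read_fd, True)
    os.set_inheritable(write_fd, True)
    return read_fd, write_fd


if __name__ == "__main__":
    r, w = make_inheritable_pipe()
    print("before:", is_cloexec(r), is_cloexec(w))
    set_cloexec(r)
    set_cloexec(w)
    print("after:", is_cloexec(r), is_cloexec(w))
    os.close(r)
    os.close(w)
```

Note that the flag only takes effect at exec(): a forked child still holds copies of the descriptors until it execs the executor binary.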
[jira] [Assigned] (MESOS-244) Mesos slave process is not shutting down cleanly
Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone reassigned MESOS-244:
--------------------------------
Assignee: Vinod Kone
[jira] [Updated] (MESOS-244) Mesos slave process is not shutting down cleanly
Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone updated MESOS-244:
-----------------------------
Description:
This appeared in one of our clusters at Twitter.
Looks like the slave webui process (which is a fork of mesos-slave) is not properly shutting down.
A couple of things need to happen to fix this:
1) Set FD_CLOEXEC on any opened pipes, because we see a bunch of shared file descriptors between the slave webui and the executors.
2) Explicitly call executor shutdown, to give it a chance to clean up.
[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root 0r FIFO 0,6 72770782 pipe
mesos-sla 9273 root 1w FIFO 0,6 72770783 pipe
mesos-sla 9273 root 2w FIFO 0,6 72770784 pipe
mesos-sla 9273 root 8r FIFO 0,6 72770790 pipe
mesos-sla 9273 root 9w FIFO 0,6 72770790 pipe
mesos-sla 9273 root 11r FIFO 0,6 72770807 pipe
mesos-sla 9273 root 12w FIFO 0,6 72770807 pipe
mesos-sla 9273 root 13r FIFO 0,6 72770808 pipe
[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice 0r FIFO 0,6 72770782 pipe
python2.6 11298 graphservice 11r FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 12w FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 14w FIFO 0,6 72770808 pipe
was:
This appeared in one of our clusters at Twitter.
Looks like the slave webui process (which is a fork of mesos-slave) is not properly doing FD_CLOEXEC, because we see a bunch of shared file descriptors between slave webui and the executors.
[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 9273 | grep FIFO
mesos-sla 9273 root 0r FIFO 0,6 72770782 pipe
mesos-sla 9273 root 1w FIFO 0,6 72770783 pipe
mesos-sla 9273 root 2w FIFO 0,6 72770784 pipe
mesos-sla 9273 root 8r FIFO 0,6 72770790 pipe
mesos-sla 9273 root 9w FIFO 0,6 72770790 pipe
mesos-sla 9273 root 11r FIFO 0,6 72770807 pipe
mesos-sla 9273 root 12w FIFO 0,6 72770807 pipe
mesos-sla 9273 root 13r FIFO 0,6 72770808 pipe
[wickman@atla-aai-11-sr1 ~]$ sudo /usr/sbin/lsof -p 11298 | grep FIFO
python2.6 11298 graphservice 0r FIFO 0,6 72770782 pipe
python2.6 11298 graphservice 11r FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 12w FIFO 0,6 72770807 pipe
python2.6 11298 graphservice 14w FIFO 0,6 72770808 pipe
Summary: Mesos slave process is not shutting down cleanly (was: Mesos webui process is not doing FD_CLOEXEC)
[jira] [Updated] (MESOS-244) Mesos webui process is not doing FD_CLOEXEC
Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MESOS-244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kone updated MESOS-244:
-----------------------------
Summary: Mesos webui process is not doing FD_CLOEXEC (was: Mesos webui process is not doing FLO_EXEC)