Posted to issues@mesos.apache.org by "Ian Babrou (JIRA)" <ji...@apache.org> on 2016/09/23 11:55:20 UTC

[jira] [Issue Comment Deleted] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

     [ https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Babrou updated MESOS-6118:
------------------------------
    Comment: was deleted

(was: I've tried it and it didn't fix the issue for me:

{noformat}
Sep 23 11:52:01 myhost mesos-agent[10627]: F0923 11:52:01.524873 10648 fs.cpp:140] Check failed: !visitedParents.contains(parentId)
Sep 23 11:52:01 myhost mesos-agent[10627]: *** Check failure stack trace: ***
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb5a253d  google::LogMessage::Fail()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb5a41bd  google::LogMessage::SendToLog()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb5a2102  google::LogMessage::Flush()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb5a4ba9  google::LogMessageFatal::~LogMessageFatal()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb07183d  _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb0717a5  _ZNSt17_Function_handlerIFviEZN5mesos8internal2fs14MountInfoTable4readERK6OptionIiEbEUliE_E9_M_invokeERKSt9_Any_datai
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb078c5a  mesos::internal::fs::MountInfoTable::read()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bae2c346  mesos::internal::slave::DockerContainerizerProcess::unmountPersistentVolumes()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bae48157  mesos::internal::slave::DockerContainerizerProcess::___destroy()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb546094  process::ProcessManager::resume()
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3bb5463b7  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3b9c20970  (unknown)
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3b973f0a4  start_thread
Sep 23 11:52:01 myhost mesos-agent[10627]: @     0x7ff3b947487d  (unknown)
{noformat}

/proc/mounts:

{noformat}
rootfs / rootfs rw,size=65513288k,nr_inodes=16378322 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=65513304k,nr_inodes=16378326,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
/dev/md127 /state ext4 rw,relatime,stripe=384,data=ordered 0 0
10.10.14.18:/srv/hosts/myhost /srv nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18 0 0
10.10.14.18:/srv /srv-master nfs4 rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.23.25,local_lock=none,addr=10.10.14.18 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
{noformat}

Build procedure changes:

{noformat}
 build:
        cd $(BUILDDIR)/mesos                                                        && \
+       patch -p1 < $(TOP)/rb51620.patch                                            && \
        autoreconf -f -i -Wall,no-obsolete                                          && \
        ./bootstrap                                                                 && \
        ./configure --enable-optimize --disable-python --prefix=/usr                && \
{noformat})

> Agent would crash with docker container tasks due to host mount table read.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-6118
>                 URL: https://issues.apache.org/jira/browse/MESOS-6118
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 1.0.1
>         Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Jamie Briant
>            Assignee: Kevin Klues
>            Priority: Critical
>              Labels: linux, slave
>             Fix For: 1.1.0, 1.0.2
>
>         Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, cycle6.log, slave-crash.log
>
>
> I have a framework that schedules thousands of short-running tasks (a few seconds to a few minutes each) over a period of several minutes. In 1.0.1, the slave process crashes every few minutes (and systemd restarts it).
> The crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: ***
> Version 1.0.0 works without this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)