Posted to issues@mesos.apache.org by "Matthias Veit (JIRA)" <ji...@apache.org> on 2015/11/02 10:42:27 UTC

[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

    [ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984970#comment-14984970 ] 

Matthias Veit commented on MESOS-3793:
--------------------------------------


Starting mesos local with --launcher=posix has no effect.
With the environment variable set (export MESOS_LAUNCHER=posix), I can start mesos local.
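
A minimal sketch of that workaround as a shell function, assuming a POSIX shell inside the container. Only the MESOS_LAUNCHER=posix variable and the mesos local command come from this report; the wrapper name run_mesos_local is hypothetical and used purely for illustration.

```shell
# Workaround from this comment: select the POSIX launcher through the
# environment, because passing --launcher=posix had no effect here.
# run_mesos_local is a hypothetical wrapper name, not part of Mesos.
run_mesos_local() {
    MESOS_LAUNCHER=posix
    export MESOS_LAUNCHER
    # "$@" forwards any extra arguments the caller supplies.
    mesos local "$@"
}
```

Calling run_mesos_local with no arguments is then equivalent to exporting the variable and running mesos local by hand.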

Mounting /sys/fs/cgroup and starting mesos local fails with this error:

{noformat}
➔ docker run -v /sys/fs/cgroup:/sys/fs/cgroup:rw -it marathon-buildbase:test sh
# mesos local
I1102 09:35:15.839287     5 leveldb.cpp:176] Opened db in 4.975612ms
I1102 09:35:15.840312     5 leveldb.cpp:183] Compacted db in 981189ns
I1102 09:35:15.840348     5 leveldb.cpp:198] Created db iterator in 9033ns
I1102 09:35:15.840353     5 leveldb.cpp:204] Seeked to beginning of db in 1414ns
I1102 09:35:15.840358     5 leveldb.cpp:273] Iterated through 0 keys in the db in 1025ns
I1102 09:35:15.840389     5 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I1102 09:35:15.840790     9 recover.cpp:449] Starting replica recovery
I1102 09:35:15.840991    10 recover.cpp:475] Replica is in EMPTY status
I1102 09:35:15.841492     9 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
I1102 09:35:15.841908     6 recover.cpp:195] Received a recover response from a replica in EMPTY status
I1102 09:35:15.842003     6 recover.cpp:566] Updating replica status to STARTING
I1102 09:35:15.843122     7 master.cpp:376] Master af8c1547-e308-4348-99d4-93879f06d853 (833b280a4c4a) started on 172.17.0.7:5050
I1102 09:35:15.843327     7 master.cpp:378] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/JU6SZj" --zk_session_timeout="10secs"
I1102 09:35:15.843575     7 master.cpp:425] Master allowing unauthenticated frameworks to register
I1102 09:35:15.843822     7 master.cpp:430] Master allowing unauthenticated slaves to register
I1102 09:35:15.843950     7 master.cpp:467] Using default 'crammd5' authenticator
W1102 09:35:15.844105     7 authenticator.cpp:505] No credentials provided, authentication requests will be refused
I1102 09:35:15.844224     7 authenticator.cpp:512] Initializing server SASL
I1102 09:35:15.843875     5 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
I1102 09:35:15.843231    11 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.186846ms
I1102 09:35:15.844820    11 replica.cpp:323] Persisted replica status to STARTING
I1102 09:35:15.845212    11 recover.cpp:475] Replica is in STARTING status
I1102 09:35:15.845577    11 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
I1102 09:35:15.845881    11 recover.cpp:195] Received a recover response from a replica in STARTING status
I1102 09:35:15.846217    11 recover.cpp:566] Updating replica status to VOTING
I1102 09:35:15.846650    11 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 265224ns
I1102 09:35:15.846683    11 replica.cpp:323] Persisted replica status to VOTING
I1102 09:35:15.846721    11 recover.cpp:580] Successfully joined the Paxos group
I1102 09:35:15.846835    11 recover.cpp:464] Recover process terminated
I1102 09:35:15.849839     7 master.cpp:1603] The newly elected leader is master@172.17.0.7:5050 with id af8c1547-e308-4348-99d4-93879f06d853
I1102 09:35:15.853528     7 master.cpp:1616] Elected as the leading master!
I1102 09:35:15.853793     7 master.cpp:1376] Recovering from registrar
I1102 09:35:15.854033    13 registrar.cpp:309] Recovering registrar
I1102 09:35:15.854266     9 log.cpp:661] Attempting to start the writer
I1102 09:35:15.854802     9 replica.cpp:477] Replica received implicit promise request with proposal 1
I1102 09:35:15.853359     5 linux_launcher.cpp:103] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1102 09:35:15.856086     9 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.148617ms
I1102 09:35:15.856168     9 replica.cpp:345] Persisted promised to 1
I1102 09:35:15.857818     6 coordinator.cpp:231] Coordinator attemping to fill missing position
I1102 09:35:15.858723    13 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2
I1102 09:35:15.859380    13 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 599989ns
I1102 09:35:15.859414    13 replica.cpp:679] Persisted action at 0
I1102 09:35:15.859788     9 replica.cpp:511] Replica received write request for position 0
I1102 09:35:15.859863     9 leveldb.cpp:438] Reading position from leveldb took 16229ns
I1102 09:35:15.860203     9 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 317011ns
I1102 09:35:15.860257     9 replica.cpp:679] Persisted action at 0
I1102 09:35:15.860366     9 replica.cpp:658] Replica received learned notice for position 0
I1102 09:35:15.861297     9 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 789105ns
I1102 09:35:15.861330     9 replica.cpp:679] Persisted action at 0
I1102 09:35:15.861371     9 replica.cpp:664] Replica learned NOP action at position 0
I1102 09:35:15.861457     9 log.cpp:677] Writer started with ending position 0
I1102 09:35:15.861711     9 leveldb.cpp:438] Reading position from leveldb took 7791ns
I1102 09:35:15.862535     9 registrar.cpp:342] Successfully fetched the registry (0B) in 8.40192ms
I1102 09:35:15.862589     9 registrar.cpp:441] Applied 1 operations in 4352ns; attempting to update the 'registry'
I1102 09:35:15.862763     9 log.cpp:685] Attempting to append 165 bytes to the log
I1102 09:35:15.862846     9 coordinator.cpp:341] Coordinator attempting to write APPEND action at position 1
I1102 09:35:15.863004     9 replica.cpp:511] Replica received write request for position 1
I1102 09:35:15.863351     9 leveldb.cpp:343] Persisting action (184 bytes) to leveldb took 282975ns
I1102 09:35:15.863426     9 replica.cpp:679] Persisted action at 1
I1102 09:35:15.863567    10 replica.cpp:658] Replica received learned notice for position 1
I1102 09:35:15.863859    10 leveldb.cpp:343] Persisting action (186 bytes) to leveldb took 267957ns
I1102 09:35:15.863886    10 replica.cpp:679] Persisted action at 1
I1102 09:35:15.863898    10 replica.cpp:664] Replica learned APPEND action at position 1
I1102 09:35:15.864140     9 registrar.cpp:486] Successfully updated the 'registry' in 1.516032ms
I1102 09:35:15.864183    10 log.cpp:704] Attempting to truncate the log to 1
I1102 09:35:15.864302    10 coordinator.cpp:341] Coordinator attempting to write TRUNCATE action at position 2
I1102 09:35:15.864197     9 registrar.cpp:372] Successfully recovered registrar
I1102 09:35:15.864425     9 replica.cpp:511] Replica received write request for position 2
I1102 09:35:15.864423    10 master.cpp:1413] Recovered 0 slaves from the Registry (127B) ; allowing 10mins for slaves to re-register
I1102 09:35:15.866138     9 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 1.676671ms
I1102 09:35:15.866181     9 replica.cpp:679] Persisted action at 2
I1102 09:35:15.866294     9 replica.cpp:658] Replica received learned notice for position 2
I1102 09:35:15.866595     5 systemd.cpp:128] systemd version `215` detected
W1102 09:35:15.866622     5 systemd.cpp:136] Required functionality `Delegate` was introduced in Version `218`. Your system may not function properly; however since some distributions have patched systemd packages, your system may still be functional. This is why we keep running. See MESOS-3352 for more information
I1102 09:35:15.866664     9 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 294030ns
I1102 09:35:15.866722     9 leveldb.cpp:401] Deleting ~1 keys from leveldb took 13730ns
I1102 09:35:15.866750     9 replica.cpp:679] Persisted action at 2
I1102 09:35:15.866780     9 replica.cpp:664] Replica learned TRUNCATE action at position 2
Failed to create a containerizer: Could not create MesosContainerizer: Failed to create launcher: Failed to initialize systemd: Failed to locate systemd runtime directory: /run/systemd/system
{noformat}
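
The final error names /run/systemd/system as the directory that could not be located. The sketch below only mirrors that one check so the condition can be tested before starting mesos local; it does not claim to reproduce Mesos's own detection logic. The helper names has_systemd_runtime and launcher_hint and the printed messages are hypothetical.

```shell
# Path copied from the error message above.
SYSTEMD_RUNTIME_DIR=/run/systemd/system

# Hypothetical helper: succeeds if the given directory exists
# (defaults to the path from the error message).
has_systemd_runtime() {
    [ -d "${1:-$SYSTEMD_RUNTIME_DIR}" ]
}

# Hypothetical helper: prints which situation applies.
# The message text is illustrative only.
launcher_hint() {
    if has_systemd_runtime "$@"; then
        echo "systemd runtime directory present"
    else
        echo "systemd runtime directory missing; consider MESOS_LAUNCHER=posix"
    fi
}
```

Running launcher_hint inside the container reports whether the directory is there; when it is missing, MESOS_LAUNCHER=posix is the setting that allowed mesos local to start in this report.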

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---------------------------------------------------------------
>
>                 Key: MESOS-3793
>                 URL: https://issues.apache.org/jira/browse/MESOS-3793
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.25.0
>         Environment: Debian GNU/Linux 8 docker machine
>            Reporter: Matthias Veit
>            Assignee: Jojy Varghese
>              Labels: mesosphere
>
> We updated the mesos version to 0.25.0 in our Marathon docker image, which runs our integration tests.
> We use mesos local for those tests. This fails with this message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 1287ns
> I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the db in 1111ns
> I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
> I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from a replica in EMPTY status
> I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to STARTING
> I1022 18:42:26.855655   138 master.cpp:376] Master a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 172.17.0.14:5050
> I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated frameworks to register
> I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated slaves to register
> I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' authenticator
> W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, authentication requests will be refused
> I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.888185ms
> I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to STARTING
> I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
> I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from a replica in STARTING status
> I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 371121ns
> I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to VOTING
> I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos group
> I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading master!
> I1022 18:42:26.862511   137 master.cpp:1376] Recovering from registrar
> I1022 18:42:26.862560   137 registrar.cpp:309] Recovering registrar
> Failed to create a containerizer: Could not create MesosContainerizer: Failed to create launcher: Failed to create Linux launcher: Failed to mount cgroups hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already attached to another hierarchy
> {noformat}
> The setup worked with mesos 0.24.0.
> The Dockerfile is here: https://github.com/mesosphere/marathon/blob/mv/mesos_0.25/Dockerfile
> {noformat}
> root@a06e4b4eb776:/marathon# ls /sys/fs/cgroup/
> root@a06e4b4eb776:/marathon# 
> {noformat}
> {noformat}
> root@a06e4b4eb776:/marathon# cat /proc/mounts 
> none / aufs rw,relatime,si=6e7ac87f36042e03,dio,dirperm1 0 0
> proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
> tmpfs /dev tmpfs rw,nosuid,mode=755 0 0
> devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
> shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k 0 0
> mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
> sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0
> /dev/sda1 /etc/resolv.conf ext4 rw,relatime,data=ordered 0 0
> /dev/sda1 /etc/hostname ext4 rw,relatime,data=ordered 0 0
> /dev/sda1 /etc/hosts ext4 rw,relatime,data=ordered 0 0
> devpts /dev/console devpts rw,relatime,mode=600,ptmxmode=000 0 0
> proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
> proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
> proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
> proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
> proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
> tmpfs /proc/kcore tmpfs rw,nosuid,mode=755 0 0
> tmpfs /proc/timer_stats tmpfs rw,nosuid,mode=755 0 0
> {noformat}
> [~bernd-mesos] Can you please assign to the correct person?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)