Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2018/10/29 16:40:00 UTC

[jira] [Created] (MESOS-9361) CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively always fails.

James Peach created MESOS-9361:
----------------------------------

             Summary: CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively always fails.
                 Key: MESOS-9361
                 URL: https://issues.apache.org/jira/browse/MESOS-9361
             Project: Mesos
          Issue Type: Bug
          Components: flaky, test
            Reporter: James Peach


On Fedora 28:

{noformat}
[ RUN      ] CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively
I1029 09:38:31.866564 31397 cgroups.cpp:2838] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce
I1029 09:38:31.867048 31398 cgroups.cpp:1229] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce after 359936ns
I1029 09:38:31.869033 31397 cgroups.cpp:2856] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce
I1029 09:38:31.869357 31403 cgroups.cpp:1258] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce after 261888ns
I1029 09:38:31.884752 31382 cluster.cpp:173] Creating default 'local' authorizer
I1029 09:38:31.892966 31397 master.cpp:413] Master 0b04a175-fe62-41a1-a387-8d679d1d9609 (jpeach.scv.apple.com) started on 17.228.8.72:42153
I1029 09:38:31.892992 31397 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/mFB69h/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/mFB69h/master" --zk_session_timeout="10secs"
I1029 09:38:31.893931 31397 master.cpp:465] Master only allowing authenticated frameworks to register
I1029 09:38:31.893942 31397 master.cpp:471] Master only allowing authenticated agents to register
I1029 09:38:31.893951 31397 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
I1029 09:38:31.893962 31397 credentials.hpp:37] Loading credentials for authentication from '/tmp/mFB69h/credentials'
I1029 09:38:31.894204 31397 master.cpp:521] Using default 'crammd5' authenticator
I1029 09:38:31.894359 31397 authenticator.cpp:520] Initializing server SASL
I1029 09:38:31.898878 31397 auxprop.cpp:73] Initialized in-memory auxiliary property plugin
I1029 09:38:31.898983 31397 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1029 09:38:31.899279 31397 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1029 09:38:31.899395 31397 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1029 09:38:31.899507 31397 master.cpp:602] Authorization enabled
I1029 09:38:31.900339 31406 whitelist_watcher.cpp:77] No whitelist given
I1029 09:38:31.900434 31400 hierarchical.cpp:175] Initialized hierarchical allocator process
I1029 09:38:31.908254 31403 master.cpp:2105] Elected as the leading master!
I1029 09:38:31.908313 31403 master.cpp:1660] Recovering from registrar
I1029 09:38:31.908717 31404 registrar.cpp:339] Recovering registrar
I1029 09:38:31.910310 31400 registrar.cpp:383] Successfully fetched the registry (0B) in 1.547776ms
I1029 09:38:31.910684 31400 registrar.cpp:487] Applied 1 operations in 150793ns; attempting to update the registry
I1029 09:38:31.913811 31400 registrar.cpp:544] Successfully updated the registry in 2.979072ms
I1029 09:38:31.914028 31400 registrar.cpp:416] Successfully recovered registrar
I1029 09:38:31.914872 31398 master.cpp:1774] Recovered 0 agents from the registry (154B); allowing 10mins for agents to reregister
I1029 09:38:31.914912 31406 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
I1029 09:38:31.920753 31382 containerizer.cpp:305] Using isolation { network/cni, filesystem/posix, environment_secret, cgroups/mem }
I1029 09:38:31.926185 31382 linux_launcher.cpp:144] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1029 09:38:31.927129 31382 provisioner.cpp:298] Using default backend 'overlay'
W1029 09:38:31.942937 31382 process.cpp:2829] Attempted to spawn already running process files@17.228.8.72:42153
I1029 09:38:31.943821 31382 cluster.cpp:485] Creating default 'local' authorizer
I1029 09:38:31.946377 31402 slave.cpp:267] Mesos agent started on (1)@17.228.8.72:42153
I1029 09:38:31.946439 31402 slave.cpp:268] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authentication_timeout_max="1mins" --authentication_timeout_min="5secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_destroy_timeout="1mins" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/credential" --default_role="*" --disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/fetch" --fetcher_cache_size="2GB" --fetcher_stall_timeout="1mins" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --gc_non_executor_container_sandboxes="false" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_credentials="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/http_credentials" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="cgroups/mem" 
--launcher="linux" --launcher_dir="/home/jpeach/upstream/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --memory_profiling="false" --network_cni_metrics="true" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --reconfiguration_policy="equal" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x" --xfs_kill_containers="false" --xfs_project_range="[5000-10000]" --zk_session_timeout="10secs"
I1029 09:38:31.946923 31402 credentials.hpp:86] Loading credential for authentication from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/credential'
I1029 09:38:31.947036 31402 slave.cpp:300] Agent using credential for: test-principal
I1029 09:38:31.947049 31402 credentials.hpp:37] Loading credentials for authentication from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/http_credentials'
I1029 09:38:31.947134 31402 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I1029 09:38:31.947352 31402 disk_profile_adaptor.cpp:80] Creating default disk profile adaptor module
I1029 09:38:31.949756 31402 slave.cpp:615] Agent resources: [{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I1029 09:38:31.949865 31402 slave.cpp:623] Agent attributes: [  ]
I1029 09:38:31.949882 31402 slave.cpp:632] Agent hostname: jpeach.scv.apple.com
I1029 09:38:31.950031 31401 task_status_update_manager.cpp:181] Pausing sending task status updates
I1029 09:38:31.951694 31403 state.cpp:66] Recovering state from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta'
I1029 09:38:31.951923 31402 slave.cpp:6915] Finished recovering checkpointed state from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta', beginning agent recovery
I1029 09:38:31.952055 31404 task_status_update_manager.cpp:207] Recovering task status update manager
I1029 09:38:31.952318 31405 containerizer.cpp:727] Recovering Mesos containers
I1029 09:38:31.952589 31401 linux_launcher.cpp:286] Recovering Linux launcher
I1029 09:38:31.953091 31396 containerizer.cpp:1053] Recovering isolators
E1029 09:38:31.954258 31405 slave.cpp:7275] EXIT with status 1: Failed to perform recovery: Collect failed: Collect failed: Failed to list cgroups under '/sys/fs/cgroup/memory': Failed to determine canonical path of '/sys/fs/cgroup/memory/mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270': No such file or directory
If recovery failed due to a change in configuration and you want to
keep the current agent id, you might want to change the
`--reconfiguration_policy` flag to a more permissive value.

To restart this agent with a new agent id instead, do as follows:
rm -f /tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta/slaves/latest
This ensures that the agent does not recover old live executors.

If you use the Docker containerizer and think that the Docker
daemon state is broken, you can try to clear it. But be careful:
these commands will erase all containers and images from this host,
not just those started by Mesos!
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)

Finally, restart the agent.

../../3rdparty/libprocess/include/process/gmock.hpp:247: ERROR: this mock object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never is. Its address is @0x56491dd05de8.
../../src/tests/mock_registrar.cpp:54: ERROR: this mock object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never is. Its address is @0x56491e267c00.
../../src/tests/containerizer/cgroups_isolator_tests.cpp:737: ERROR: this mock object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never is. Its address is @0x7ffca24acd10.
ERROR: 3 leaked mock objects found at program exit.
{noformat}
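
The fatal error in the log above is raised while resolving the canonical path of the test's cgroups root under '/sys/fs/cgroup/memory', where that directory does not exist. A minimal sketch for checking which hierarchies lack the cgroups root directory might look like the following. The helper name `missing_cgroup_roots` and the choice of subsystems to check are assumptions made for illustration; this is not Mesos code and does not claim to identify the root cause.

```python
import os


def missing_cgroup_roots(hierarchy, cgroups_root, subsystems):
    """Return the subsystems for which <hierarchy>/<subsystem>/<cgroups_root>
    is not an existing directory.

    Hypothetical helper for illustration only; not part of Mesos.
    """
    missing = []
    for subsystem in subsystems:
        path = os.path.join(hierarchy, subsystem, cgroups_root)
        if not os.path.isdir(path):
            missing.append(subsystem)
    return missing


if __name__ == "__main__":
    # Values taken from the agent flags in the log above; the subsystem
    # list is an assumption (the memory isolator plus the launcher's freezer).
    print(missing_cgroup_roots(
        "/sys/fs/cgroup",
        "mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270",
        ["memory", "freezer"],
    ))
```

Running this while the test is paused, or immediately after the failure, would show whether the cgroups root was created under the freezer hierarchy but not under the memory hierarchy.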


