Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2017/05/17 00:32:04 UTC

[jira] [Commented] (MESOS-7517) HealthCheckTest.ConsecutiveFailures is flaky

    [ https://issues.apache.org/jira/browse/MESOS-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013304#comment-16013304 ] 

Neil Conway commented on MESOS-7517:
------------------------------------

cc [~bmahler]

> HealthCheckTest.ConsecutiveFailures is flaky
> --------------------------------------------
>
>                 Key: MESOS-7517
>                 URL: https://issues.apache.org/jira/browse/MESOS-7517
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>              Labels: mesosphere
>
> {noformat}
> [ RUN      ] HealthCheckTest.ConsecutiveFailures
> I0516 17:12:44.380421 28941 cluster.cpp:162] Creating default 'local' authorizer
> I0516 17:12:44.389566 28996 master.cpp:436] Master 2b745611-28cc-491b-80ea-2b6e94a9cab8 (core-dev) started on 10.0.49.2:37598
> I0516 17:12:44.389619 28996 master.cpp:438] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/kYELQI/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/kYELQI/master" --zk_session_timeout="10secs"
> I0516 17:12:44.389943 28996 master.cpp:488] Master only allowing authenticated frameworks to register
> I0516 17:12:44.389971 28996 master.cpp:502] Master only allowing authenticated agents to register
> I0516 17:12:44.389988 28996 master.cpp:515] Master only allowing authenticated HTTP frameworks to register
> I0516 17:12:44.390012 28996 credentials.hpp:37] Loading credentials for authentication from '/tmp/kYELQI/credentials'
> I0516 17:12:44.390353 28996 master.cpp:560] Using default 'crammd5' authenticator
> I0516 17:12:44.390504 28996 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> I0516 17:12:44.390661 28996 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> I0516 17:12:44.390993 28996 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
> I0516 17:12:44.391158 28996 master.cpp:640] Authorization enabled
> I0516 17:12:44.393784 28958 master.cpp:2161] Elected as the leading master!
> I0516 17:12:44.393831 28958 master.cpp:1700] Recovering from registrar
> I0516 17:12:44.394521 28969 registrar.cpp:389] Successfully fetched the registry (0B) in 536064ns
> I0516 17:12:44.394621 28969 registrar.cpp:493] Applied 1 operations in 16653ns; attempting to update the registry
> I0516 17:12:44.395346 28969 registrar.cpp:550] Successfully updated the registry in 664832ns
> I0516 17:12:44.395448 28969 registrar.cpp:422] Successfully recovered registrar
> I0516 17:12:44.395992 28958 master.cpp:1799] Recovered 0 agents from the registry (119B); allowing 10mins for agents to re-register
> I0516 17:12:44.404881 28941 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
> W0516 17:12:44.405333 28941 backend.cpp:76] Failed to create 'overlay' backend: OverlayBackend requires root privileges
> W0516 17:12:44.405426 28941 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
> I0516 17:12:44.405462 28941 provisioner.cpp:249] Using default backend 'copy'
> I0516 17:12:44.406657 28941 cluster.cpp:448] Creating default 'local' authorizer
> I0516 17:12:44.407929 28989 slave.cpp:225] Mesos agent started on (203)@10.0.49.2:37598
> I0516 17:12:44.407973 28989 slave.cpp:226] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_credentials="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/http_credentials" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/home/nrc/build-mesos-default-opts/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/HealthCheckTest_ConsecutiveFailures_WXsqod"
> I0516 17:12:44.408372 28989 credentials.hpp:86] Loading credential for authentication from '/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential'
> I0516 17:12:44.408543 28989 slave.cpp:258] Agent using credential for: test-principal
> I0516 17:12:44.408593 28989 credentials.hpp:37] Loading credentials for authentication from '/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/http_credentials'
> I0516 17:12:44.408852 28989 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
> I0516 17:12:44.409008 28989 http.cpp:975] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
> I0516 17:12:44.414839 28989 slave.cpp:529] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0516 17:12:44.414953 28989 slave.cpp:537] Agent attributes: [  ]
> I0516 17:12:44.414980 28989 slave.cpp:542] Agent hostname: core-dev
> I0516 17:12:44.415108 28961 status_update_manager.cpp:177] Pausing sending status updates
> I0516 17:12:44.416466 28961 state.cpp:62] Recovering state from '/tmp/HealthCheckTest_ConsecutiveFailures_WXsqod/meta'
> I0516 17:12:44.416718 28958 status_update_manager.cpp:203] Recovering status update manager
> I0516 17:12:44.417064 28960 containerizer.cpp:608] Recovering containerizer
> I0516 17:12:44.419234 28976 provisioner.cpp:410] Provisioner recovery complete
> I0516 17:12:44.419749 28986 slave.cpp:5974] Finished recovery
> I0516 17:12:44.420372 28998 status_update_manager.cpp:177] Pausing sending status updates
> I0516 17:12:44.420370 28986 slave.cpp:922] New master detected at master@10.0.49.2:37598
> I0516 17:12:44.420516 28986 slave.cpp:957] Detecting new master
> I0516 17:12:44.424572 28941 sched.cpp:232] Version: 1.4.0
> I0516 17:12:44.425042 28995 sched.cpp:336] New master detected at master@10.0.49.2:37598
> I0516 17:12:44.425138 28995 sched.cpp:407] Authenticating with master master@10.0.49.2:37598
> I0516 17:12:44.425168 28995 sched.cpp:414] Using default CRAM-MD5 authenticatee
> I0516 17:12:44.425364 28958 authenticatee.cpp:121] Creating new client SASL connection
> I0516 17:12:44.429754 28999 slave.cpp:984] Authenticating with master master@10.0.49.2:37598
> I0516 17:12:44.429811 28999 slave.cpp:995] Using default CRAM-MD5 authenticatee
> I0516 17:12:44.429942 28955 authenticatee.cpp:121] Creating new client SASL connection
> I0516 17:12:44.437100 28984 master.cpp:7475] Authenticating slave(203)@10.0.49.2:37598
> I0516 17:12:44.437371 28965 authenticator.cpp:98] Creating new server SASL connection
> W0516 17:12:49.426436 28956 sched.cpp:537] Authentication timed out
> W0516 17:12:49.430752 28985 slave.cpp:1098] Authentication timed out
> W0516 17:12:49.431509 28973 slave.cpp:1043] Failed to authenticate with master master@10.0.49.2:37598: Authentication discarded
> W0516 17:12:49.437960 29000 master.cpp:7522] Authentication timed out
> I0516 17:12:49.442778 28996 master.cpp:7475] Authenticating scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:49.443080 28995 authenticator.cpp:98] Creating new server SASL connection
> I0516 17:12:49.443548 28966 sched.cpp:477] Failed to authenticate with master master@10.0.49.2:37598: Authentication discarded
> W0516 17:12:49.449880 28964 master.cpp:7502] Failed to authenticate scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598: Failed to communicate with authenticatee
> I0516 17:12:49.888478 29000 slave.cpp:984] Authenticating with master master@10.0.49.2:37598
> I0516 17:12:49.888593 29000 slave.cpp:995] Using default CRAM-MD5 authenticatee
> I0516 17:12:49.888759 28995 authenticatee.cpp:121] Creating new client SASL connection
> I0516 17:12:49.896517 28995 master.cpp:7461] Queuing up authentication request from slave(203)@10.0.49.2:37598 because authentication is still in progress
> I0516 17:12:51.343961 28977 sched.cpp:407] Authenticating with master master@10.0.49.2:37598
> I0516 17:12:51.344002 28977 sched.cpp:414] Using default CRAM-MD5 authenticatee
> I0516 17:12:51.344451 29000 authenticatee.cpp:121] Creating new client SASL connection
> I0516 17:12:51.373108 29001 master.cpp:7475] Authenticating scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:51.373463 28975 authenticator.cpp:98] Creating new server SASL connection
> I0516 17:12:51.415412 28957 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
> I0516 17:12:51.415469 28957 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0516 17:12:51.415738 28978 authenticator.cpp:204] Received SASL authentication start
> I0516 17:12:51.415832 28978 authenticator.cpp:326] Authentication requires more steps
> I0516 17:12:51.415956 28969 authenticatee.cpp:259] Received SASL authentication step
> I0516 17:12:51.416134 28996 authenticator.cpp:232] Received SASL authentication step
> I0516 17:12:51.416249 28996 authenticator.cpp:318] Authentication success
> I0516 17:12:51.416415 28970 master.cpp:7505] Successfully authenticated principal 'test-principal' at scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:51.416525 28964 authenticatee.cpp:299] Authentication success
> I0516 17:12:51.416913 28980 sched.cpp:513] Successfully authenticated with master master@10.0.49.2:37598
> I0516 17:12:51.417172 28987 master.cpp:2813] Received SUBSCRIBE call for framework 'default' at scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:51.417279 28987 master.cpp:2197] Authorizing framework principal 'test-principal' to receive offers for roles '{ * }'
> I0516 17:12:51.417778 29001 master.cpp:2890] Subscribing framework default with checkpointing disabled and capabilities [  ]
> I0516 17:12:51.418303 29002 sched.cpp:759] Framework registered with 2b745611-28cc-491b-80ea-2b6e94a9cab8-0000
> I0516 17:12:51.418393 28958 hierarchical.cpp:273] Added framework 2b745611-28cc-491b-80ea-2b6e94a9cab8-0000
> W0516 17:12:54.888931 28985 slave.cpp:1098] Authentication timed out
> W0516 17:12:54.889354 28985 slave.cpp:1043] Failed to authenticate with master master@10.0.49.2:37598: Authentication discarded
> I0516 17:12:55.118023 28973 slave.cpp:984] Authenticating with master master@10.0.49.2:37598
> I0516 17:12:55.118098 28973 slave.cpp:995] Using default CRAM-MD5 authenticatee
> I0516 17:12:55.118614 28967 authenticatee.cpp:121] Creating new client SASL connection
> ../../mesos/src/tests/health_check_tests.cpp:957: Failure
> Failed to wait 15secs for offers
> *** Aborted at 1494979979 (unix time) try "date -d @1494979979" if you are using GNU date ***
> PC: @          0x2011328 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 28941 (TID 0x7f3981a4a8c0) from PID 0; stack trace: ***
>     @     0x7f3978acc370 (unknown)
> W0516 17:12:59.454641 28978 master.cpp:7502] Failed to authenticate slave(203)@10.0.49.2:37598: Failed to communicate with authenticatee
> I0516 17:12:59.454766 28978 master.cpp:7475] Authenticating slave(203)@10.0.49.2:37598
> W0516 17:12:59.455497 28958 master.cpp:7502] Failed to authenticate slave(203)@10.0.49.2:37598: Failed to communicate with authenticatee
>     @          0x2011328 testing::UnitTest::AddTestPartResult()
>     @          0x2004467 testing::internal::AssertHelper::operator=()
>     @          0x11ca5d0 mesos::internal::tests::HealthCheckTest_ConsecutiveFailures_Test::TestBody()
>     @          0x2030820 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x202ae80 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x200b04d testing::Test::Run()
>     @          0x200b866 testing::TestInfo::Run()
>     @          0x200beac testing::TestCase::Run()
>     @          0x2012800 testing::internal::UnitTestImpl::RunAllTests()
>     @          0x2031445 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x202b9fe testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x2011546 testing::UnitTest::Run()
>     @          0x138ca1b RUN_ALL_TESTS()
>     @          0x138c4ec main
>     @     0x7f39778dab35 __libc_start_main
>     @           0xb0a049 (unknown)
> zsh: segmentation fault (core dumped)  ./src/mesos-tests --gtest_filter="HealthCheckTest.ConsecutiveFailures"
> {noformat}


