You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2017/06/27 17:55:00 UTC

[jira] [Created] (MESOS-7726) MasterTest.IgnoreOldAgentReregistration test is flaky

Vinod Kone created MESOS-7726:
---------------------------------

             Summary: MasterTest.IgnoreOldAgentReregistration test is flaky
                 Key: MESOS-7726
                 URL: https://issues.apache.org/jira/browse/MESOS-7726
             Project: Mesos
          Issue Type: Bug
            Reporter: Vinod Kone
            Assignee: Neil Conway


Observed this on ASF CI.

{code}
[ RUN      ] MasterTest.IgnoreOldAgentReregistration
I0627 05:23:06.031154  4917 cluster.cpp:162] Creating default 'local' authorizer
I0627 05:23:06.033433  4945 master.cpp:438] Master a8778782-0da1-49a5-9cb8-9f6d11701733 (c43debbe7e32) started on 172.17.0.4:41747
I0627 05:23:06.033457  4945 master.cpp:440] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/2BARnF/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" --work_dir="/tmp/2BARnF/master" --zk_session_timeout="10secs"
I0627 05:23:06.033771  4945 master.cpp:490] Master only allowing authenticated frameworks to register
I0627 05:23:06.033787  4945 master.cpp:504] Master only allowing authenticated agents to register
I0627 05:23:06.033798  4945 master.cpp:517] Master only allowing authenticated HTTP frameworks to register
I0627 05:23:06.033812  4945 credentials.hpp:37] Loading credentials for authentication from '/tmp/2BARnF/credentials'
I0627 05:23:06.034080  4945 master.cpp:562] Using default 'crammd5' authenticator
I0627 05:23:06.034221  4945 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0627 05:23:06.034409  4945 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0627 05:23:06.034569  4945 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0627 05:23:06.034688  4945 master.cpp:642] Authorization enabled
I0627 05:23:06.034862  4938 whitelist_watcher.cpp:77] No whitelist given
I0627 05:23:06.034868  4950 hierarchical.cpp:169] Initialized hierarchical allocator process
I0627 05:23:06.037211  4957 master.cpp:2161] Elected as the leading master!
I0627 05:23:06.037236  4957 master.cpp:1700] Recovering from registrar
I0627 05:23:06.037333  4938 registrar.cpp:345] Recovering registrar
I0627 05:23:06.038146  4938 registrar.cpp:389] Successfully fetched the registry (0B) in 768256ns
I0627 05:23:06.038290  4938 registrar.cpp:493] Applied 1 operations in 30798ns; attempting to update the registry
I0627 05:23:06.038861  4938 registrar.cpp:550] Successfully updated the registry in 510976ns
I0627 05:23:06.038960  4938 registrar.cpp:422] Successfully recovered registrar
I0627 05:23:06.039364  4941 hierarchical.cpp:207] Skipping recovery of hierarchical allocator: nothing to recover
I0627 05:23:06.039594  4958 master.cpp:1799] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I0627 05:23:06.043999  4917 containerizer.cpp:230] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
W0627 05:23:06.044456  4917 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
W0627 05:23:06.044548  4917 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
I0627 05:23:06.044580  4917 provisioner.cpp:255] Using default backend 'copy'
I0627 05:23:06.046222  4917 cluster.cpp:448] Creating default 'local' authorizer
I0627 05:23:06.047572  4950 slave.cpp:249] Mesos agent started on (269)@172.17.0.4:41747
I0627 05:23:06.047591  4950 slave.cpp:250] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_credentials="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/http_credentials" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/mesos-1.4.0/_build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c"
I0627 05:23:06.047937  4950 credentials.hpp:86] Loading credential for authentication from '/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/credential'
I0627 05:23:06.048065  4950 slave.cpp:282] Agent using credential for: test-principal
I0627 05:23:06.048085  4950 credentials.hpp:37] Loading credentials for authentication from '/tmp/MasterTest_IgnoreOldAgentReregistration_Bgz7OK/http_credentials'
I0627 05:23:06.048274  4950 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I0627 05:23:06.048396  4950 http.cpp:974] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
I0627 05:23:06.049597  4950 slave.cpp:553] Agent resources: [{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0627 05:23:06.049815  4950 slave.cpp:561] Agent attributes: [  ]
I0627 05:23:06.049830  4950 slave.cpp:566] Agent hostname: c43debbe7e32
I0627 05:23:06.049934  4941 status_update_manager.cpp:177] Pausing sending status updates
I0627 05:23:06.051394  4957 state.cpp:64] Recovering state from '/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta'
I0627 05:23:06.051775  4954 status_update_manager.cpp:203] Recovering status update manager
I0627 05:23:06.052050  4959 containerizer.cpp:582] Recovering containerizer
I0627 05:23:06.053479  4946 provisioner.cpp:416] Provisioner recovery complete
I0627 05:23:06.053802  4937 slave.cpp:6168] Finished recovery
I0627 05:23:06.054203  4937 slave.cpp:6350] Querying resource estimator for oversubscribable resources
I0627 05:23:06.054503  4954 slave.cpp:946] New master detected at master@172.17.0.4:41747
I0627 05:23:06.054517  4950 status_update_manager.cpp:177] Pausing sending status updates
I0627 05:23:06.054641  4954 slave.cpp:981] Detecting new master
I0627 05:23:06.054774  4954 slave.cpp:6364] Received oversubscribable resources {} from the resource estimator
I0627 05:23:06.065029  4943 slave.cpp:1008] Authenticating with master master@172.17.0.4:41747
I0627 05:23:06.065158  4943 slave.cpp:1019] Using default CRAM-MD5 authenticatee
I0627 05:23:06.065412  4945 authenticatee.cpp:121] Creating new client SASL connection
I0627 05:23:06.065798  4938 master.cpp:7640] Authenticating slave(269)@172.17.0.4:41747
I0627 05:23:06.065937  4954 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(577)@172.17.0.4:41747
I0627 05:23:06.066174  4937 authenticator.cpp:98] Creating new server SASL connection
I0627 05:23:06.066419  4958 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0627 05:23:06.066448  4958 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0627 05:23:06.066550  4958 authenticator.cpp:204] Received SASL authentication start
I0627 05:23:06.066612  4958 authenticator.cpp:326] Authentication requires more steps
I0627 05:23:06.066845  4952 authenticatee.cpp:259] Received SASL authentication step
I0627 05:23:06.067041  4936 authenticator.cpp:232] Received SASL authentication step
I0627 05:23:06.067081  4936 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0627 05:23:06.067096  4936 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0627 05:23:06.067129  4936 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0627 05:23:06.067149  4936 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0627 05:23:06.067159  4936 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.067167  4936 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.067184  4936 authenticator.cpp:318] Authentication success
I0627 05:23:06.067267  4945 authenticatee.cpp:299] Authentication success
I0627 05:23:06.067364  4954 master.cpp:7670] Successfully authenticated principal 'test-principal' at slave(269)@172.17.0.4:41747
I0627 05:23:06.067433  4943 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(577)@172.17.0.4:41747
I0627 05:23:06.067656  4939 slave.cpp:1103] Successfully authenticated with master master@172.17.0.4:41747
I0627 05:23:06.067855  4939 slave.cpp:1545] Will retry registration in 1.835348ms if necessary
I0627 05:23:06.068022  4949 master.cpp:5602] Received register agent message from slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.068155  4949 master.cpp:3742] Authorizing agent with principal 'test-principal'
I0627 05:23:06.068568  4957 master.cpp:5662] Authorized registration of agent at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.068668  4957 master.cpp:5741] Registering agent at slave(269)@172.17.0.4:41747 (c43debbe7e32) with id a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.069067  4937 registrar.cpp:493] Applied 1 operations in 56568ns; attempting to update the registry
I0627 05:23:06.069788  4937 registrar.cpp:550] Successfully updated the registry in 659968ns
I0627 05:23:06.069957  4946 master.cpp:5788] Admitted agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.070394  4950 slave.cpp:1545] Will retry registration in 34.487053ms if necessary
I0627 05:23:06.070508  4950 slave.cpp:4866] Received ping from slave-observer(272)@172.17.0.4:41747
I0627 05:23:06.070776  4950 slave.cpp:1149] Registered with master master@172.17.0.4:41747; given agent ID a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.070569  4946 master.cpp:5819] Registered agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32) with [{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0627 05:23:06.070894  4942 status_update_manager.cpp:184] Resuming sending status updates
I0627 05:23:06.070890  4949 hierarchical.cpp:587] Added agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 (c43debbe7e32) with cpus:2; mem:1024; disk:1024; ports:[31000-32000] (allocated: {})
I0627 05:23:06.071032  4946 master.cpp:5602] Received register agent message from slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.071089  4950 slave.cpp:1169] Checkpointing SlaveInfo to '/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta/slaves/a8778782-0da1-49a5-9cb8-9f6d11701733-S0/slave.info'
I0627 05:23:06.071142  4946 master.cpp:3742] Authorizing agent with principal 'test-principal'
I0627 05:23:06.071179  4949 hierarchical.cpp:1938] No allocations performed
I0627 05:23:06.071238  4949 hierarchical.cpp:1493] Performed allocation for 1 agents in 180904ns
I0627 05:23:06.071403  4950 slave.cpp:1207] Forwarding total oversubscribed resources {}
I0627 05:23:06.071549  4944 master.cpp:5662] Authorized registration of agent at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:06.071624  4952 status_update_manager.cpp:177] Pausing sending status updates
I0627 05:23:06.071624  4944 master.cpp:5719] Agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32) already registered, resending acknowledgement
I0627 05:23:06.071624  4950 slave.cpp:946] New master detected at master@172.17.0.4:41747
I0627 05:23:06.071753  4950 slave.cpp:981] Detecting new master
I0627 05:23:06.071769  4944 master.cpp:6508] Received update of agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32) with total oversubscribed resources {}
I0627 05:23:06.071905  4950 slave.cpp:1149] Registered with master master@172.17.0.4:41747; given agent ID a8778782-0da1-49a5-9cb8-9f6d11701733-S0
I0627 05:23:06.072108  4943 status_update_manager.cpp:184] Resuming sending status updates
I0627 05:23:06.072243  4950 slave.cpp:1169] Checkpointing SlaveInfo to '/tmp/MasterTest_IgnoreOldAgentReregistration_US8t6c/meta/slaves/a8778782-0da1-49a5-9cb8-9f6d11701733-S0/slave.info'
I0627 05:23:06.072391  4948 hierarchical.cpp:1938] No allocations performed
I0627 05:23:06.072443  4948 hierarchical.cpp:1493] Performed allocation for 1 agents in 158723ns
I0627 05:23:06.072525  4950 slave.cpp:1207] Forwarding total oversubscribed resources {}
I0627 05:23:06.072618  4950 slave.cpp:1008] Authenticating with master master@172.17.0.4:41747
I0627 05:23:06.072661  4950 slave.cpp:1019] Using default CRAM-MD5 authenticatee
I0627 05:23:06.072693  4959 master.cpp:6508] Received update of agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32) with total oversubscribed resources {}
I0627 05:23:06.072882  4946 authenticatee.cpp:121] Creating new client SASL connection
I0627 05:23:06.073169  4944 master.cpp:7640] Authenticating slave(269)@172.17.0.4:41747
I0627 05:23:06.073289  4950 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(578)@172.17.0.4:41747
I0627 05:23:06.073531  4952 authenticator.cpp:98] Creating new server SASL connection
I0627 05:23:06.073796  4949 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0627 05:23:06.073827  4949 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0627 05:23:06.073998  4936 authenticator.cpp:204] Received SASL authentication start
I0627 05:23:06.074061  4936 authenticator.cpp:326] Authentication requires more steps
I0627 05:23:06.074195  4957 authenticatee.cpp:259] Received SASL authentication step
I0627 05:23:06.074335  4953 authenticator.cpp:232] Received SASL authentication step
I0627 05:23:06.074370  4953 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0627 05:23:06.074388  4953 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0627 05:23:06.074416  4953 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0627 05:23:06.074436  4953 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'c43debbe7e32' server FQDN: 'c43debbe7e32' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0627 05:23:06.074446  4953 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.074455  4953 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0627 05:23:06.074468  4953 authenticator.cpp:318] Authentication success
I0627 05:23:06.074553  4956 authenticatee.cpp:299] Authentication success
I0627 05:23:06.074594  4945 master.cpp:7670] Successfully authenticated principal 'test-principal' at slave(269)@172.17.0.4:41747
I0627 05:23:06.074642  4952 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(578)@172.17.0.4:41747
I0627 05:23:06.074981  4951 slave.cpp:1103] Successfully authenticated with master master@172.17.0.4:41747
../../src/tests/master_tests.cpp:7110: Failure
Failed to wait 15secs for reregisterSlaveMessage
I0627 05:23:21.082969  4953 slave.cpp:818] Agent terminating
I0627 05:23:21.083305  4936 master.cpp:1316] Agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32) disconnected
I0627 05:23:21.083349  4936 master.cpp:3240] Disconnecting agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:21.083443  4936 master.cpp:3259] Deactivating agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 at slave(269)@172.17.0.4:41747 (c43debbe7e32)
I0627 05:23:21.083665  4936 hierarchical.cpp:715] Agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0 deactivated
I0627 05:23:21.089186  4917 master.cpp:1158] Master terminating
I0627 05:23:21.090092  4944 hierarchical.cpp:620] Removed agent a8778782-0da1-49a5-9cb8-9f6d11701733-S0
../../3rdparty/libprocess/include/process/gmock.hpp:441: Failure
Actual function call count doesn't match EXPECT_CALL(filter->mock, filter(testing::A<const MessageEvent&>()))...
    Expected args: message matcher (8-byte object <E8-AB 06-80 6C-2B 00-00>, 1-byte object <AB>, 1-byte object <E8>)
         Expected: to be called once
           Actual: never called - unsatisfied and active
[  FAILED  ] MasterTest.IgnoreOldAgentReregistration (15070 ms)

{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)