You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Jan Schlicht (JIRA)" <ji...@apache.org> on 2018/07/03 12:17:00 UTC

[jira] [Commented] (MESOS-7441) RegisterSlaveValidationTest.DropInvalidRegistration is flaky

    [ https://issues.apache.org/jira/browse/MESOS-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531286#comment-16531286 ] 

Jan Schlicht commented on MESOS-7441:
-------------------------------------

Reopened, as there was a recent test run (on {{master}}, SHA {{b50f6c8a}}) failing on CentOS 6 with
{noformat}
[ RUN      ] RegisterSlaveValidationTest.DropInvalidRegistration
I0703 11:44:46.746553 16172 cluster.cpp:173] Creating default 'local' authorizer
I0703 11:44:46.747535 16196 master.cpp:463] Master cce3860c-7d4f-4996-b865-fc8ce8302705 (ip-172-16-10-44.ec2.internal) started on 172.16.10.44:33909
I0703 11:44:46.747611 16196 master.cpp:466] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/dwPsJP/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dwPsJP/master" --zk_session_timeout="10secs"
I0703 11:44:46.747733 16196 master.cpp:515] Master only allowing authenticated frameworks to register
I0703 11:44:46.747748 16196 master.cpp:521] Master only allowing authenticated agents to register
I0703 11:44:46.747754 16196 master.cpp:527] Master only allowing authenticated HTTP frameworks to register
I0703 11:44:46.747761 16196 credentials.hpp:37] Loading credentials for authentication from '/tmp/dwPsJP/credentials'
I0703 11:44:46.747872 16196 master.cpp:571] Using default 'crammd5' authenticator
I0703 11:44:46.747907 16196 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0703 11:44:46.747944 16196 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0703 11:44:46.747967 16196 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0703 11:44:46.747997 16196 master.cpp:652] Authorization enabled
I0703 11:44:46.748157 16194 hierarchical.cpp:177] Initialized hierarchical allocator process
I0703 11:44:46.748183 16194 whitelist_watcher.cpp:77] No whitelist given
I0703 11:44:46.748715 16196 master.cpp:2162] Elected as the leading master!
I0703 11:44:46.748736 16196 master.cpp:1717] Recovering from registrar
I0703 11:44:46.748950 16196 registrar.cpp:339] Recovering registrar
I0703 11:44:46.749035 16196 registrar.cpp:383] Successfully fetched the registry (0B) in 68864ns
I0703 11:44:46.749059 16196 registrar.cpp:487] Applied 1 operations in 5058ns; attempting to update the registry
I0703 11:44:46.749349 16196 registrar.cpp:544] Successfully updated the registry in 275968ns
I0703 11:44:46.749385 16196 registrar.cpp:416] Successfully recovered registrar
I0703 11:44:46.749465 16196 master.cpp:1831] Recovered 0 agents from the registry (172B); allowing 10mins for agents to reregister
I0703 11:44:46.749589 16196 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
W0703 11:44:46.751214 16172 process.cpp:2824] Attempted to spawn already running process files@172.16.10.44:33909
I0703 11:44:46.751505 16172 containerizer.cpp:300] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
I0703 11:44:46.753739 16172 linux_launcher.cpp:146] Using /cgroup/freezer as the freezer hierarchy for the Linux launcher
I0703 11:44:46.754091 16172 provisioner.cpp:298] Using default backend 'copy'
I0703 11:44:46.754447 16172 cluster.cpp:479] Creating default 'local' authorizer
I0703 11:44:46.754907 16195 slave.cpp:268] Mesos agent started on (361)@172.16.10.44:33909
I0703 11:44:46.754920 16195 slave.cpp:269] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/store/appc" --authenticate_http_executors="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/credential" --default_role="*" --disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/fetch" --fetcher_cache_size="2GB" --fetcher_stall_timeout="1mins" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_credentials="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/http_credentials" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --jwt_secret_key="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/jwt_secret_key" --launcher="linux" --launcher_dir="/home/centos/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mesos-ec2-centos-6/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --memory_profiling="false" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --reconfiguration_policy="equal" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_oKndPi" --zk_session_timeout="10secs"
I0703 11:44:46.755080 16195 credentials.hpp:86] Loading credential for authentication from '/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/credential'
I0703 11:44:46.755143 16195 slave.cpp:301] Agent using credential for: test-principal
I0703 11:44:46.755153 16195 credentials.hpp:37] Loading credentials for authentication from '/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_W7jYUL/http_credentials'
I0703 11:44:46.755244 16195 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-executor'
I0703 11:44:46.755276 16195 http.cpp:980] Creating default 'jwt' HTTP authenticator for realm 'mesos-agent-executor'
I0703 11:44:46.755334 16195 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I0703 11:44:46.755355 16195 http.cpp:980] Creating default 'jwt' HTTP authenticator for realm 'mesos-agent-readonly'
I0703 11:44:46.755385 16195 http.cpp:959] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
I0703 11:44:46.755404 16195 http.cpp:980] Creating default 'jwt' HTTP authenticator for realm 'mesos-agent-readwrite'
I0703 11:44:46.755463 16195 disk_profile_adaptor.cpp:80] Creating default disk profile adaptor module
I0703 11:44:46.755616 16198 hierarchical.cpp:1522] Performed allocation for 0 agents in 4274ns
I0703 11:44:46.758384 16195 slave.cpp:616] Agent resources: [{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I0703 11:44:46.758685 16195 slave.cpp:624] Agent attributes: [  ]
I0703 11:44:46.758695 16195 slave.cpp:633] Agent hostname: ip-172-16-10-44.ec2.internal
I0703 11:44:46.758954 16195 task_status_update_manager.cpp:181] Pausing sending task status updates
I0703 11:44:46.759140 16195 state.cpp:66] Recovering state from '/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_oKndPi/meta'
I0703 11:44:46.759189 16195 slave.cpp:6929] Finished recovering checkpointed state from '/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_oKndPi/meta', beginning agent recovery
I0703 11:44:46.759224 16195 task_status_update_manager.cpp:207] Recovering task status update manager
I0703 11:44:46.759284 16195 containerizer.cpp:718] Recovering Mesos containers
I0703 11:44:46.759349 16195 linux_launcher.cpp:299] Recovering Linux launcher
I0703 11:44:46.759835 16195 containerizer.cpp:1025] Recovering isolators
I0703 11:44:46.759996 16195 containerizer.cpp:1064] Recovering provisioner
I0703 11:44:46.760098 16195 provisioner.cpp:494] Provisioner recovery complete
I0703 11:44:46.760238 16195 composing.cpp:339] Finished recovering all containerizers
I0703 11:44:46.760278 16195 slave.cpp:7158] Recovering executors
I0703 11:44:46.760329 16195 slave.cpp:7311] Finished recovery
I0703 11:44:46.760675 16195 slave.cpp:1276] New master detected at master@172.16.10.44:33909
I0703 11:44:46.760711 16195 slave.cpp:1331] Detecting new master
I0703 11:44:46.760738 16195 task_status_update_manager.cpp:181] Pausing sending task status updates
I0703 11:44:46.760764 16195 slave.cpp:1358] Authenticating with master master@172.16.10.44:33909
I0703 11:44:46.760781 16195 slave.cpp:1367] Using default CRAM-MD5 authenticatee
I0703 11:44:46.760828 16195 authenticatee.cpp:121] Creating new client SASL connection
I0703 11:44:46.760885 16195 master.cpp:9772] Authenticating slave(361)@172.16.10.44:33909
I0703 11:44:46.760917 16195 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(740)@172.16.10.44:33909
I0703 11:44:46.760962 16195 authenticator.cpp:98] Creating new server SASL connection
I0703 11:44:46.761013 16195 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0703 11:44:46.761025 16195 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0703 11:44:46.761050 16195 authenticator.cpp:204] Received SASL authentication start
I0703 11:44:46.761080 16195 authenticator.cpp:326] Authentication requires more steps
I0703 11:44:46.761104 16195 authenticatee.cpp:259] Received SASL authentication step
I0703 11:44:46.761132 16195 authenticator.cpp:232] Received SASL authentication step
I0703 11:44:46.761145 16195 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'ip-172-16-10-44' server FQDN: 'ip-172-16-10-44' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0703 11:44:46.761152 16195 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0703 11:44:46.761162 16195 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0703 11:44:46.761171 16195 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: 'ip-172-16-10-44' server FQDN: 'ip-172-16-10-44' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0703 11:44:46.761178 16195 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0703 11:44:46.761184 16195 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0703 11:44:46.761195 16195 authenticator.cpp:318] Authentication success
I0703 11:44:46.761224 16195 authenticatee.cpp:299] Authentication success
I0703 11:44:46.761245 16195 master.cpp:9802] Successfully authenticated principal 'test-principal' at slave(361)@172.16.10.44:33909
I0703 11:44:46.761265 16195 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(740)@172.16.10.44:33909
I0703 11:44:46.761337 16195 slave.cpp:1450] Successfully authenticated with master master@172.16.10.44:33909
I0703 11:44:46.761406 16195 slave.cpp:1895] Will retry registration in 5.406739ms if necessary
I0703 11:44:46.761972 16195 slave.cpp:1895] Will retry registration in 9.887201ms if necessary
I0703 11:44:46.762025 16195 master.cpp:6733] Received register agent message from slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal)
I0703 11:44:46.762079 16195 master.cpp:4000] Authorizing agent providing resources 'cpus:2; mem:1024; disk:1024; ports:[31000-32000]' with principal 'test-principal'
I0703 11:44:46.762190 16195 master.cpp:6804] Authorized registration of agent at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal)
I0703 11:44:46.762217 16195 master.cpp:6919] Registering agent at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal) with id cce3860c-7d4f-4996-b865-fc8ce8302705-S0
I0703 11:44:46.762353 16195 registrar.cpp:487] Applied 1 operations in 68027ns; attempting to update the registry
I0703 11:44:46.762461 16195 registrar.cpp:544] Successfully updated the registry in 0ns
I0703 11:44:46.762506 16195 master.cpp:6967] Admitted agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal)
I0703 11:44:46.762600 16195 master.cpp:7012] Registered agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal) with cpus:2; mem:1024; disk:1024; ports:[31000-32000]
I0703 11:44:46.762706 16195 hierarchical.cpp:576] Added agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 (ip-172-16-10-44.ec2.internal) with cpus:2; mem:1024; disk:1024; ports:[31000-32000] (allocated: {})
I0703 11:44:46.762748 16195 hierarchical.cpp:1522] Performed allocation for 1 agents in 9088ns
../../3rdparty/libprocess/include/process/gmock.hpp:621: Failure
Mock function called more times than expected - returning default value.
    Function call: filter(@0x7f3af8057610 216-byte object <88-39 AA-18 3B-7F 00-00 58-E4 07-EC 3A-7F 00-00 C0-AB 38-3D 4F-56 00-00 B0-AB 38-3D 4F-56 00-00 02-00 00-00 AC-10 0A-2C 00-00 00-00 00-00 00-00 00-00 00-00 75-84 00-00 01-00 00-00 3A-7F 00-00 ... 01-00 00-00 00-00 00-00 88-C8 AE-18 3B-7F 00-00 00-00 00-00 00-00 00-00 05-08 00-00 00-00 00-00 00-00 00-00 00-00 00-00 50-06 4D-3D 4F-56 00-00 40-06 4D-3D 4F-56 00-00 B8-9D 04-F8 3A-7F 00-00>)
          Returns: false
         Expected: to be never called
           Actual: called once - over-saturated and active
I0703 11:44:46.762857 16195 slave.cpp:1497] Registered with master master@172.16.10.44:33909; given agent ID cce3860c-7d4f-4996-b865-fc8ce8302705-S0
I0703 11:44:46.763015 16195 slave.cpp:1517] Checkpointing SlaveInfo to '/tmp/RegisterSlaveValidationTest_DropInvalidRegistration_oKndPi/meta/slaves/cce3860c-7d4f-4996-b865-fc8ce8302705-S0/slave.info'
I0703 11:44:46.763259 16195 slave.cpp:1566] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"olzd7ArqQDqYBOILRitZmw=="},"slave_id":{"value":"cce3860c-7d4f-4996-b865-fc8ce8302705-S0"},"update_oversubscribed_resources":false}
I0703 11:44:46.763340 16195 task_status_update_manager.cpp:188] Resuming sending task status updates
I0703 11:44:46.763743 16193 master.cpp:8079] Ignoring update on agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal) as it reports no changes
W0703 11:44:46.778954 16194 master.cpp:6718] Dropping registration of agent at slave(361)@172.16.10.44:33909 because it sent an invalid registration: '14a4d23d-5019-4a6b-94fc-21bf58f9530a/../dcf0a5d0-6f69-48d0-beba-cff931ded77f/../ad8200c2-7a61-47ab-b870-6c9a74b7665a' contains invalid characters
I0703 11:44:46.779024 16172 slave.cpp:931] Agent terminating
I0703 11:44:46.779287 16193 master.cpp:1330] Agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal) disconnected
I0703 11:44:46.779345 16193 master.cpp:3340] Disconnecting agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal)
I0703 11:44:46.779361 16193 master.cpp:3359] Deactivating agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 at slave(361)@172.16.10.44:33909 (ip-172-16-10-44.ec2.internal)
I0703 11:44:46.779392 16193 hierarchical.cpp:769] Agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0 deactivated
I0703 11:44:46.782009 16172 master.cpp:1172] Master terminating
I0703 11:44:46.782124 16194 hierarchical.cpp:612] Removed agent cce3860c-7d4f-4996-b865-fc8ce8302705-S0
[  FAILED  ] RegisterSlaveValidationTest.DropInvalidRegistration (36 ms)
{noformat}

> RegisterSlaveValidationTest.DropInvalidRegistration is flaky
> ------------------------------------------------------------
>
>                 Key: MESOS-7441
>                 URL: https://issues.apache.org/jira/browse/MESOS-7441
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>         Environment: ASF CI Configuration autotools,clang,--verbose,GLOG_v=1 MESOS_VERBOSE=1,ubuntu:14.04,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)
>            Reporter: Yan Xu
>            Assignee: Neil Conway
>            Priority: Major
>              Labels: flaky
>             Fix For: 1.4.0
>
>         Attachments: log.txt
>
>
> {noformat:title=}
> [ RUN      ] RegisterSlaveValidationTest.DropInvalidRegistration
> ../../3rdparty/libprocess/include/process/gmock.hpp:601: Failure
> Mock function called more times than expected - returning default value.
>     Function call: filter(@0x7fd658026000 16-byte object <50-D5 AD-78 D6-7F 00-00 B0-BA 01-58 D6-7F 00-00>)
>           Returns: false
>          Expected: to be never called
>            Actual: called once - over-saturated and active
> [  FAILED  ] RegisterSlaveValidationTest.DropInvalidRegistration (112 ms)
> {noformat}
> Verbose log see [jenkins|https://builds.apache.org/job/Mesos-Buildbot/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/3583/consoleFull] or attachment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)