You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2017/06/29 17:27:01 UTC

[jira] [Commented] (MESOS-7029) FaultToleranceTest.FrameworkReregister is flaky

    [ https://issues.apache.org/jira/browse/MESOS-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068661#comment-16068661 ] 

Vinod Kone commented on MESOS-7029:
-----------------------------------

I'm still seeing a similar issue on ASF CI.

{code}
I0629 00:24:10.025084 24916 sched.cpp:801] Framework re-registered with 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.025130 24916 sched.cpp:815] Scheduler::reregistered took 36991ns
I0629 00:24:10.025146 24936 hierarchical.cpp:1064] Recovered cpus(*)(allocated: *):2; mem(*)(allocated: *):1024; disk(*)(allocated: *):1024; ports(*)(allocated: *):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: {}) on agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 from framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.027101 24927 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.027130 24927 hierarchical.cpp:1394] Performed allocation for 1 agents in 356873ns
I0629 00:24:10.027333 24927 master.cpp:7028] Sending 1 offers to framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.027504 24927 sched.cpp:933] Scheduler::resourceOffers took 30524ns
I0629 00:24:10.028666 24929 process.cpp:3704] Handling HTTP event for process 'master' with path: '/master/state'
I0629 00:24:10.029079 24936 http.cpp:420] HTTP GET for /master/state from 172.17.0.5:48878
/mesos/src/tests/fault_tolerance_tests.cpp:903: Failure
The difference between registerTime.secs() and framework.values["registered_time"].as<JSON::Number>().as<double>() is 2.0099987983703613, which exceeds 1, where
registerTime.secs() evaluates to 1498695886.0112588,
framework.values["registered_time"].as<JSON::Number>().as<double>() evaluates to 1498695884.00126, and
1 evaluates to 1.
I0629 00:24:10.032187 24915 sched.cpp:2021] Asked to stop the driver
{code}

Full log
{code}
[ RUN      ] FaultToleranceTest.FrameworkReregister
I0629 00:24:09.985801 24915 cluster.cpp:160] Creating default 'local' authorizer
I0629 00:24:09.989289 24936 master.cpp:383] Master 49802357-5b7c-4962-bd31-0d1a59d8c3df (01f4a6916e47) started on 172.17.0.5:33898
I0629 00:24:09.989342 24936 master.cpp:385] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" -
-allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --au
thenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/J4Bnep/credentials" --framework_sorter="drf" --help="false" --hostn
ame_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" 
--logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="10
00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registr
y_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" -
-version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/J4Bnep/master" --zk_session_timeout="10secs"
I0629 00:24:09.989701 24936 master.cpp:435] Master only allowing authenticated frameworks to register
I0629 00:24:09.989707 24936 master.cpp:449] Master only allowing authenticated agents to register
I0629 00:24:09.989715 24936 master.cpp:462] Master only allowing authenticated HTTP frameworks to register
I0629 00:24:09.989722 24936 credentials.hpp:37] Loading credentials for authentication from '/tmp/J4Bnep/credentials'
I0629 00:24:09.989936 24936 master.cpp:507] Using default 'crammd5' authenticator
I0629 00:24:09.989995 24936 http.cpp:919] Using default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I0629 00:24:09.990038 24936 http.cpp:919] Using default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I0629 00:24:09.990063 24936 http.cpp:919] Using default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I0629 00:24:09.990085 24936 master.cpp:587] Authorization enabled
I0629 00:24:09.991214 24925 hierarchical.cpp:161] Initialized hierarchical allocator process
I0629 00:24:09.991381 24935 whitelist_watcher.cpp:77] No whitelist given
I0629 00:24:09.991864 24936 master.cpp:2124] Elected as the leading master!
I0629 00:24:09.991879 24936 master.cpp:1646] Recovering from registrar
I0629 00:24:09.991940 24936 registrar.cpp:329] Recovering registrar
I0629 00:24:09.992192 24936 registrar.cpp:362] Successfully fetched the registry (0B) in 235264ns
I0629 00:24:09.992233 24936 registrar.cpp:461] Applied 1 operations in 10711ns; attempting to update the registry
I0629 00:24:09.992527 24936 registrar.cpp:506] Successfully updated the registry in 273152ns
I0629 00:24:09.992568 24936 registrar.cpp:392] Successfully recovered registrar
I0629 00:24:09.992689 24936 master.cpp:1762] Recovered 0 agents from the registry (129B); allowing 10mins for agents to re-register
I0629 00:24:09.992751 24936 hierarchical.cpp:188] Skipping recovery of hierarchical allocator: nothing to recover
I0629 00:24:09.997200 24915 containerizer.cpp:221] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
W0629 00:24:09.997958 24915 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
W0629 00:24:09.998103 24915 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
I0629 00:24:09.998183 24915 provisioner.cpp:249] Using default backend 'copy'
I0629 00:24:09.999104 24915 cluster.cpp:446] Creating default 'local' authorizer
I0629 00:24:10.000584 24934 slave.cpp:217] Mesos agent started on (27)@172.17.0.5:33898
I0629 00:24:10.000687 24934 slave.cpp:218] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authentica
te_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/FaultToleranceTest_FrameworkReregister_LNZMnu/credential" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/FaultToleranceTest_FrameworkReregister_LNZMnu/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_credentials="/tmp/FaultToleranceTest_FrameworkReregister_LNZMnu/http_credentials" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/FaultToleranceTest_FrameworkReregister_LNZMnu" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/FaultToleranceTest_FrameworkReregister_cfr4Nr"
I0629 00:24:10.001286 24934 credentials.hpp:86] Loading credential for authentication from '/tmp/FaultToleranceTest_FrameworkReregister_LNZMnu/credential'
I0629 00:24:10.001468 24915 sched.cpp:232] Version: 1.2.2
I0629 00:24:10.001561 24934 slave.cpp:360] Agent using credential for: test-principal
I0629 00:24:10.001646 24934 credentials.hpp:37] Loading credentials for authentication from '/tmp/FaultToleranceTest_FrameworkReregister_LNZMnu/http_credentials'
I0629 00:24:10.001896 24934 http.cpp:919] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I0629 00:24:10.002135 24934 http.cpp:919] Using default 'basic' HTTP authenticator for realm 'mesos-agent-readwrite'
I0629 00:24:10.002354 24931 sched.cpp:336] New master detected at master@172.17.0.5:33898
I0629 00:24:10.002557 24931 sched.cpp:407] Authenticating with master master@172.17.0.5:33898
I0629 00:24:10.002665 24931 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0629 00:24:10.002954 24936 authenticatee.cpp:121] Creating new client SASL connection
I0629 00:24:10.003187 24923 master.cpp:7199] Authenticating scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.002619 24934 slave.cpp:547] Agent resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0629 00:24:10.003448 24935 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(62)@172.17.0.5:33898
I0629 00:24:10.003830 24935 authenticator.cpp:98] Creating new server SASL connection
I0629 00:24:10.004150 24935 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0629 00:24:10.004333 24935 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0629 00:24:10.004545 24935 authenticator.cpp:204] Received SASL authentication start
I0629 00:24:10.004732 24935 authenticator.cpp:326] Authentication requires more steps
I0629 00:24:10.004842 24935 authenticatee.cpp:259] Received SASL authentication step
I0629 00:24:10.004956 24935 authenticator.cpp:232] Received SASL authentication step
I0629 00:24:10.005033 24935 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '01f4a6916e47' server FQDN: '01f4a6916e47' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0629 00:24:10.005103 24935 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0629 00:24:10.005190 24935 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0629 00:24:10.005259 24935 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '01f4a6916e47' server FQDN: '01f4a6916e47' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0629 00:24:10.005328 24935 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0629 00:24:10.005385 24935 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0629 00:24:10.005452 24935 authenticator.cpp:318] Authentication success
I0629 00:24:10.005580 24935 authenticatee.cpp:299] Authentication success
I0629 00:24:10.005684 24935 master.cpp:7229] Successfully authenticated principal 'test-principal' at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.005781 24935 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(62)@172.17.0.5:33898
I0629 00:24:10.005954 24935 sched.cpp:513] Successfully authenticated with master master@172.17.0.5:33898
I0629 00:24:10.006111 24935 sched.cpp:836] Sending SUBSCRIBE call to master@172.17.0.5:33898
I0629 00:24:10.006325 24935 sched.cpp:869] Will retry registration in 397.596758ms if necessary
I0629 00:24:10.006645 24935 master.cpp:2737] Received SUBSCRIBE call for framework 'default' at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.006801 24935 master.cpp:2160] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0629 00:24:10.007100 24935 master.cpp:2813] Subscribing framework default with checkpointing disabled and capabilities [  ]
I0629 00:24:10.007555 24935 sched.cpp:759] Framework registered with 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.007808 24935 sched.cpp:773] Scheduler::registered took 25279ns
I0629 00:24:10.007730 24920 hierarchical.cpp:286] Added framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.008051 24920 hierarchical.cpp:1810] No allocations performed
I0629 00:24:10.008116 24920 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.008183 24920 hierarchical.cpp:1394] Performed allocation for 0 agents in 148142ns
I0629 00:24:10.009155 24934 slave.cpp:555] Agent attributes: [  ]
I0629 00:24:10.009186 24934 slave.cpp:560] Agent hostname: 01f4a6916e47
I0629 00:24:10.009454 24919 hierarchical.cpp:1810] No allocations performed
I0629 00:24:10.009469 24919 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.009480 24919 hierarchical.cpp:1394] Performed allocation for 0 agents in 54001ns
I0629 00:24:10.009791 24925 state.cpp:62] Recovering state from '/tmp/FaultToleranceTest_FrameworkReregister_cfr4Nr/meta'
I0629 00:24:10.010112 24937 status_update_manager.cpp:177] Pausing sending status updates
I0629 00:24:10.010145 24937 status_update_manager.cpp:203] Recovering status update manager
I0629 00:24:10.010234 24937 containerizer.cpp:600] Recovering containerizer
I0629 00:24:10.011039 24925 provisioner.cpp:410] Provisioner recovery complete
I0629 00:24:10.011214 24925 slave.cpp:5662] Finished recovery
I0629 00:24:10.011732 24925 slave.cpp:5836] Querying resource estimator for oversubscribable resources
I0629 00:24:10.011916 24925 slave.cpp:937] New master detected at master@172.17.0.5:33898
I0629 00:24:10.011967 24925 slave.cpp:972] Detecting new master
I0629 00:24:10.012023 24925 slave.cpp:5850] Received oversubscribable resources {} from the resource estimator
I0629 00:24:10.012069 24925 status_update_manager.cpp:177] Pausing sending status updates
I0629 00:24:10.013212 24932 slave.cpp:999] Authenticating with master master@172.17.0.5:33898
I0629 00:24:10.013244 24932 slave.cpp:1010] Using default CRAM-MD5 authenticatee
I0629 00:24:10.013346 24932 authenticatee.cpp:121] Creating new client SASL connection
I0629 00:24:10.013541 24932 master.cpp:7199] Authenticating slave(27)@172.17.0.5:33898
I0629 00:24:10.013609 24932 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(63)@172.17.0.5:33898
I0629 00:24:10.013710 24932 authenticator.cpp:98] Creating new server SASL connection
I0629 00:24:10.013864 24932 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0629 00:24:10.013897 24932 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0629 00:24:10.013945 24932 authenticator.cpp:204] Received SASL authentication start
I0629 00:24:10.014001 24932 authenticator.cpp:326] Authentication requires more steps
I0629 00:24:10.014053 24932 authenticatee.cpp:259] Received SASL authentication step
I0629 00:24:10.014117 24932 authenticator.cpp:232] Received SASL authentication step
I0629 00:24:10.014143 24932 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '01f4a6916e47' server FQDN: '01f4a6916e47' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0629 00:24:10.014161 24932 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0629 00:24:10.014183 24932 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0629 00:24:10.014199 24932 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '01f4a6916e47' server FQDN: '01f4a6916e47' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0629 00:24:10.014211 24932 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0629 00:24:10.014219 24932 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0629 00:24:10.014235 24932 authenticator.cpp:318] Authentication success
I0629 00:24:10.014295 24932 authenticatee.cpp:299] Authentication success
I0629 00:24:10.014338 24932 master.cpp:7229] Successfully authenticated principal 'test-principal' at slave(27)@172.17.0.5:33898
I0629 00:24:10.014380 24932 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(63)@172.17.0.5:33898
I0629 00:24:10.014487 24932 slave.cpp:1094] Successfully authenticated with master master@172.17.0.5:33898
I0629 00:24:10.014565 24932 slave.cpp:1521] Will retry registration in 1.400246ms if necessary
I0629 00:24:10.014727 24932 master.cpp:5396] Registering agent at slave(27)@172.17.0.5:33898 (01f4a6916e47) with id 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0
I0629 00:24:10.014909 24932 registrar.cpp:461] Applied 1 operations in 30392ns; attempting to update the registry
I0629 00:24:10.015398 24932 registrar.cpp:506] Successfully updated the registry in 0ns
I0629 00:24:10.015694 24932 master.cpp:5470] Registered agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 at slave(27)@172.17.0.5:33898 (01f4a6916e47) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0629 00:24:10.015925 24924 hierarchical.cpp:516] Added agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 (01f4a6916e47) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (allocated: {})
I0629 00:24:10.016309 24924 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.016330 24924 hierarchical.cpp:1394] Performed allocation for 1 agents in 340974ns
I0629 00:24:10.016409 24924 slave.cpp:1140] Registered with master master@172.17.0.5:33898; given agent ID 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0
I0629 00:24:10.016454 24924 fetcher.cpp:94] Clearing fetcher cache
I0629 00:24:10.016870 24924 slave.cpp:1168] Checkpointing SlaveInfo to '/tmp/FaultToleranceTest_FrameworkReregister_cfr4Nr/meta/slaves/49802357-5b7c-4962-bd31-0d1a59d8c3df-S0/slave.info'
I0629 00:24:10.017231 24924 slave.cpp:1206] Forwarding total oversubscribed resources {}
I0629 00:24:10.017312 24924 slave.cpp:4395] Received ping from slave-observer(18)@172.17.0.5:33898
I0629 00:24:10.017395 24939 status_update_manager.cpp:184] Resuming sending status updates
I0629 00:24:10.017740 24929 master.cpp:7028] Sending 1 offers to framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.017935 24921 sched.cpp:933] Scheduler::resourceOffers took 23618ns
I0629 00:24:10.017997 24929 master.cpp:6045] Received update of agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 at slave(27)@172.17.0.5:33898 (01f4a6916e47) with total oversubscribed resources {}
I0629 00:24:10.018106 24938 hierarchical.cpp:589] Agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 (01f4a6916e47) updated with oversubscribed resources {} (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: cpus(*)(allocated: *):2; mem(*)(allocated: *):1024; disk(*)(allocated: *):1024; ports(*)(allocated: *):[31000-32000])
I0629 00:24:10.018182 24938 hierarchical.cpp:1810] No allocations performed
I0629 00:24:10.018196 24938 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.018205 24938 hierarchical.cpp:1394] Performed allocation for 1 agents in 46025ns
I0629 00:24:10.018705 24919 sched.cpp:330] Scheduler::disconnected took 25767ns
I0629 00:24:10.018749 24919 sched.cpp:336] New master detected at master@172.17.0.5:33898
I0629 00:24:10.018810 24919 sched.cpp:407] Authenticating with master master@172.17.0.5:33898
I0629 00:24:10.018851 24919 sched.cpp:414] Using default CRAM-MD5 authenticatee
I0629 00:24:10.018998 24919 authenticatee.cpp:121] Creating new client SASL connection
I0629 00:24:10.019215 24919 master.cpp:7199] Authenticating scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.019309 24919 authenticator.cpp:414] Starting authentication session for crammd5-authenticatee(64)@172.17.0.5:33898
I0629 00:24:10.019429 24919 authenticator.cpp:98] Creating new server SASL connection
I0629 00:24:10.019601 24919 authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
I0629 00:24:10.019795 24919 authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
I0629 00:24:10.019879 24919 authenticator.cpp:204] Received SASL authentication start
I0629 00:24:10.019969 24919 authenticator.cpp:326] Authentication requires more steps
I0629 00:24:10.019745 24924 hierarchical.cpp:1810] No allocations performed
I0629 00:24:10.020064 24924 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.020066 24919 authenticatee.cpp:259] Received SASL authentication step
I0629 00:24:10.020139 24924 hierarchical.cpp:1394] Performed allocation for 1 agents in 424794ns
I0629 00:24:10.021183 24919 authenticator.cpp:232] Received SASL authentication step
I0629 00:24:10.021647 24919 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '01f4a6916e47' server FQDN: '01f4a6916e47' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0629 00:24:10.022127 24919 auxprop.cpp:181] Looking up auxiliary property '*userPassword'
I0629 00:24:10.022438 24919 auxprop.cpp:181] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0629 00:24:10.022677 24919 auxprop.cpp:109] Request to lookup properties for user: 'test-principal' realm: '01f4a6916e47' server FQDN: '01f4a6916e47' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0629 00:24:10.023262 24919 auxprop.cpp:131] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0629 00:24:10.023371 24919 auxprop.cpp:131] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0629 00:24:10.023515 24919 authenticator.cpp:318] Authentication success
I0629 00:24:10.023777 24925 authenticatee.cpp:299] Authentication success
I0629 00:24:10.024005 24925 sched.cpp:513] Successfully authenticated with master master@172.17.0.5:33898
I0629 00:24:10.023813 24919 authenticator.cpp:432] Authentication session cleanup for crammd5-authenticatee(64)@172.17.0.5:33898
I0629 00:24:10.023804 24926 master.cpp:7229] Successfully authenticated principal 'test-principal' at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.024122 24925 sched.cpp:836] Sending SUBSCRIBE call to master@172.17.0.5:33898
I0629 00:24:10.024375 24925 sched.cpp:869] Will retry registration in 935.189636ms if necessary
I0629 00:24:10.024452 24935 master.cpp:2737] Received SUBSCRIBE call for framework 'default' at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.024472 24935 master.cpp:2160] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0629 00:24:10.024678 24916 master.cpp:2813] Subscribing framework default with checkpointing disabled and capabilities [  ]
I0629 00:24:10.024696 24916 master.cpp:2891] Updating info for framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.024729 24916 master.cpp:2919] Allowing framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898 to subscribe with an already used id
I0629 00:24:10.025033 24916 sched.cpp:945] Ignoring rescind offer message because the driver is disconnected!
I0629 00:24:10.025084 24916 sched.cpp:801] Framework re-registered with 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.025130 24916 sched.cpp:815] Scheduler::reregistered took 36991ns
I0629 00:24:10.025146 24936 hierarchical.cpp:1064] Recovered cpus(*)(allocated: *):2; mem(*)(allocated: *):1024; disk(*)(allocated: *):1024; ports(*)(allocated: *):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: {}) on agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 from framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.027101 24927 hierarchical.cpp:1900] No inverse offers to send out!
I0629 00:24:10.027130 24927 hierarchical.cpp:1394] Performed allocation for 1 agents in 356873ns
I0629 00:24:10.027333 24927 master.cpp:7028] Sending 1 offers to framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.027504 24927 sched.cpp:933] Scheduler::resourceOffers took 30524ns
I0629 00:24:10.028666 24929 process.cpp:3704] Handling HTTP event for process 'master' with path: '/master/state'
I0629 00:24:10.029079 24936 http.cpp:420] HTTP GET for /master/state from 172.17.0.5:48878
/mesos/src/tests/fault_tolerance_tests.cpp:903: Failure
The difference between registerTime.secs() and framework.values["registered_time"].as<JSON::Number>().as<double>() is 2.0099987983703613, which exceeds 1, where
registerTime.secs() evaluates to 1498695886.0112588,
framework.values["registered_time"].as<JSON::Number>().as<double>() evaluates to 1498695884.00126, and
1 evaluates to 1.
I0629 00:24:10.032187 24915 sched.cpp:2021] Asked to stop the driver
I0629 00:24:10.032758 24938 sched.cpp:1203] Stopping framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.032888 24938 master.cpp:7741] Processing TEARDOWN call for framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.032910 24938 master.cpp:7753] Removing framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.032920 24938 master.cpp:3081] Deactivating framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 (default) at scheduler-f09657cc-5f00-4a0c-b5f3-e6b5c68ec42f@172.17.0.5:33898
I0629 00:24:10.033211 24938 hierarchical.cpp:415] Deactivated framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.033418 24938 hierarchical.cpp:1064] Recovered cpus(*)(allocated: *):2; mem(*)(allocated: *):1024; disk(*)(allocated: *):1024; ports(*)(allocated: *):[31000-32000] (total: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000], allocated: {}) on agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 from framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.033509 24938 hierarchical.cpp:362] Removed framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.033540 24938 slave.cpp:2664] Asked to shut down framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000 by master@172.17.0.5:33898
I0629 00:24:10.033555 24938 slave.cpp:2679] Cannot shut down unknown framework 49802357-5b7c-4962-bd31-0d1a59d8c3df-0000
I0629 00:24:10.034260 24922 slave.cpp:809] Agent terminating
I0629 00:24:10.034441 24916 master.cpp:1264] Agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 at slave(27)@172.17.0.5:33898 (01f4a6916e47) disconnected
I0629 00:24:10.034461 24916 master.cpp:3118] Disconnecting agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 at slave(27)@172.17.0.5:33898 (01f4a6916e47)
I0629 00:24:10.034574 24916 master.cpp:3137] Deactivating agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 at slave(27)@172.17.0.5:33898 (01f4a6916e47)
I0629 00:24:10.034729 24925 hierarchical.cpp:618] Agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0 deactivated
I0629 00:24:10.036890 24921 master.cpp:1103] Master terminating
I0629 00:24:10.037221 24933 hierarchical.cpp:549] Removed agent 49802357-5b7c-4962-bd31-0d1a59d8c3df-S0
[  FAILED  ] FaultToleranceTest.FrameworkReregister (56 ms)
{code}

> FaultToleranceTest.FrameworkReregister is flaky
> -----------------------------------------------
>
>                 Key: MESOS-7029
>                 URL: https://issues.apache.org/jira/browse/MESOS-7029
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>         Environment: ASF CI, cmake, gcc, Ubuntu 14.04, libevent/SSL enabled
>            Reporter: Greg Mann
>            Assignee: Jay Guo
>              Labels: flaky, flaky-test
>             Fix For: 1.3.0
>
>         Attachments: FaultToleranceTest.FrameworkReregister.txt
>
>
> This was observed on ASF CI:
> {code}
> /mesos/src/tests/fault_tolerance_tests.cpp:903: Failure
> The difference between registerTime.secs() and framework.values["registered_time"].as<JSON::Number>().as<double>() is 1.0100052356719971, which exceeds 1, where
> registerTime.secs() evaluates to 1485732879.7673652,
> framework.values["registered_time"].as<JSON::Number>().as<double>() evaluates to 1485732878.75736, and
> 1 evaluates to 1.
> {code}
> Find the full log attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)