You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/11/20 01:25:21 UTC

[jira] [Commented] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky

    [ https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827169#comment-13827169 ] 

Benjamin Mahler commented on MESOS-822:
---------------------------------------

https://reviews.apache.org/r/15707/

> AllocatorTest/0.SchedulerFailover is flaky
> ------------------------------------------
>
>                 Key: MESOS-822
>                 URL: https://issues.apache.org/jira/browse/MESOS-822
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>            Assignee: Benjamin Mahler
>             Fix For: 0.16.0
>
>
> slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> tests/allocator_tests.cpp:993: Failure
> Mock function called more times than expected - taking default action specified at:
> ./tests/mesos.hpp:412:
>     Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-0000, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object <00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00>)
>          Expected: to be called once
>            Actual: called twice - over-saturated and active
> Full Log:
> [ RUN      ] AllocatorTest/0.SchedulerFailover
> I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the process group
> I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the process group
> I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the process group
> I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448
> I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 201311191201-16777343-52448-17056
> I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing authenticated frameworks to register!
> I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 127)@127.0.0.1:52448
> I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000]
> I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at master@127.0.0.1:52448
> I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master master@127.0.0.1:52448
> I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master
> I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL connection
> I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta'
> I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status update manager
> I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery
> I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at master@127.0.0.1:52448
> I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master
> I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master detected at master@127.0.0.1:52448
> I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at scheduler(119)@127.0.0.1:52448
> W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message from localhost.localdomain since not elected yet
> I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. Advertising offers for all slaves
> I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL connection
> I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@127.0.0.1:52448
> I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5
> I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5'
> I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is master@127.0.0.1:52448
> I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master!
> I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL authentication start
> I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires more steps
> I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL authentication step
> I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL authentication step
> I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I1119 12:01:32.119498 17076 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1119 12:01:32.119503 17076 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1119 12:01:32.119514 17076 authenticator.hpp:317] Authentication success
> I1119 12:01:32.119532 17076 authenticatee.hpp:298] Authentication success
> I1119 12:01:32.119547 17076 master.cpp:1774] Successfully authenticated framework at scheduler(119)@127.0.0.1:52448
> I1119 12:01:32.119604 17076 sched.cpp:334] Successfully authenticated with master master@127.0.0.1:52448
> I1119 12:01:32.119642 17076 master.cpp:798] Received registration request from scheduler(119)@127.0.0.1:52448
> I1119 12:01:32.119693 17076 master.cpp:816] Registering framework 201311191201-16777343-52448-17056-0000 at scheduler(119)@127.0.0.1:52448
> I1119 12:01:32.119742 17076 sched.cpp:373] Framework registered with 201311191201-16777343-52448-17056-0000
> I1119 12:01:32.119766 17076 sched.cpp:387] Scheduler::registered took 10828ns
> I1119 12:01:32.119802 17076 hierarchical_allocator_process.hpp:332] Added framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:32.119812 17076 hierarchical_allocator_process.hpp:726] No resources available to allocate!
> I1119 12:01:32.119818 17076 hierarchical_allocator_process.hpp:688] Performed allocation for 0 slaves in 7043ns
> 2013-11-19 12:01:33,112:17056(0x7f6f7b5fe700):ZOO_ERROR@handle_socket_error_msg@1579: Socket [127.0.0.1:50700] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I1119 12:01:33.119645 17077 master.cpp:1266] Attempting to register slave on localhost.localdomain at slave(127)@127.0.0.1:52448
> I1119 12:01:33.119696 17077 master.cpp:2513] Adding slave 201311191201-16777343-52448-17056-0 at localhost.localdomain with cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000]
> I1119 12:01:33.119916 17077 slave.cpp:542] Registered with master master@127.0.0.1:52448; given slave ID 201311191201-16777343-52448-17056-0
> I1119 12:01:33.120085 17077 hierarchical_allocator_process.hpp:445] Added slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) with cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] (and cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] available)
> I1119 12:01:33.120193 17077 hierarchical_allocator_process.hpp:752] Offering cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 to framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.120355 17077 hierarchical_allocator_process.hpp:708] Performed allocation for slave 201311191201-16777343-52448-17056-0 in 202267ns
> I1119 12:01:33.120445 17077 master.hpp:428] Adding offer 201311191201-16777343-52448-17056-0 with resources cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.120522 17077 master.cpp:1700] Sending 1 offers to framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.121275 17077 sched.cpp:480] Scheduler::resourceOffers took 627781ns
> I1119 12:01:33.121891 17078 master.cpp:2026] Processing reply for offer 201311191201-16777343-52448-17056-0 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) for framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.122030 17078 master.hpp:400] Adding task 0 with resources cpus(*):1; mem(*):256 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.122130 17078 master.cpp:2150] Launching task 0 of framework 201311191201-16777343-52448-17056-0000 with resources cpus(*):1; mem(*):256 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.122335 17078 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-0 with resources cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.122450 17078 slave.cpp:722] Got assigned task 0 for framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.122722 17078 slave.cpp:833] Launching task 0 for framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.122972 17082 hierarchical_allocator_process.hpp:547] Framework 201311191201-16777343-52448-17056-0000 left cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] unused on slave 201311191201-16777343-52448-17056-0
> I1119 12:01:33.125586 17083 hierarchical_allocator_process.hpp:752] Offering cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 to framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.125779 17083 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 263589ns
> I1119 12:01:33.125910 17083 master.hpp:428] Adding offer 201311191201-16777343-52448-17056-1 with resources cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.125987 17083 master.cpp:1700] Sending 1 offers to framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.126130 17083 sched.cpp:480] Scheduler::resourceOffers took 41584ns
> I1119 12:01:33.126202 17083 master.cpp:2026] Processing reply for offer 201311191201-16777343-52448-17056-1 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) for framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.126309 17083 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-1 with resources cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> tests/allocator_tests.cpp:993: Failure
> Mock function called more times than expected - taking default action specified at:
> ./tests/mesos.hpp:412:
>     Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-0000, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object <00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00>)
>          Expected: to be called once
>            Actual: called twice - over-saturated and active
> I1119 12:01:33.126698 17083 hierarchical_allocator_process.hpp:547] Framework 201311191201-16777343-52448-17056-0000 left cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] unused on slave 201311191201-16777343-52448-17056-0
> I1119 12:01:33.126823 17083 hierarchical_allocator_process.hpp:590] Framework 201311191201-16777343-52448-17056-0000 filtered slave 201311191201-16777343-52448-17056-0 for 5secs
> I1119 12:01:33.131556 17077 exec.cpp:178] Executor started at: executor(38)@127.0.0.1:52448 with pid 17056
> I1119 12:01:33.132163 17078 slave.cpp:943] Queuing task '0' for executor default of framework '201311191201-16777343-52448-17056-0000
> I1119 12:01:33.132760 17078 slave.cpp:466] Successfully attached file '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/slaves/201311191201-16777343-52448-17056-0/frameworks/201311191201-16777343-52448-17056-0000/executors/default/runs/9e17d16e-9bcc-4c74-a963-bd3aebe198de'
> I1119 12:01:33.133375 17078 slave.cpp:1406] Got registration for executor 'default' of framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.133811 17076 exec.cpp:202] Executor registered on slave 201311191201-16777343-52448-17056-0
> I1119 12:01:33.134820 17076 exec.cpp:214] Executor::registered took 16123ns
> I1119 12:01:33.135026 17078 slave.cpp:1527] Flushing queued task 0 for executor 'default' of framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.135469 17082 exec.cpp:289] Executor asked to run task '0'
> I1119 12:01:33.135507 17082 exec.cpp:298] Executor::launchTask took 20880ns
> I1119 12:01:33.135705 17077 sched.cpp:638] Stopping framework '201311191201-16777343-52448-17056-0000'
> I1119 12:01:33.135766 17077 master.cpp:562] Framework 201311191201-16777343-52448-17056-0000 disconnected
> I1119 12:01:33.135777 17077 master.cpp:1032] Deactivating framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.135797 17077 master.cpp:584] Giving framework 201311191201-16777343-52448-17056-0000 100ms to failover
> I1119 12:01:33.135895 17077 hierarchical_allocator_process.hpp:408] Deactivated framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.136298 17077 sched.cpp:207] New master detected at master@127.0.0.1:52448
> I1119 12:01:33.136318 17077 sched.cpp:260] Authenticating with master master@127.0.0.1:52448
> I1119 12:01:33.136359 17077 sched.cpp:229] Detecting new master
> I1119 12:01:33.136404 17077 authenticatee.hpp:124] Creating new client SASL connection
> I1119 12:01:33.136474 17077 master.cpp:1734] Authenticating framework at scheduler(120)@127.0.0.1:52448
> I1119 12:01:33.136549 17077 authenticator.hpp:140] Creating new server SASL connection
> I1119 12:01:33.136607 17077 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5
> I1119 12:01:33.136622 17077 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5'
> I1119 12:01:33.136646 17077 authenticator.hpp:243] Received SASL authentication start
> I1119 12:01:33.136682 17077 authenticator.hpp:325] Authentication requires more steps
> I1119 12:01:33.136708 17077 authenticatee.hpp:258] Received SASL authentication step
> I1119 12:01:33.136744 17077 authenticator.hpp:271] Received SASL authentication step
> I1119 12:01:33.136761 17077 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I1119 12:01:33.136770 17077 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I1119 12:01:33.136781 17077 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I1119 12:01:33.136790 17077 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I1119 12:01:33.136796 17077 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1119 12:01:33.136802 17077 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1119 12:01:33.136814 17077 authenticator.hpp:317] Authentication success
> I1119 12:01:33.136868 17077 authenticatee.hpp:298] Authentication success
> I1119 12:01:33.136894 17077 master.cpp:1774] Successfully authenticated framework at scheduler(120)@127.0.0.1:52448
> I1119 12:01:33.136978 17077 sched.cpp:334] Successfully authenticated with master master@127.0.0.1:52448
> I1119 12:01:33.137032 17077 master.cpp:886] Re-registering framework 201311191201-16777343-52448-17056-0000 at scheduler(120)@127.0.0.1:52448
> I1119 12:01:33.137045 17077 master.cpp:912] Framework 201311191201-16777343-52448-17056-0000 failed over
> I1119 12:01:33.137140 17077 sched.cpp:373] Framework registered with 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.137161 17077 sched.cpp:387] Scheduler::registered took 8413ns
> I1119 12:01:33.137184 17077 hierarchical_allocator_process.hpp:378] Activated framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.137224 17077 hierarchical_allocator_process.hpp:752] Offering cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 to framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.137315 17077 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 118282ns
> I1119 12:01:33.137363 17077 master.hpp:428] Adding offer 201311191201-16777343-52448-17056-2 with resources cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.137404 17077 master.cpp:1700] Sending 1 offers to framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.137485 17077 sched.cpp:480] Scheduler::resourceOffers took 12196ns
> I1119 12:01:33.137682 17056 master.cpp:554] Master terminating
> I1119 12:01:33.137706 17056 master.cpp:210] Shutting down master
> I1119 12:01:33.137748 17056 master.hpp:418] Removing task 0 with resources cpus(*):1; mem(*):256 on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.137806 17056 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-2 with resources cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.137882 17056 master.cpp:253] Removing slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
> I1119 12:01:33.138190 17081 sched.cpp:638] Stopping framework '201311191201-16777343-52448-17056-0000'
> I1119 12:01:33.138613 17078 slave.cpp:391] Slave terminating
> I1119 12:01:33.138898 17078 slave.cpp:1122] Asked to shut down framework 201311191201-16777343-52448-17056-0000 by @0.0.0.0:0
> I1119 12:01:33.139130 17078 slave.cpp:1147] Shutting down framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.139390 17078 slave.cpp:2406] Shutting down executor 'default' of framework 201311191201-16777343-52448-17056-0000
> I1119 12:01:33.139701 17080 exec.cpp:375] Executor asked to shutdown
> I1119 12:01:33.139726 17080 exec.cpp:390] Executor::shutdown took 8678ns
> [  FAILED  ] AllocatorTest/0.SchedulerFailover, where TypeParam = mesos::internal::master::allocator::HierarchicalAllocatorProcess<mesos::internal::master::allocator::DRFSorter, mesos::internal::master::allocator::DRFSorter> (1041 ms)



--
This message was sent by Atlassian JIRA
(v6.1#6144)