You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Cody Maloney (JIRA)" <ji...@apache.org> on 2014/12/04 22:23:13 UTC

[jira] [Updated] (MESOS-1250) ReservationAllocatorTest.ResourcesReturned flaky - repeated_field.h:824 CHECK failed: (index) < (size())

     [ https://issues.apache.org/jira/browse/MESOS-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cody Maloney updated MESOS-1250:
--------------------------------
    Component/s: test

> ReservationAllocatorTest.ResourcesReturned flaky - repeated_field.h:824 CHECK failed: (index) < (size())
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1250
>                 URL: https://issues.apache.org/jira/browse/MESOS-1250
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Yan Xu
>              Labels: flaky
>
> {noformat:title=The CHECK failure}
> [libprotobuf FATAL ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824] CHECK failed: (index) < (size()): 
> {noformat}
> {noformat:title=Full log}
> [ RUN      ] ReservationAllocatorTest.ResourcesReturned
> I0425 18:24:35.375020 23940 leveldb.cpp:174] Opened db in 80.11386ms
> I0425 18:24:35.385205 23940 leveldb.cpp:181] Compacted db in 10.16346ms
> I0425 18:24:35.385226 23940 leveldb.cpp:196] Created db iterator in 3534ns
> I0425 18:24:35.385233 23940 leveldb.cpp:202] Seeked to beginning of db in 513ns
> I0425 18:24:35.385239 23940 leveldb.cpp:271] Iterated through 0 keys in the db in 200ns
> I0425 18:24:35.385251 23940 replica.cpp:729] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0425 18:24:35.385484 23963 recover.cpp:425] Starting replica recovery
> I0425 18:24:35.385722 23961 recover.cpp:451] Replica is in EMPTY status
> I0425 18:24:35.385973 23963 master.cpp:266] Master 20140425-182435-1032504131-35984-23940 (juno.apache.org) started on 67.195.138.61:35984
> I0425 18:24:35.385993 23963 master.cpp:304] Master only allowing authenticated frameworks to register
> I0425 18:24:35.386001 23963 credentials.hpp:35] Loading credentials for authentication
> W0425 18:24:35.386051 23963 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/ReservationAllocatorTest_ResourcesReturned_K72a9s/credentials': No such file or directory
> I0425 18:24:35.386209 23966 replica.cpp:626] Replica in EMPTY status received a broadcasted recover request
> I0425 18:24:35.386451 23964 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0425 18:24:35.386603 23966 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@67.195.138.61:35984
> I0425 18:24:35.386672 23966 master.cpp:104] No whitelist given. Advertising offers for all slaves
> I0425 18:24:35.386672 23964 recover.cpp:542] Updating replica status to STARTING
> I0425 18:24:35.387045 23968 master.cpp:918] The newly elected leader is master@67.195.138.61:35984 with id 20140425-182435-1032504131-35984-23940
> I0425 18:24:35.387058 23968 master.cpp:928] Elected as the leading master!
> I0425 18:24:35.387064 23968 master.cpp:749] Recovering from registrar
> I0425 18:24:35.387130 23968 registrar.cpp:275] Recovering registrar
> I0425 18:24:35.416841 23964 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 30.043801ms
> I0425 18:24:35.416859 23964 replica.cpp:320] Persisted replica status to STARTING
> I0425 18:24:35.416947 23964 recover.cpp:451] Replica is in STARTING status
> I0425 18:24:35.417343 23968 replica.cpp:626] Replica in STARTING status received a broadcasted recover request
> I0425 18:24:35.417513 23968 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0425 18:24:35.417755 23967 recover.cpp:542] Updating replica status to VOTING
> I0425 18:24:35.428218 23968 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 10.386348ms
> I0425 18:24:35.428267 23968 replica.cpp:320] Persisted replica status to VOTING
> I0425 18:24:35.428336 23967 recover.cpp:556] Successfully joined the Paxos group
> I0425 18:24:35.428447 23967 recover.cpp:440] Recover process terminated
> I0425 18:24:35.428591 23967 log.cpp:656] Attempting to start the writer
> I0425 18:24:35.429041 23966 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0425 18:24:35.436518 23966 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 7.457582ms
> I0425 18:24:35.436539 23966 replica.cpp:342] Persisted promised to 1
> I0425 18:24:35.436861 23964 coordinator.cpp:229] Coordinator attemping to fill missing position
> I0425 18:24:35.436869 23967 hierarchical_allocator_process.hpp:726] No resources available to allocate!
> I0425 18:24:35.436878 23967 hierarchical_allocator_process.hpp:688] Performed allocation for 0 slaves in 15305ns
> I0425 18:24:35.437417 23968 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0425 18:24:35.444845 23968 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 7.408147ms
> I0425 18:24:35.444866 23968 replica.cpp:664] Persisted action at 0
> I0425 18:24:35.445366 23961 replica.cpp:508] Replica received write request for position 0
> I0425 18:24:35.445392 23961 leveldb.cpp:436] Reading position from leveldb took 12659ns
> I0425 18:24:35.453176 23961 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 7.767811ms
> I0425 18:24:35.453198 23961 replica.cpp:664] Persisted action at 0
> I0425 18:24:35.453548 23965 replica.cpp:643] Replica received learned notice for position 0
> I0425 18:24:35.461506 23965 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 7.919252ms
> I0425 18:24:35.461527 23965 replica.cpp:664] Persisted action at 0
> I0425 18:24:35.461536 23965 replica.cpp:649] Replica learned NOP action at position 0
> I0425 18:24:35.461809 23966 log.cpp:672] Writer started with ending position 0
> I0425 18:24:35.462334 23961 leveldb.cpp:436] Reading position from leveldb took 15347ns
> I0425 18:24:35.464083 23967 registrar.cpp:308] Successfully recovered registrar
> I0425 18:24:35.464109 23967 registrar.cpp:379] Attempting to update the 'registry'
> I0425 18:24:35.465710 23962 log.cpp:680] Attempting to append 137 bytes to the log
> I0425 18:24:35.465852 23963 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 1
> I0425 18:24:35.466189 23968 replica.cpp:508] Replica received write request for position 1
> I0425 18:24:35.475623 23968 leveldb.cpp:341] Persisting action (156 bytes) to leveldb took 9.417683ms
> I0425 18:24:35.475642 23968 replica.cpp:664] Persisted action at 1
> I0425 18:24:35.475893 23968 replica.cpp:643] Replica received learned notice for position 1
> I0425 18:24:35.483952 23968 leveldb.cpp:341] Persisting action (158 bytes) to leveldb took 8.039894ms
> I0425 18:24:35.483976 23968 replica.cpp:664] Persisted action at 1
> I0425 18:24:35.483985 23968 replica.cpp:649] Replica learned APPEND action at position 1
> I0425 18:24:35.484416 23965 registrar.cpp:427] Successfully updated 'registry'
> I0425 18:24:35.484467 23968 log.cpp:699] Attempting to truncate the log to 1
> I0425 18:24:35.484530 23962 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 2
> I0425 18:24:35.484622 23961 master.cpp:776] Recovered 0 slaves from the Registry (99B) ; allowing 10mins for slaves to re-register
> I0425 18:24:35.485131 23968 replica.cpp:508] Replica received write request for position 2
> I0425 18:24:35.487083 23965 hierarchical_allocator_process.hpp:726] No resources available to allocate!
> I0425 18:24:35.487120 23965 hierarchical_allocator_process.hpp:688] Performed allocation for 0 slaves in 51286ns
> I0425 18:24:35.487566 23963 slave.cpp:130] Slave started on 6)@67.195.138.61:35984
> I0425 18:24:35.487763 23963 slave.cpp:218] Slave resources: cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; disk(*):0; ports(*):[31000-32000]
> I0425 18:24:35.487838 23963 slave.cpp:246] Slave hostname: juno.apache.org
> I0425 18:24:35.487845 23963 slave.cpp:247] Slave checkpoint: false
> I0425 18:24:35.488287 23965 state.cpp:33] Recovering state from '/tmp/ReservationAllocatorTest_ResourcesReturned_cRsGAx/meta'
> I0425 18:24:35.488456 23963 status_update_manager.cpp:193] Recovering status update manager
> I0425 18:24:35.488628 23963 slave.cpp:2804] Finished recovery
> I0425 18:24:35.488865 23963 slave.cpp:499] New master detected at master@67.195.138.61:35984
> I0425 18:24:35.488914 23963 slave.cpp:733] Will retry registration in 7.800343128secs if necessary
> I0425 18:24:35.488931 23963 slave.cpp:524] Detecting new master
> I0425 18:24:35.488968 23963 status_update_manager.cpp:167] New master detected at master@67.195.138.61:35984
> I0425 18:24:35.489169 23963 registrar.cpp:379] Attempting to update the 'registry'
> I0425 18:24:35.494820 23968 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 9.670136ms
> I0425 18:24:35.494839 23968 replica.cpp:664] Persisted action at 2
> I0425 18:24:35.495146 23962 replica.cpp:643] Replica received learned notice for position 2
> I0425 18:24:35.503154 23962 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 7.987444ms
> I0425 18:24:35.503190 23962 leveldb.cpp:399] Deleting ~1 keys from leveldb took 16160ns
> I0425 18:24:35.503201 23962 replica.cpp:664] Persisted action at 2
> I0425 18:24:35.503209 23962 replica.cpp:649] Replica learned TRUNCATE action at position 2
> I0425 18:24:35.503527 23961 log.cpp:680] Attempting to append 438 bytes to the log
> I0425 18:24:35.503597 23968 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 3
> I0425 18:24:35.503970 23967 replica.cpp:508] Replica received write request for position 3
> I0425 18:24:35.511482 23967 leveldb.cpp:341] Persisting action (457 bytes) to leveldb took 7.473886ms
> I0425 18:24:35.511502 23967 replica.cpp:664] Persisted action at 3
> I0425 18:24:35.511749 23968 replica.cpp:643] Replica received learned notice for position 3
> I0425 18:24:35.519814 23968 leveldb.cpp:341] Persisting action (459 bytes) to leveldb took 8.043537ms
> I0425 18:24:35.519836 23968 replica.cpp:664] Persisted action at 3
> I0425 18:24:35.519845 23968 replica.cpp:649] Replica learned APPEND action at position 3
> I0425 18:24:35.520158 23965 registrar.cpp:427] Successfully updated 'registry'
> I0425 18:24:35.520236 23964 log.cpp:699] Attempting to truncate the log to 3
> I0425 18:24:35.520279 23965 master.cpp:2139] Admitted slave on juno.apache.org at slave(6)@67.195.138.61:35984
> I0425 18:24:35.520285 23964 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 4
> I0425 18:24:35.520306 23965 master.cpp:3213] Adding slave 20140425-182435-1032504131-35984-23940-0 at juno.apache.org with cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; disk(*):0; ports(*):[31000-32000]
> I0425 18:24:35.520434 23967 slave.cpp:542] Registered with master master@67.195.138.61:35984; given slave ID 20140425-182435-1032504131-35984-23940-0
> [libprotobuf FATAL ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824] CHECK failed: (index) < (size()): 
> I0425 18:24:35.520853 23964 replica.cpp:508] Replica received write request for position 4
> I0425 18:24:35.520884 23963 hierarchical_allocator_process.hpp:445] Added slave 20140425-182435-1032504131-35984-23940-0 (juno.apache.org) with cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; disk(*):0; ports(*):[31000-32000] (and cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; disk(*):0; ports(*):[31000-32000] available)
> I0425 18:24:35.520987 23963 hierarchical_allocator_process.hpp:708] Performed allocation for slave 20140425-182435-1032504131-35984-23940-0 in 17123ns
> ../../src/tests/allocator_tests.cpp:523: Failure
> Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, _, _))...
>          Expected: to be called twice
>            Actual: called once - unsatisfied and active
> unknown file: Failure
> C++ exception with description "CHECK failed: (index) < (size()): " thrown in the test body.
> I0425 18:24:35.521409 23940 master.cpp:547] Master terminating
> I0425 18:24:35.521536 23965 slave.cpp:2012] master@67.195.138.61:35984 exited
> W0425 18:24:35.521553 23965 slave.cpp:2015] Master disconnected! Waiting for a new master to be elected
> I0425 18:24:35.528180 23964 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 7.286467ms
> I0425 18:24:35.528199 23964 replica.cpp:664] Persisted action at 4
> I0425 18:24:35.528659 23940 slave.cpp:388] Slave terminating
> [  FAILED  ] ReservationAllocatorTest.ResourcesReturned (235 ms)
> {noformat}
> {noformat:title=This is only the result of the CHECK failure and irrelevant}
> Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, _, _))...
>          Expected: to be called twice
>            Actual: called once - unsatisfied and active
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)