Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2015/01/16 02:26:35 UTC

[jira] [Commented] (MESOS-2231) ExampleTest.LowLevelSchedulerLibprocess is flaky.

    [ https://issues.apache.org/jira/browse/MESOS-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279652#comment-14279652 ] 

Benjamin Mahler commented on MESOS-2231:
----------------------------------------

FYI [~karya] [~tillt]: I haven't seen this issue before the recent changes. Can you take a look?

> ExampleTest.LowLevelSchedulerLibprocess is flaky.
> -------------------------------------------------
>
>                 Key: MESOS-2231
>                 URL: https://issues.apache.org/jira/browse/MESOS-2231
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Mahler
>              Labels: flaky, flaky-test
>
> This is a different issue from MESOS-1785; it looks like a "double free or corruption", which I haven't seen before:
> {noformat}
> [ RUN      ] ExamplesTest.LowLevelSchedulerLibprocess
> Using temporary directory '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_naU4BK'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0115 19:21:55.344287 10350 process.cpp:958] libprocess is initialized on 67.195.81.189:45829 for 16 cpus
> Enabling authentication for the scheduler
> I0115 19:21:55.345615 10350 logging.cpp:177] Logging to STDERR
> I0115 19:21:55.345639 10350 scheduler.cpp:146] Version: 0.22.0
> I0115 19:21:55.520369 10350 leveldb.cpp:176] Opened db in 166.020335ms
> I0115 19:21:55.586822 10350 leveldb.cpp:183] Compacted db in 66.394842ms
> I0115 19:21:55.586971 10350 leveldb.cpp:198] Created db iterator in 69517ns
> I0115 19:21:55.586993 10350 leveldb.cpp:204] Seeked to beginning of db in 4008ns
> I0115 19:21:55.587000 10350 leveldb.cpp:273] Iterated through 0 keys in the db in 302ns
> I0115 19:21:55.587183 10350 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 19:21:55.589311 10388 recover.cpp:449] Starting replica recovery
> I0115 19:21:55.590361 10388 recover.cpp:475] Replica is in EMPTY status
> I0115 19:21:55.594468 10376 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
> I0115 19:21:55.595654 10382 recover.cpp:195] Received a recover response from a replica in EMPTY status
> I0115 19:21:55.596400 10378 recover.cpp:566] Updating replica status to STARTING
> I0115 19:21:55.605976 10383 master.cpp:262] Master 20150115-192155-3176252227-45829-10350 (proserpina.apache.org) started on 67.195.81.189:45829
> I0115 19:21:55.606104 10383 master.cpp:308] Master only allowing authenticated frameworks to register
> I0115 19:21:55.606127 10383 master.cpp:315] Master allowing unauthenticated slaves to register
> I0115 19:21:55.606184 10383 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_naU4BK/credentials'
> W0115 19:21:55.606258 10383 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_LowLevelSchedulerLibprocess_naU4BK/credentials' are too open. It is recommended that your credentials file is NOT accessible by others.
> I0115 19:21:55.606578 10383 master.cpp:357] Authorization enabled
> I0115 19:21:55.607563 10377 whitelist_watcher.cpp:65] No whitelist given
> I0115 19:21:55.607799 10374 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process
> I0115 19:21:55.608512 10350 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem
> I0115 19:21:55.614555 10385 master.cpp:1219] The newly elected leader is master@67.195.81.189:45829 with id 20150115-192155-3176252227-45829-10350
> I0115 19:21:55.614589 10385 master.cpp:1232] Elected as the leading master!
> I0115 19:21:55.614630 10385 master.cpp:1050] Recovering from registrar
> I0115 19:21:55.614897 10379 registrar.cpp:313] Recovering registrar
> I0115 19:21:55.616330 10380 slave.cpp:173] Slave started on 1)@67.195.81.189:45829
> I0115 19:21:55.617673 10350 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem
> I0115 19:21:55.624580 10384 slave.cpp:173] Slave started on 2)@67.195.81.189:45829
> I0115 19:21:55.625908 10350 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem
> I0115 19:21:55.631642 10379 slave.cpp:173] Slave started on 3)@67.195.81.189:45829
> I0115 19:21:55.634032 10388 scheduler.cpp:417] New master detected at master@67.195.81.189:45829
> I0115 19:21:55.634084 10388 scheduler.cpp:466] Authenticating with master master@67.195.81.189:45829
> I0115 19:21:55.634521 10382 authenticatee.hpp:114] Initializing client SASL
> I0115 19:21:55.635840 10382 authenticatee.hpp:138] Creating new client SASL connection
> I0115 19:21:55.636001 10385 master.cpp:881] Dropping 'mesos.internal.AuthenticateMessage' message since not recovered yet
> I0115 19:21:55.637289 10380 slave.cpp:300] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
> I0115 19:21:55.637296 10384 slave.cpp:300] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
> I0115 19:21:55.637296 10379 slave.cpp:300] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
> I0115 19:21:55.637423 10380 slave.cpp:329] Slave hostname: proserpina.apache.org
> I0115 19:21:55.637447 10380 slave.cpp:330] Slave checkpoint: true
> I0115 19:21:55.637480 10384 slave.cpp:329] Slave hostname: proserpina.apache.org
> I0115 19:21:55.637521 10379 slave.cpp:329] Slave hostname: proserpina.apache.org
> I0115 19:21:55.637547 10384 slave.cpp:330] Slave checkpoint: true
> I0115 19:21:55.637593 10379 slave.cpp:330] Slave checkpoint: true
> *** Error in `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/build/src/.libs': double free or corruption (fasttop): 0x00002af7a4002da0 ***
> ../../src/tests/script.cpp:83: Failure
> Failed
> low_level_scheduler_libprocess_test.sh terminated with signal Aborted
> [  FAILED  ] ExamplesTest.LowLevelSchedulerLibprocess (1689 ms)
> {noformat}
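> The abort above comes from glibc's heap-consistency checks: "double free or corruption (fasttop)" is raised when free() sees that the chunk being freed is already at the top of its fastbin, i.e. the same small allocation was freed twice in succession. A minimal sketch of the bug class being flagged (a hypothetical illustration, not Mesos code):
> {noformat}
> // double_free.cpp -- build with: g++ double_free.cpp -o double_free
> // On glibc this typically aborts with "double free or corruption (fasttop)".
> #include <cstdlib>
>
> int main()
> {
>   // A small allocation is served from a fastbin-sized chunk.
>   char* buffer = static_cast<char*>(std::malloc(32));
>
>   std::free(buffer);
>   std::free(buffer);  // Second free of the same pointer: glibc notices the
>                       // chunk is already at the fastbin top and aborts.
>   return 0;
> }
> {noformat}
> In a test like this, that pattern usually means some object is destroyed twice during shutdown, e.g. a resource deleted from two threads, which would also fit the flakiness: the fasttop check only fires when the two frees of the same chunk happen back to back.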
> I suspect the same issue is occurring in ExamplesTest.NoExecutorFramework:
> {noformat}
> [ RUN      ] ExamplesTest.NoExecutorFramework
> I0115 19:22:10.790614 10761 exec.cpp:455] Ignoring exited event because the driver is aborted!
> Using temporary directory '/tmp/ExamplesTest_NoExecutorFramework_2EaghC'
> Enabling authentication for the framework
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0115 19:22:12.124490 10777 process.cpp:958] libprocess is initialized on 67.195.81.189:38112 for 16 cpus
> I0115 19:22:12.124758 10777 logging.cpp:177] Logging to STDERR
> I0115 19:22:12.351037 10777 leveldb.cpp:176] Opened db in 214.659787ms
> I0115 19:22:12.400264 10777 leveldb.cpp:183] Compacted db in 49.127641ms
> I0115 19:22:12.400393 10777 leveldb.cpp:198] Created db iterator in 72727ns
> I0115 19:22:12.400434 10777 leveldb.cpp:204] Seeked to beginning of db in 5552ns
> I0115 19:22:12.400444 10777 leveldb.cpp:273] Iterated through 0 keys in the db in 425ns
> I0115 19:22:12.400667 10777 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 19:22:12.403313 10812 recover.cpp:449] Starting replica recovery
> I0115 19:22:12.404208 10812 recover.cpp:475] Replica is in EMPTY status
> I0115 19:22:12.406448 10809 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
> I0115 19:22:12.407618 10808 recover.cpp:195] Received a recover response from a replica in EMPTY status
> I0115 19:22:12.408342 10816 recover.cpp:566] Updating replica status to STARTING
> I0115 19:22:12.416853 10808 master.cpp:262] Master 20150115-192212-3176252227-38112-10777 (proserpina.apache.org) started on 67.195.81.189:38112
> I0115 19:22:12.416944 10808 master.cpp:308] Master only allowing authenticated frameworks to register
> I0115 19:22:12.416957 10808 master.cpp:315] Master allowing unauthenticated slaves to register
> I0115 19:22:12.417012 10808 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_NoExecutorFramework_2EaghC/credentials'
> W0115 19:22:12.417135 10808 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_NoExecutorFramework_2EaghC/credentials' are too open. It is recommended that your credentials file is NOT accessible by others.
> I0115 19:22:12.417436 10808 master.cpp:357] Authorization enabled
> I0115 19:22:12.418402 10810 whitelist_watcher.cpp:65] No whitelist given
> I0115 19:22:12.418709 10802 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process
> I0115 19:22:12.419842 10777 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem
> I0115 19:22:12.424952 10815 master.cpp:1219] The newly elected leader is master@67.195.81.189:38112 with id 20150115-192212-3176252227-38112-10777
> I0115 19:22:12.424984 10815 master.cpp:1232] Elected as the leading master!
> I0115 19:22:12.425017 10815 master.cpp:1050] Recovering from registrar
> I0115 19:22:12.425333 10805 registrar.cpp:313] Recovering registrar
> I0115 19:22:12.426198 10810 slave.cpp:173] Slave started on 1)@67.195.81.189:38112
> I0115 19:22:12.426682 10810 slave.cpp:300] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
> I0115 19:22:12.426796 10810 slave.cpp:329] Slave hostname: proserpina.apache.org
> I0115 19:22:12.426812 10810 slave.cpp:330] Slave checkpoint: true
> I0115 19:22:12.428252 10777 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem
> I0115 19:22:12.428481 10814 state.cpp:33] Recovering state from '/tmp/user/2395/mesos-NIH0bU/0/meta'
> I0115 19:22:12.428874 10809 status_update_manager.cpp:197] Recovering status update manager
> I0115 19:22:12.429150 10808 containerizer.cpp:298] Recovering containerizer
> I0115 19:22:12.430721 10810 slave.cpp:3519] Finished recovery
> I0115 19:22:12.435571 10802 slave.cpp:173] Slave started on 2)@67.195.81.189:38112
> I0115 19:22:12.438400 10777 containerizer.cpp:102] Using isolation: posix/cpu,posix/mem
> I0115 19:22:12.444885 10816 slave.cpp:173] Slave started on 3)@67.195.81.189:38112
> I0115 19:22:12.448340 10777 sched.cpp:151] Version: 0.22.0
> I0115 19:22:12.449358 10815 sched.cpp:248] New master detected at master@67.195.81.189:38112
> I0115 19:22:12.449451 10815 sched.cpp:304] Authenticating with master master@67.195.81.189:38112
> I0115 19:22:12.449482 10815 sched.cpp:311] Using default CRAM-MD5 authenticatee
> I0115 19:22:12.449959 10813 authenticatee.hpp:114] Initializing client SASL
> I0115 19:22:12.451356 10813 authenticatee.hpp:138] Creating new client SASL connection
> I0115 19:22:12.451514 10804 master.cpp:881] Dropping 'mesos.internal.AuthenticateMessage' message since not recovered yet
> I0115 19:22:12.459143 10816 slave.cpp:300] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
> I0115 19:22:12.459141 10802 slave.cpp:300] Slave resources: cpus(*):2; mem(*):10240; disk(*):3.70122e+06; ports(*):[31000-32000]
> I0115 19:22:12.459276 10814 status_update_manager.cpp:171] Pausing sending status updates
> I0115 19:22:12.459388 10816 slave.cpp:329] Slave hostname: proserpina.apache.org
> I0115 19:22:12.459306 10808 slave.cpp:613] New master detected at master@67.195.81.189:38112
> I0115 19:22:12.459406 10816 slave.cpp:330] Slave checkpoint: true
> I0115 19:22:12.459450 10802 slave.cpp:329] Slave hostname: proserpina.apache.org
> I0115 19:22:12.459463 10802 slave.cpp:330] Slave checkpoint: true
> I0115 19:22:12.459491 10808 slave.cpp:638] No credentials provided. Attempting to register without authentication
> I0115 19:22:12.459619 10808 slave.cpp:649] Detecting new master
> *** Error in `/home/jenkins/jenkins-slave/workspace/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/build/src/.libs/lt-no-executor-framework': double free or corruption (fasttop): 0x00002b1cf80016a0 ***
> ../../src/tests/script.cpp:83: Failure
> Failed
> no_executor_framework_test.sh terminated with signal Aborted
> [  FAILED  ] ExamplesTest.NoExecutorFramework (1681 ms)
> {noformat}
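> Since both failures abort at the same allocator check, one way to confirm they share a root cause (a debugging suggestion, not something observed in the logs above) would be to rebuild the examples with AddressSanitizer, which reports a genuine double free deterministically as "attempting double-free", with stack traces for the allocation and both frees:
> {noformat}
> # Hypothetical out-of-source invocation; flags may need adjusting for this build.
> ../configure CXXFLAGS="-fsanitize=address -g" LDFLAGS="-fsanitize=address"
> make check GTEST_FILTER="ExamplesTest.LowLevelSchedulerLibprocess"
> {noformat}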



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)