You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Alexander Rojas (JIRA)" <ji...@apache.org> on 2015/09/01 16:55:46 UTC

[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

    [ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725502#comment-14725502 ] 

Alexander Rojas commented on MESOS-3235:
----------------------------------------

I could reproduce it with this output:

{noformat}
[ RUN      ] FetcherCacheHttpTest.HttpCachedConcurrent
HTTP/1.1 200 OK
Date: Tue, 01 Sep 2015 14:44:57 GMT
Content-Length: 30

I0901 16:44:57.354526 1906750208 exec.cpp:133] Version: 0.25.0
E0901 16:44:57.356951 360517632 socket.hpp:174] Shutdown failed on fd=20: Socket is not connected [57]
I0901 16:44:57.357071 356761600 exec.cpp:207] Executor registered on slave 20150901-164457-16777343-65082-67695-S0
E0901 16:44:57.362895 360517632 socket.hpp:174] Shutdown failed on fd=21: Socket is not connected [57]
Registered executor on localhost
Starting task 0
Forked command at 70019
sh -c './mesos-fetcher-test-cmd 0'
E0901 16:44:57.372481 360517632 socket.hpp:174] Shutdown failed on fd=20: Socket is not connected [57]
I0901 16:44:57.461415 1906750208 exec.cpp:133] Version: 0.25.0
I0901 16:44:57.464025 1906750208 exec.cpp:133] Version: 0.25.0
I0901 16:44:57.466198 1906750208 exec.cpp:133] Version: 0.25.0
Command exited with status 0 (pid: 70019)
I0901 16:44:57.468094 1906750208 exec.cpp:133] Version: 0.25.0
E0901 16:44:57.472774 219480064 socket.hpp:174] Shutdown failed on fd=24: Socket is not connected [57]
I0901 16:44:57.472812 218406912 exec.cpp:207] Executor registered on slave 20150901-164457-16777343-65082-67695-S0
E0901 16:44:57.478857 219480064 socket.hpp:174] Shutdown failed on fd=27: Socket is not connected [57]
E0901 16:44:57.479565 360517632 socket.hpp:174] Shutdown failed on fd=20: Socket is not connected [57]
I0901 16:44:57.479615 214392832 exec.cpp:207] Executor registered on slave 20150901-164457-16777343-65082-67695-S0
E0901 16:44:57.479730 216002560 socket.hpp:174] Shutdown failed on fd=28: Socket is not connected [57]
Registered executor on localhost
Starting task 2
Forked command at 70081
sh -c './mesos-fetcher-test-cmd 2'
Registered executor on localhost
Starting task 4
Forked command at 70082
sh -c './mesos-fetcher-test-cmd 4'
E0901 16:44:57.493479 219480064 socket.hpp:174] Shutdown failed on fd=24: Socket is not connected [57]
E0901 16:44:57.495735 216002560 socket.hpp:174] Shutdown failed on fd=28: Socket is not connected [57]
Command exited with status 0 (pid: 70081)
Command exited with status 0 (pid: 70082)
E0901 16:44:57.597821 219480064 socket.hpp:174] Shutdown failed on fd=24: Socket is not connected [57]
E0901 16:44:57.598403 216002560 socket.hpp:174] Shutdown failed on fd=28: Socket is not connected [57]
../../src/tests/fetcher_cache_tests.cpp:928: Failure
Failed to wait 15secs for awaitFinished(tasks.get())
[  FAILED  ] FetcherCacheHttpTest.HttpCachedConcurrent (15365 ms)
{noformat}

I might have found a way to permanently reproduce the failure, which seems to indicate a race condition somewhere.

> FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-3235
>                 URL: https://issues.apache.org/jira/browse/MESOS-3235
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Joseph Wu
>            Assignee: Bernd Mathiske
>              Labels: mesosphere
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [----------] 3 tests from FetcherCacheHttpTest
> [ RUN      ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN      ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @        0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
>     @     0x7fff8fcacf1a _sigtramp
>     @     0x7f9bc3109710 (unknown)
>     @        0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
>     @        0x113862f9d mesos::internal::slave::MesosContainerizerProcess::fetch()
>     @        0x1138f1b5d _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcEEEERK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
>     @        0x1138f18cf _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEERK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
>     @        0x1143768cf std::__1::function<>::operator()()
>     @        0x11435ca7f process::ProcessBase::visit()
>     @        0x1143ed6fe process::DispatchEvent::visit()
>     @        0x1127aaaa1 process::ProcessBase::serve()
>     @        0x114343b4e process::ProcessManager::resume()
>     @        0x1143431ca process::internal::schedule()
>     @        0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEEEEEPvS5_
>     @     0x7fff95090268 _pthread_body
>     @     0x7fff950901e5 _pthread_start
>     @     0x7fff9508e41d thread_start
> Failed to synchronize with slave (it's probably exited)
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {code}
> This was encountered just once out of 3+ {{make check}}s.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)