You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Benjamin Hindman (JIRA)" <ji...@apache.org> on 2014/04/30 20:02:19 UTC

[jira] [Comment Edited] (MESOS-1271) CHECK failure in replica.

    [ https://issues.apache.org/jira/browse/MESOS-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985836#comment-13985836 ] 

Benjamin Hindman edited comment on MESOS-1271 at 4/30/14 6:01 PM:
------------------------------------------------------------------

Checked its log, seems that the local replica received the LEARNED notice first and then received the WRITE request! Not sure why this happened!

{noformat}
I0430 02:26:35.350102 45906 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 6131
I0430 02:26:35.378063 45908 replica.cpp:643] Replica received learned notice for position 6131
I0430 02:26:35.383350 45908 leveldb.cpp:341] Persisting action (718523 bytes) to leveldb took 5.173633ms
I0430 02:26:35.383401 45908 replica.cpp:664] Persisted action at 6131
I0430 02:26:35.383414 45908 replica.cpp:649] Replica learned APPEND action at position 6131
I0430 02:26:35.383997 45923 replica.cpp:508] Replica received write request for position 6131
I0430 02:26:35.384345 45923 leveldb.cpp:436] Reading position from leveldb took 308357ns
I0430 02:26:35.389766 45923 leveldb.cpp:341] Persisting action (718521 bytes) to leveldb took 5.054618ms
I0430 02:26:35.389829 45923 replica.cpp:664] Persisted action at 6131
F0430 02:26:35.393795 45903 coordinator.cpp:399] Check failed: !missing Not expecting local replica to be missing position 6131 after the writing is done
{noformat}


was (Author: jieyu):
host: smf1-bkm-14-sr1.prod.twitter.com is the leader at the time

Checked its log, seems that the local replica received the LEARNED notice first and then received the WRITE request! Not sure why this happened!

{noformat}
I0430 02:26:35.350102 45906 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 6131
I0430 02:26:35.378063 45908 replica.cpp:643] Replica received learned notice for position 6131
I0430 02:26:35.383350 45908 leveldb.cpp:341] Persisting action (718523 bytes) to leveldb took 5.173633ms
I0430 02:26:35.383401 45908 replica.cpp:664] Persisted action at 6131
I0430 02:26:35.383414 45908 replica.cpp:649] Replica learned APPEND action at position 6131
I0430 02:26:35.383997 45923 replica.cpp:508] Replica received write request for position 6131
I0430 02:26:35.384345 45923 leveldb.cpp:436] Reading position from leveldb took 308357ns
I0430 02:26:35.389766 45923 leveldb.cpp:341] Persisting action (718521 bytes) to leveldb took 5.054618ms
I0430 02:26:35.389829 45923 replica.cpp:664] Persisted action at 6131
F0430 02:26:35.393795 45903 coordinator.cpp:399] Check failed: !missing Not expecting local replica to be missing position 6131 after the writing is done
{noformat}

> CHECK failure in replica.
> -------------------------
>
>                 Key: MESOS-1271
>                 URL: https://issues.apache.org/jira/browse/MESOS-1271
>             Project: Mesos
>          Issue Type: Bug
>          Components: replicated log
>    Affects Versions: 0.19.0
>            Reporter: Benjamin Mahler
>            Assignee: Jie Yu
>             Fix For: 0.19.0
>
>
> {noformat}
> I0430 02:26:34.484668 45920 registrar.cpp:427] Successfully updated 'registry'
> I0430 02:26:34.558326 45920 registrar.cpp:379] Attempting to update the 'registry'
> I0430 02:26:34.658385 45910 log.cpp:680] Attempting to append 719410 bytes to the log
> I0430 02:26:34.658483 45910 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 6129
> I0430 02:26:34.666218 45925 replica.cpp:508] Replica received write request for position 6129
> I0430 02:26:34.672080 45925 leveldb.cpp:341] Persisting action (719433 bytes) to leveldb took 5.610781ms
> I0430 02:26:34.672142 45925 replica.cpp:664] Persisted action at 6129
> I0430 02:26:34.682070 45913 replica.cpp:643] Replica received learned notice for position 6129
> I0430 02:26:34.687636 45913 leveldb.cpp:341] Persisting action (719435 bytes) to leveldb took 5.50696ms
> I0430 02:26:34.687713 45913 replica.cpp:664] Persisted action at 6129
> I0430 02:26:34.687729 45913 replica.cpp:649] Replica learned APPEND action at position 6129
> I0430 02:26:34.688134 45912 log.cpp:699] Attempting to truncate the log to 6129
> I0430 02:26:34.688251 45911 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 6130
> I0430 02:26:34.689167 45911 replica.cpp:508] Replica received write request for position 6130
> I0430 02:26:34.689728 45911 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 529731ns
> I0430 02:26:34.689746 45911 replica.cpp:664] Persisted action at 6130
> I0430 02:26:34.701628 45919 replica.cpp:643] Replica received learned notice for position 6130
> I0430 02:26:34.702505 45919 leveldb.cpp:341] Persisting action (20 bytes) to leveldb took 762510ns
> I0430 02:26:34.702551 45919 leveldb.cpp:399] Deleting ~2 keys from leveldb took 20442ns
> I0430 02:26:34.702568 45919 replica.cpp:664] Persisted action at 6130
> I0430 02:26:34.702590 45919 replica.cpp:649] Replica learned TRUNCATE action at position 6130
> I0430 02:26:35.163915 45920 registrar.cpp:427] Successfully updated 'registry'
> I0430 02:26:35.246116 45920 registrar.cpp:379] Attempting to update the 'registry'
> I0430 02:26:35.348455 45910 log.cpp:680] Attempting to append 718498 bytes to the log
> I0430 02:26:35.350102 45906 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 6131
> I0430 02:26:35.378063 45908 replica.cpp:643] Replica received learned notice for position 6131
> I0430 02:26:35.383350 45908 leveldb.cpp:341] Persisting action (718523 bytes) to leveldb took 5.173633ms
> I0430 02:26:35.383401 45908 replica.cpp:664] Persisted action at 6131
> I0430 02:26:35.383414 45908 replica.cpp:649] Replica learned APPEND action at position 6131
> I0430 02:26:35.383997 45923 replica.cpp:508] Replica received write request for position 6131
> I0430 02:26:35.384345 45923 leveldb.cpp:436] Reading position from leveldb took 308357ns
> I0430 02:26:35.389766 45923 leveldb.cpp:341] Persisting action (718521 bytes) to leveldb took 5.054618ms
> I0430 02:26:35.389829 45923 replica.cpp:664] Persisted action at 6131
> F0430 02:26:35.393795 45903 coordinator.cpp:399] Check failed: !missing Not expecting local replica to be missing position 6131 after the writing is done
> *** Check failure stack trace: ***
>     @     0x7f9081eed5fd  google::LogMessage::Fail()
>     @     0x7f9081eef444  google::LogMessage::SendToLog()
>     @     0x7f9081eed1ec  google::LogMessage::Flush()
>     @     0x7f9081eefd39  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9081d06c52  mesos::internal::log::CoordinatorProcess::updateIndexAfterWritten()
>     @     0x7f9081d18b63  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI6OptionImEN5mesos8internal3log18CoordinatorProcessEbbEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSF_FSD_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f9081e231c2  process::ProcessManager::resume()
>     @     0x7f9081e234bc  process::schedule()
>     @     0x7f908139783d  start_thread
>     @     0x7f90800ff26d  clone
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)