You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@rocketmq.apache.org by GitBox <gi...@apache.org> on 2019/12/11 03:34:22 UTC

[GitHub] [rocketmq] henrypoter opened a new issue #1641: INDEX_OUT_OF_RANGE caused broker & nameserver shutdown

henrypoter opened a new issue #1641: INDEX_OUT_OF_RANGE caused broker & nameserver shutdown
URL: https://github.com/apache/rocketmq/issues/1641
 
 
   1. Please describe the issue you observed:
   
   - What did you do (The steps to reproduce)?
   rocketmq版本 4.6.0。使用dledger模式部署了一个broker分片,1主2从。
   其中有一个从节点在压测过程中broker和namserver进程终止了
   错误信息
   `[rocketmq@y2_web212 rocketmqlogs]$ vi broker_default.log 
   
   2019-12-05 03:38:01 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=29222072 compareIndex=-1
   io.openmessaging.storage.dledger.exception.DLedgerException: [code=410,name=INDEX_OUT_OF_RANGE,desc=] 29221072 should between 300627707-313007211
           at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:479) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doCheckAppendResponse(DLedgerEntryPusher.java:434) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:461) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
   2019-12-05 03:38:02 INFO DLedgerFlushDataService - Flush data cost=691 ms
   2019-12-05 03:38:02 ERROR StateMaintainer - Send heartBeat request failed info[group=RaftNode00,term=1,code=200,local=n0,remote=n1,leader=n0]
   org.apache.rocketmq.remoting.exception.RemotingConnectException: connect to 192.168.111.213:40911 failed
           at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeAsync(NettyRemotingClient.java:531) ~[rocketmq-remoting-4.6.0.jar:4.6.0]
           at io.openmessaging.storage.dledger.DLedgerRpcNettyService.heartBeat(DLedgerRpcNettyService.java:118) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerLeaderElector.sendHeartbeats(DLedgerLeaderElector.java:252) [dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerLeaderElector.maintainAsLeader(DLedgerLeaderElector.java:320) [dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerLeaderElector.maintainState(DLedgerLeaderElector.java:506) [dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerLeaderElector.access$200(DLedgerLeaderElector.java:40) [dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerLeaderElector$StateMaintainer.doWork(DLedgerLeaderElector.java:548) [dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
   2019-12-05 03:38:03 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 29221072
   2019-12-05 03:38:03 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=29222072 compareIndex=-1
   io.openmessaging.storage.dledger.exception.DLedgerException: [code=410,name=INDEX_OUT_OF_RANGE,desc=] 29221072 should between 300627707-313007409
           at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:479) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doCheckAppendResponse(DLedgerEntryPusher.java:434) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:461) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
   2019-12-05 03:38:03 INFO QuorumAckChecker - [n0][LEADER] term=1 ledgerBegin=300627707 ledgerEnd=313007409 committed=313007409 watermarks={1:{"n0":313007409,"n1":29221071,"n2":313007409}}
   2019-12-05 03:38:03 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 29221072
   2019-12-05 03:38:03 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=29222072 compareIndex=-1
   io.openmessaging.storage.dledger.exception.DLedgerException: [code=410,name=INDEX_OUT_OF_RANGE,desc=] 29221072 should between 300627707-313007431
           at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:479) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doCheckAppendResponse(DLedgerEntryPusher.java:434) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:461) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
   2019-12-05 03:38:04 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 29221072
   2019-12-05 03:38:04 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=29222072 compareIndex=-1
   io.openmessaging.storage.dledger.exception.DLedgerException: [code=410,name=INDEX_OUT_OF_RANGE,desc=] 29221072 should between 300627707-313007473
           at io.openmessaging.storage.dledger.utils.PreConditions.check(PreConditions.java:41) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.get(DLedgerMmapFileStore.java:479) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppendInner(DLedgerEntryPusher.java:389) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doCheckAppendResponse(DLedgerEntryPusher.java:434) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doAppend(DLedgerEntryPusher.java:461) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.DLedgerEntryPusher$EntryDispatcher.doWork(DLedgerEntryPusher.java:602) ~[dledger-0.1.jar:na]
           at io.openmessaging.storage.dledger.ShutdownAbleThread.run(ShutdownAbleThread.java:87) [dledger-0.1.jar:na]
   2019-12-05 03:38:04 WARN EntryDispatcher-n0-n1 - [Push-n1]Retry to push entry at 29221072
   2019-12-05 03:38:04 ERROR EntryDispatcher-n0-n1 - [Push-n1]Error in EntryDispatcher-n0-n1 writeIndex=29222072 compareIndex=-1
   ?ERROR      `
   
   重启broker和NameServer后恢复log
   
   `
   [rocketmq@y2zomq213 rocketmqlogs]$ tail -fn 111111111 *.log | grep recover
   2019-12-05 14:20:24 INFO main - Begin to recover data from entryIndex=629360492 fileIndex=17 fileSize=20 fileName=/home/rocketmq/store/dledger-n1/data/00000000426275504128 
   2019-12-05 14:20:25 INFO main - Trying to recover data file /home/rocketmq/store/dledger-n1/data/00000000427349245952
   2019-12-05 14:20:26 INFO main - Trying to recover data file /home/rocketmq/store/dledger-n1/data/00000000428422987776
           at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.recover(DLedgerMmapFileStore.java:216) ~[dledger-0.1.jar:na]
           at org.apache.rocketmq.store.dledger.DLedgerCommitLog.recover(DLedgerCommitLog.java:253) [rocketmq-store-4.6.0.jar:4.6.0]
           at org.apache.rocketmq.store.dledger.DLedgerCommitLog.recoverNormally(DLedgerCommitLog.java:305) [rocketmq-store-4.6.0.jar:4.6.0]
           at org.apache.rocketmq.store.DefaultMessageStore.recover(DefaultMessageStore.java:1334) [rocketmq-store-4.6.0.jar:4.6.0]
   2019-12-05 22:28:18 INFO main - Begin to recover data from entryIndex=629360492 fileIndex=17 fileSize=20 fileName=/home/rocketmq/store/dledger-n1/data/00000000426275504128 
   2019-12-05 22:28:19 INFO main - Trying to recover data file /home/rocketmq/store/dledger-n1/data/00000000427349245952
   2019-12-05 22:28:20 INFO main - Trying to recover data file /home/rocketmq/store/dledger-n1/data/00000000428422987776
           at io.openmessaging.storage.dledger.store.file.DLedgerMmapFileStore.recover(DLedgerMmapFileStore.java:216) ~[dledger-0.1.jar:na]
           at org.apache.rocketmq.store.dledger.DLedgerCommitLog.recover(DLedgerCommitLog.java:253) [rocketmq-store-4.6.0.jar:4.6.0]
           at org.apache.rocketmq.store.dledger.DLedgerCommitLog.recoverNormally(DLedgerCommitLog.java:305) [rocketmq-store-4.6.0.jar:4.6.0]
           at org.apache.rocketmq.store.DefaultMessageStore.recover(DefaultMessageStore.java:1334) [rocketmq-store-4.6.0.jar:4.6.0]
   2019-12-03 10:47:56 INFO main - recover next consume queue file, /home/rocketmq/store/consumequeue/uat-topic/0/00000000000012000000
   2019-12-03 10:47:56 INFO main - recover next consume queue file, /home/rocketmq/store/consumequeue/uat-topic/0/00000000000018000000
   2019-12-03 10:47:56 INFO main - recover current consume queue file over,  /home/rocketmq/store/consumequeue/uat-topic/0/00000000000018000000 0 0 0
   2019-12-03 10:47:56 INFO main - recover current consume queue queue over /home/rocketmq/store/consumequeue/uat-topic/0/00000000000018000000 22122820
   2019-12-03 10:47:56 INFO main - recover next consume queue file, /home/rocketmq/store/consumequeue/uat-topic/1/00000000000012000000
   2019-12-03 10:47:56 INFO main - recover next consume queue file, /home/rocketmq/store/consumequeue/uat-topic/1/00000000000018000000
   2019-12-03 10:47:56 INFO main - recover current consume queue file over,  /home/rocketmq/store/consumequeue/uat-topic/1/00000000000018000000 0 0 0
   2019-12-03 10:47:56 INFO main - recover current consume queue queue over /home/rocketmq/store/consumequeue/uat-topic/1/00000000000018000000 22123000
   2019-12-03 10:47:56 INFO main - recover next consume queue file, /home/rocketmq/store/consumequeue/uat-topic/2/00000000000012000000
   2019-12-03 10:47:56 INFO main - recover next consume queue f`
   - What did you expect to see?
   
   
   2. Please tell us about your environment:
   
   3. Other information (e.g. detailed explanation, logs, related issues, suggestions how to fix, etc):
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services