Posted to dev@rocketmq.apache.org by GitBox <gi...@apache.org> on 2018/09/18 05:20:43 UTC

[GitHub] suiyuzeng edited a comment on issue #467: Message missed after recovering from abnormal shutdown

suiyuzeng edited a comment on issue #467: Message missed after recovering from abnormal shutdown
URL: https://github.com/apache/rocketmq/issues/467#issuecomment-422258727
 
 
   Following the hint, I read the recover() code: the message should be reput. Analyzing the logs again, I found some relevant error entries.
   The first error log of ReputMessageService:
   2018-09-11 16:46:46.976 WARN ReputMessageService - [BUG]logic queue order maybe wrong, expectLogicOffset: 1050988840 currentLogicOffset: 1050988820 Topic: role_change QID: 4 Diff: 20
   The cqOffset being reput is 52549442 (expectLogicOffset 1050988840 / 20). So there is something wrong with cqOffset 52549441 of qid 4.
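The warning reports byte positions in the consume queue, not entry indexes. A minimal sketch of the conversion, assuming the fixed 20-byte consume queue entry that the "Diff: 20" in the log suggests (the class and method names are illustrative, not RocketMQ code):

```java
// Illustrative helper, not part of RocketMQ.
class LogicOffsetCheck {
    // Assumption: every consume queue entry occupies 20 bytes.
    static final int CQ_UNIT_SIZE = 20;

    // Converts a byte position in the consume queue to an entry index (cqOffset).
    static long toCqOffset(long logicOffset) {
        return logicOffset / CQ_UNIT_SIZE;
    }

    public static void main(String[] args) {
        long expectLogicOffset = 1050988840L;   // from the WARN line
        long currentLogicOffset = 1050988820L;  // from the WARN line
        System.out.println("expected cqOffset: " + toCqOffset(expectLogicOffset));   // 52549442
        System.out.println("current cqOffset:  " + toCqOffset(currentLogicOffset));  // 52549441
        System.out.println("missing entries:   "
            + (expectLogicOffset - currentLogicOffset) / CQ_UNIT_SIZE);              // 1
    }
}
```

Under that assumption, exactly one entry (cqOffset 52549441) is missing before the one being reput.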
   
   From the producer log, the phyOffset of cqOffset 52549441 is 80549937024, as follows:
   2018-09-11 16:37:32,006 [ INFO ] MissChecker - send msg success, topic:role_change, tag:1536655040000, index:399503976, result:SendResult [sendStatus=SLAVE_NOT_AVAILABLE, msgId=0AB314D91D3F070DEA4E3710B4E7BD18, offsetMsgId=0A60706900002A9F00000012C1267F80, messageQueue=MessageQueue [topic=role_change, brokerName=syz-00, queueId=4], queueOffset=52549441]
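The phyOffset can be read directly from the offsetMsgId in that line. A small sketch, assuming the offsetMsgId is the hex encoding of a 4-byte IPv4 address, a 4-byte port and an 8-byte commit log offset (the class below is illustrative, not a RocketMQ API):

```java
// Illustrative decoder, not part of RocketMQ.
// Assumed layout of the 32 hex characters: 8 for the IPv4 address,
// 8 for the port, 16 for the commit log (physical) offset.
class OffsetMsgIdDecoder {
    static String host(String offsetMsgId) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i += 2) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(Integer.parseInt(offsetMsgId.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    static int port(String offsetMsgId) {
        return Integer.parseInt(offsetMsgId.substring(8, 16), 16);
    }

    static long phyOffset(String offsetMsgId) {
        return Long.parseLong(offsetMsgId.substring(16, 32), 16);
    }

    public static void main(String[] args) {
        String id = "0A60706900002A9F00000012C1267F80"; // from the producer log
        System.out.println(host(id) + ":" + port(id));   // 10.96.112.105:10911
        System.out.println("phyOffset = " + phyOffset(id)); // 80549937024
    }
}
```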
   
   In the broker's recovery log, the max phy offset is 80549937216. Since every message has a fixed length of 192 bytes, the last message starts at offset 80549937024 (80549937216 - 192), and its cqOffset is 52549441.
   2018-09-11 16:44:11.292 INFO main - load over, and the max phy offset = 80549937216
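The arithmetic behind this can be checked quickly. A sketch using only the numbers quoted in this comment (the class name is illustrative):

```java
// Illustrative check of the offsets quoted above.
class LastMessageOffset {
    // Start offset of the last message, given the end of valid data
    // and a fixed message size.
    static long lastMessageStart(long maxPhyOffset, int msgSize) {
        return maxPhyOffset - msgSize;
    }

    public static void main(String[] args) {
        long maxPhyOffset = 80549937216L;       // from "load over, and the max phy offset"
        long producerPhyOffset = 80549937024L;  // phyOffset of cqOffset 52549441
        long last = lastMessageStart(maxPhyOffset, 192);
        System.out.println("last message starts at " + last);
        System.out.println("matches producer log: " + (last == producerPhyOffset)); // true
    }
}
```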
   
   I also found some other log entries related to this issue:
   2018-09-11 16:44:11.123 ERROR main - [BUG]read total count not equals msg total size. totalSize=192, readTotalCount=140, bodyLen=38, topicLen=11, propertiesLength=0
   2018-09-11 16:44:11.134 INFO main - /home/suiyuzeng/store/consumequeue/0 mkdir OK
   2018-09-11 16:44:11.134 WARN main - found a illegal magic code 0x0
   2018-09-11 16:44:11.180 INFO main - topic:role_change, queue:4, queue offset after truncate:52549441, origin:52549441
   I added the last line for debugging: before truncateDirtyLogicFiles() returns, it logs the cqOffset obtained from getMaxOffsetInQueue(). The cqOffset should be 52549442.
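The figures in the "[BUG]read total count" line can be cross-checked. The sketch below assumes the RocketMQ 4.x commit log record layout, in which the fixed-width fields add up to 91 bytes; that layout is an assumption here, but it does reproduce the logged readTotalCount of 140. The class and method names are illustrative:

```java
// Illustrative recomputation of readTotalCount, assuming the
// RocketMQ 4.x commit log record layout.
class MsgLengthCheck {
    static final int FIXED_FIELDS =
          4   // total size
        + 4   // magic code
        + 4   // body CRC
        + 4   // queue id
        + 4   // flag
        + 8   // queue offset
        + 8   // physical offset
        + 4   // sys flag
        + 8   // born timestamp
        + 8   // born host
        + 8   // store timestamp
        + 8   // store host
        + 4   // reconsume times
        + 8   // prepared transaction offset
        + 4   // body length field
        + 1   // topic length field
        + 2;  // properties length field

    static int readLength(int bodyLen, int topicLen, int propertiesLen) {
        return FIXED_FIELDS + bodyLen + topicLen + propertiesLen;
    }

    public static void main(String[] args) {
        int totalSize = 192;
        int read = readLength(38, 11, 0); // values from the ERROR line
        System.out.println("readTotalCount = " + read);                 // 140
        System.out.println("unaccounted bytes = " + (totalSize - read)); // 52
    }
}
```

The 52 bytes left over are what a non-zero propertiesLength would have had to cover for the record to add up to 192.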
   
   I think the last message (cqOffset 52549441, phyOffset 80549937024) was damaged. In the log, totalSize, bodyLen and topicLen are right, but propertiesLength is wrong. checkMessageAndReturnSize() detects the anomaly and returns a result with success set to false, but the size is still greater than 0, so the message is dispatched anyway:
   
         DispatchRequest dispatchRequest = this.checkMessageAndReturnSize(byteBuffer, checkCRCOnRecover);
         int size = dispatchRequest.getMsgSize();
         // Normal data
         if (size > 0) {
             ..........
         }
   
   Because the topic is not set in that DispatchRequest, we see the log line "/home/suiyuzeng/store/consumequeue/0 mkdir OK". So the message with cqOffset 52549441 may not have been dispatched to the consume queue.
   
   In recoverAbnormally(), only the size is checked. Should we also check isSuccess, as in recoverNormally(), and truncate the messages when isSuccess is false?
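To make the proposal concrete, here is a simplified, self-contained simulation of the two behaviours. It is not the RocketMQ implementation: CheckResult stands in for DispatchRequest, recover() is a hypothetical reduction of the recovery loop to a single file, and the starting offset is chosen only so that the numbers line up with the logs above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the recovery loop; not RocketMQ code.
class RecoverSketch {
    // Stand-in for DispatchRequest: message size plus a success flag.
    static final class CheckResult {
        final int size;
        final boolean success;

        CheckResult(int size, boolean success) {
            this.size = size;
            this.success = success;
        }
    }

    // Returns the offset up to which data is considered valid.
    // With checkSuccess == false only the size is inspected (current behaviour
    // as described above); with true, a failed check stops the scan so that
    // everything from that message onward would be truncated.
    static long recover(long startOffset, List<CheckResult> results, boolean checkSuccess) {
        long offset = startOffset;
        for (CheckResult r : results) {
            boolean accepted = r.size > 0 && (!checkSuccess || r.success);
            if (!accepted) {
                break;
            }
            offset += r.size;
        }
        return offset;
    }

    public static void main(String[] args) {
        // One healthy 192-byte message followed by the damaged one.
        List<CheckResult> results = Arrays.asList(
            new CheckResult(192, true),
            new CheckResult(192, false));
        long start = 80549936832L; // 80549937024 - 192
        System.out.println("size only:      " + recover(start, results, false)); // 80549937216
        System.out.println("with isSuccess: " + recover(start, results, true));  // 80549937024
    }
}
```

In this model, checking only the size keeps the damaged message and ends at 80549937216, matching the "max phy offset" in the log, while also checking the success flag would end at 80549937024 and drop the damaged message.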

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services