Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2021/04/19 11:05:57 UTC

[GitHub] [incubator-pegasus] zhangyifan27 commented on issue #719: data loss after restarting

zhangyifan27 commented on issue #719:
URL: https://github.com/apache/incubator-pegasus/issues/719#issuecomment-822382925


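   @ZhongChaoqiang 
   Could you describe in detail how to reproduce this problem? I tested restarting after a learn and could not reproduce any log related to `ERR_INCOMPLETE_DATA`.
   Also, once this error occurs, opening the replica fails, the replica and its data are deleted, and the replica server learns a fresh copy of the data, so no data should be lost. Have you actually observed data loss in practice?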
   
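   `_last_committed_decree` is updated after data has been successfully written to the log and the memtable, whereas `_last_durable_decree` seems to be updated only when a checkpoint is taken (I'm not sure about this part; @neverchanje can clarify). Recording the value of last_committed_decree in the init info should be fine, because that data has already been written successfully and needs to be replayed when the replica is opened. Changing it to last_durable_decree would actually cause problems.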
   
   I looked at the replica start logic: https://github.com/apache/incubator-pegasus/blob/a948e89b180b6a5c82d298d0dcc65f7bb770a8be/src/server/pegasus_server_impl.cpp#L1706-L1752 `_last_committed_decree` is initialized to last_flush_decree, and `_last_durable_decree` is then updated to the same value as `_last_committed_decree`. If `last_durable_decree() < _info.init_durable_decree` shows up later, it means the replica may not have flushed properly before the restart, so last_flush_decree was not written to the manifest (in versions after 2.0.0 it is written to the meta column family): https://github.com/apache/incubator-pegasus/blob/a948e89b180b6a5c82d298d0dcc65f7bb770a8be/src/server/pegasus_server_impl.cpp#L1805-L1812
   I think you could check whether this problem is caused by the replica server process exiting without performing this flush.
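
To make the suspected sequence concrete, here is a toy model of the check described above. It is only a sketch under stated assumptions, not the Pegasus implementation: `persisted_state`, `replica_model`, `open_replica` and `restart_after` are hypothetical names, and the model assumes (as described in this comment) that init info records the committed decree and that both decrees are rebuilt from the persisted flush decree on start.

```cpp
#include <cassert>
#include <cstdint>

using decree = std::int64_t;

// What survives a restart in this toy model (all names are hypothetical).
struct persisted_state {
    decree last_flushed_decree = 0;  // stored with the storage engine on flush
    decree init_durable_decree = 0;  // stored in the replica's init info
};

struct replica_model {
    decree last_committed_decree = 0;
    persisted_state disk;

    // A write that reached the log and the memtable advances the committed decree.
    void commit_write() { ++last_committed_decree; }

    // Assumption taken from the comment: init info records the committed decree.
    void save_init_info() { disk.init_durable_decree = last_committed_decree; }

    // Flushing the memtable persists how far the on-disk data has progressed.
    void flush() { disk.last_flushed_decree = last_committed_decree; }
};

enum class open_result { ok, incomplete_data };

// Startup: both decrees are rebuilt from the persisted flush decree, then
// compared with what init info claims should already be durable.
open_result open_replica(const persisted_state& disk) {
    decree last_committed_decree = disk.last_flushed_decree;
    decree last_durable_decree = last_committed_decree;
    return last_durable_decree < disk.init_durable_decree
               ? open_result::incomplete_data
               : open_result::ok;
}

// Applies `writes` writes, records init info, optionally flushes, then reopens.
open_result restart_after(int writes, bool flush_before_exit) {
    assert(writes >= 0);
    replica_model r;
    for (int i = 0; i < writes; ++i) r.commit_write();
    r.save_init_info();
    if (flush_before_exit) r.flush();
    return open_replica(r.disk);
}
```

In this model, `restart_after(100, false)` yields `open_result::incomplete_data`, while `restart_after(100, true)` yields `open_result::ok`, which is why the missing flush on exit is the first thing worth checking.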




