Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2022/01/26 08:27:02 UTC

[GitHub] [incubator-pegasus] WHBANG opened a new issue #895: the single partition data dir is deleted, restart the service and it will be newly created

WHBANG opened a new issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895


   ## Bug Report
   
   Please answer these questions before submitting your issue. Thanks!
   
   1. What did you do?
   
   - Create a single-partition table, write some data, and check that the data is readable.
   - Stop the replica server.
   - Delete the data directory (`replica/`, `replica/reps/`, or `replica/reps/m.n.pegasus`; deleting a directory at any of these levels reproduces the issue).
   - Start the replica server.
   - The deleted partitions are rebuilt and the data is lost.
   
   2. What did you expect to see?
   An error is logged and the partition status is reported as unhealthy.
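The expected behavior implies that a replica server could notice missing data directories at startup. A minimal sketch, assuming replica directories are named `<app_id>.<partition_index>.pegasus` (matching `replica/reps/m.n.pegasus` above) and that the node already knows which partitions it should host; `find_missing_replica_dirs` is a hypothetical helper, not part of Pegasus:

```python
import tempfile
from pathlib import Path


def find_missing_replica_dirs(reps_dir, expected_partitions, app_type="pegasus"):
    """Return the expected (app_id, partition_index) pairs that have no data dir.

    Hypothetical helper: assumes each replica lives in
    <reps_dir>/<app_id>.<partition_index>.<app_type>. If reps_dir itself is
    gone (e.g. `replica/` or `replica/reps/` was deleted), every expected
    partition is reported as missing.
    """
    root = Path(reps_dir)
    return [
        (app_id, pidx)
        for app_id, pidx in expected_partitions
        if not (root / f"{app_id}.{pidx}.{app_type}").is_dir()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        reps = Path(tmp) / "replica" / "reps"
        for pidx in (1, 2, 3):  # partition 1.0's directory was deleted
            (reps / f"1.{pidx}.pegasus").mkdir(parents=True)
        expected = [(1, pidx) for pidx in range(4)]
        for app_id, pidx in find_missing_replica_dirs(reps, expected):
            print(f"ERROR: data dir of partition {app_id}.{pidx} is missing")
```

Reporting the missing pairs, rather than silently creating empty directories, is what would let the partition be flagged as unhealthy.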
   
   3. What did you see instead?
   No error is logged, and the deleted partitions are rebuilt.
   
   4. What version of Pegasus are you using?
   2.0.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] hycdong commented on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
hycdong commented on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1022075907


   > > Thanks for your report~ I have some questions about your case:
   > > 
   > > * The table has one partition; how many replicas does it have, 1 or 3 (the default)?
   > > * How many replica server nodes are in your case?
   > > 
   > > Looking forward to your answer.
   > 
   > * I'm sorry, the description was wrong and I have corrected it: it is a 1-replica table, the partition count is arbitrary, and I have tested on both 1 and 3 replica server nodes;
   > * only 1-replica tables have this problem.
   
   Thanks for your answer, I understand your case now.
   
   Suppose your table has 4 partitions with only 1 replica, the stopped replica server is called `serverA`, and the data directory you deleted is `replica/reps/1.0.pegasus`.
   - After you stop serverA and delete its directory `replica/reps/1.0.pegasus`, the meta server notices that partition 1.0 cannot be found on serverA and waits for serverA to come back.
   - When serverA restarts, the meta server notices it and tries to recover partition 1.0 on serverA, which creates the directory `replica/reps/1.0.pegasus` again.
   
   As a result, the directory is recreated. Because this table has only one replica and you deleted its data directly (in `replica/reps/1.0.pegasus`), that data no longer exists anywhere. In multi-replica cases, the recreated replica would learn the data from the other replicas.
   I don't think this is a bug: in your case the data is lost, but the metadata is still stored on the meta server and zk, so the meta server recreates the directory but cannot recover the data.
   I hope my answer helps; looking forward to your reply~
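The recovery sequence described above can be modeled as a toy simulation. This is a sketch under simplifying assumptions, not Pegasus code: a replica is just a dict of key/value data, the metadata only records which nodes host a partition, and recovery recreates any missing replica as empty and then lets it learn from a member that still has data. `Node` and `recover_partition` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy replica server; `reps` maps a partition id (e.g. "1.0") to its data."""
    name: str
    reps: dict = field(default_factory=dict)


def recover_partition(pid, members):
    """Toy recovery: recreate missing replicas, then learn from a surviving peer.

    The metadata still lists every member, so a member without local data gets
    a fresh, empty replica (the directory is recreated). It then copies data
    from any member that still has some; if none does, the data is gone.
    """
    for node in members:
        node.reps.setdefault(pid, {})  # directory recreated, possibly empty
    donors = [n for n in members if n.reps[pid]]
    if donors:
        source = dict(donors[0].reps[pid])
        for node in members:
            if not node.reps[pid]:
                node.reps[pid] = dict(source)  # "learn" from another replica
    return [n.reps[pid] for n in members]


# 1 replica: serverA's data is deleted before restart, nothing to learn from.
a = Node("serverA", {"1.0": {"k": "v"}})
del a.reps["1.0"]
print(recover_partition("1.0", [a]))  # [{}]

# 3 replicas: only serverA loses its data, so it learns from a peer.
a = Node("serverA", {"1.0": {"k": "v"}})
b = Node("serverB", {"1.0": {"k": "v"}})
c = Node("serverC", {"1.0": {"k": "v"}})
del a.reps["1.0"]
print(recover_partition("1.0", [a, b, c]))  # [{'k': 'v'}, {'k': 'v'}, {'k': 'v'}]
```

The model deliberately treats "empty" as "lost"; a real replica distinguishes the two through its decree, which is what the later comments discuss.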


[GitHub] [incubator-pegasus] hycdong edited a comment on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
hycdong edited a comment on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1021981272


   Thanks for your report~
   I have some questions about your case:
   - The table has one partition; how many replicas does it have, 1 or 3 (the default)?
   - How many replica server nodes are in your case?
   
   Looking forward to your answer.


[GitHub] [incubator-pegasus] hycdong commented on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
hycdong commented on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1022783434


   Thanks for @acelyc111's reply, I got the point. The problem is that for a 1-replica table, the meta server cannot know whether the data has been lost. I think this is an enhancement for the DDD case (all replicas of a partition are dead).
   
   > Even for a table with a replication factor of 3, we can check the consistency of decree/ballot between the meta server/zk and the primary replica.
   
   Firstly, for any replication factor, the ballot is controlled only by the meta server, so it is unnecessary to check its consistency.
   
   Secondly, in the current implementation the meta server doesn't perform any decree consistency check; the decree is not even persisted on zk, only kept in the meta server's memory. In most of our production environments, tables have 3 replicas, and the meta server, as the cluster controller, only cares which replica has newer data; it compares decrees only in the DDD situation. If a table with a replication factor of 3 loses only its primary, the meta server does not compare decrees; it just promotes one of its secondaries.
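To illustrate "only cares which replica has newer data": in the DDD situation the meta server must pick the replica with the newest data before assigning a primary. A minimal sketch, assuming each replica reports a ballot and a last committed decree and that "newer" means a higher (ballot, decree) pair compared lexicographically; `ReplicaReport` and `pick_newest_replica` are hypothetical names, and Pegasus's actual DDD handling may weigh more information than this.

```python
from typing import NamedTuple, Optional, Sequence


class ReplicaReport(NamedTuple):
    """What a (hypothetical) replica reports for one partition."""
    node: str
    ballot: int
    last_committed_decree: int


def pick_newest_replica(reports: Sequence[ReplicaReport]) -> Optional[ReplicaReport]:
    """Return the replica with the newest data, or None if nobody reported.

    Assumption for this sketch: newer means a higher (ballot, decree) pair,
    compared lexicographically.
    """
    if not reports:
        return None
    return max(reports, key=lambda r: (r.ballot, r.last_committed_decree))


reports = [
    ReplicaReport("serverA", ballot=5, last_committed_decree=120),
    ReplicaReport("serverB", ballot=5, last_committed_decree=98),
    ReplicaReport("serverC", ballot=4, last_committed_decree=150),
]
print(pick_newest_replica(reports).node)  # serverA
```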
   
   Back to your case: when this happens, it is the replica server, not the meta server, that should core dump. The meta server collects decree information from the replica servers, so it cannot compare the reported decree with the exact one. You could add a decree check when the replica server receives the assign_primary request.
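One way to read the suggested check: on an assign_primary request, the replica server compares the decree it holds locally with the decree the request says should already be committed, and refuses to serve if it is behind (for example, because its directory was deleted and recreated empty). This is a hypothetical sketch: `AssignPrimaryRequest`, its `expected_decree` field, and `check_assign_primary` are illustrative names, not the rdsn API, and where the expected decree would come from is left open, since the meta server does not persist decrees today.

```python
from dataclasses import dataclass


@dataclass
class AssignPrimaryRequest:
    """Hypothetical request; `expected_decree` is the last decree known to be committed."""
    partition: str
    ballot: int
    expected_decree: int


class DecreeMismatchError(RuntimeError):
    """Raised when the local data is older than what the request expects."""


def check_assign_primary(req: AssignPrimaryRequest, local_decree: int) -> None:
    """Refuse to become primary if the local replica is missing committed data.

    In this sketch a freshly recreated, empty replica has local_decree == 0,
    so any expected_decree > 0 signals lost data instead of silently serving.
    """
    if local_decree < req.expected_decree:
        raise DecreeMismatchError(
            f"partition {req.partition}: local decree {local_decree} < "
            f"expected {req.expected_decree}; data may have been lost"
        )


check_assign_primary(AssignPrimaryRequest("1.0", ballot=3, expected_decree=42), 42)
try:
    check_assign_primary(AssignPrimaryRequest("1.0", ballot=3, expected_decree=42), 0)
except DecreeMismatchError as err:
    print(err)
```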
   
   I hope my answer helps; looking forward to your reply~


[GitHub] [incubator-pegasus] WHBANG commented on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
WHBANG commented on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1021999619


   > Thanks for your report~ I have some questions about your case:
   > 
   > * The table has one partition; how many replicas does it have, 1 or 3 (the default)?
   > * How many replica server nodes are in your case?
   > 
   > Looking forward to your answer.
   
   - I'm sorry, the description was wrong and I have corrected it: it is a 1-replica table, the partition count is arbitrary, and I have tested on both 1 and 3 replica server nodes;
   - only 1-replica tables have this problem.


[GitHub] [incubator-pegasus] WHBANG edited a comment on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
WHBANG edited a comment on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1031040901


   @hycdong @acelyc111 Hi, I am very happy to participate in building the Pegasus community. I have made a small modification for this issue.
   Happy New Year~ I am looking forward to your suggestions~
   https://github.com/XiaoMi/rdsn/pull/1044


[GitHub] [incubator-pegasus] acelyc111 commented on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1022109663


   IMO, the cluster should report this issue rather than keep running as if nothing happened, e.g. by triggering a coredump or something similar. Of course, this would be an optional feature that can be disabled by config.
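The proposal amounts to "fail loudly by default, but let operators opt out by config". A minimal sketch of that policy, assuming a hypothetical boolean option called `crash_on_data_loss` here (not an existing Pegasus config key): when enabled, the process aborts (Python's `os.abort()` raises SIGABRT, which produces a core file if the OS settings allow it); when disabled, it only logs an error and marks the partition unhealthy.

```python
import logging
import os

logger = logging.getLogger("replica")


def on_data_loss_detected(partition: str, crash_on_data_loss: bool, health: dict) -> None:
    """Report suspected data loss for `partition` according to a config flag.

    `crash_on_data_loss` stands in for a hypothetical config option;
    `health` is a stand-in for wherever partition status is tracked.
    """
    msg = f"data of partition {partition} appears to be lost"
    if crash_on_data_loss:
        logger.critical(msg + ", aborting")
        os.abort()  # SIGABRT; a core file is written if the OS allows it
    logger.error(msg)
    health[partition] = "unhealthy"


status = {}
on_data_loss_detected("1.0", crash_on_data_loss=False, health=status)
print(status)  # {'1.0': 'unhealthy'}
```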


[GitHub] [incubator-pegasus] acelyc111 commented on issue #895: the single partition data dir is deleted, restart the service and it will be newly created

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on issue #895:
URL: https://github.com/apache/incubator-pegasus/issues/895#issuecomment-1022113738


   Even for a table with a replication factor of 3, we can check the consistency of decree/ballot between the meta server/zk and the primary replica.

