Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/10/07 09:46:22 UTC

[GitHub] [ozone] sodonnel commented on pull request #3806: HDDS-7291: Fixing exception handling in case of non positive replica …

sodonnel commented on PR #3806:
URL: https://github.com/apache/ozone/pull/3806#issuecomment-1271367761

   I had a quick look at this.
   
   The problem we are facing is a ContainerReplica for an EC container that does not have a replica index >= 1. This is clearly a bug somewhere - perhaps on the Datanodes, but we are not sure.
   
   The replicas come from the DNs, and outside of tests the only place in the code where they seem to get persisted into SCM memory is the ContainerReportHandler, namely `AbstractContainerReportHandler.updateContainerReplica()`.
   
   I think it would be better if we validated the replica "at source" instead of placing checks all through the code. This is kind of like integrity constraints on a database table. You enforce the checks on insert, and then trust the data on select.
   
   So we check the replica when it is received from the DN, and if it's for an EC container and does not have a valid index, we do not load it into SCM memory and instead complain about it in the logs.
   
   Unfortunately, a datanode has no idea whether a container is EC or not, so it cannot check for a valid index. When we receive the replica at SCM, however, we know the container it belongs to and can check it then.
   
   What we should do with such a replica is still an open question in my mind. In the failures we witnessed, the system seemed to correct itself. Perhaps this is some sort of race condition at the DN which causes a zero index initially, but I am guessing. For now, perhaps we just drop the replica in SCM and log about it being invalid, and it may bring us closer to the root cause of the problem.
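
A minimal sketch of the "validate at source" idea, to make the proposal concrete. None of these types are the real Ozone classes: `ReplicaReportFilter`, `ContainerMeta` and `ReportedReplica` are hypothetical, simplified stand-ins, and the warning list stands in for a logger. The upper bound on the index (data + parity count) is also an assumption; the only hard requirement stated above is an index >= 1 for EC containers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: simplified stand-ins, NOT the real Ozone classes.
final class ReplicaReportFilter {

  /** What SCM knows about the container (assumed fields). */
  static final class ContainerMeta {
    final long containerId;
    final boolean erasureCoded;
    final int requiredNodes; // assumed: data + parity for EC, e.g. 5 for rs-3-2

    ContainerMeta(long containerId, boolean erasureCoded, int requiredNodes) {
      this.containerId = containerId;
      this.erasureCoded = erasureCoded;
      this.requiredNodes = requiredNodes;
    }
  }

  /** What a datanode reports for one replica (assumed fields). */
  static final class ReportedReplica {
    final String datanode;
    final int replicaIndex;

    ReportedReplica(String datanode, int replicaIndex) {
      this.datanode = datanode;
      this.replicaIndex = replicaIndex;
    }
  }

  // Stand-in for SCM's in-memory replica state.
  private final List<ReportedReplica> accepted = new ArrayList<>();
  // Stand-in for a logger, kept as a list so the behaviour is easy to inspect.
  private final List<String> warnings = new ArrayList<>();

  /** Only EC containers are checked; other containers pass through untouched. */
  static boolean hasValidIndex(ContainerMeta container, ReportedReplica replica) {
    if (!container.erasureCoded) {
      return true;
    }
    // Upper bound is an assumption of this sketch.
    return replica.replicaIndex >= 1
        && replica.replicaIndex <= container.requiredNodes;
  }

  /** Enforce the constraint on "insert": store valid replicas, drop and log the rest. */
  boolean offer(ContainerMeta container, ReportedReplica replica) {
    if (!hasValidIndex(container, replica)) {
      warnings.add(String.format(
          "Ignoring replica of EC container %d from %s with invalid index %d",
          container.containerId, replica.datanode, replica.replicaIndex));
      return false;
    }
    accepted.add(replica);
    return true;
  }

  List<ReportedReplica> accepted() {
    return accepted;
  }

  List<String> warnings() {
    return warnings;
  }
}
```

With the check in one place, code that later reads replicas for an EC container could rely on the index being valid, in the same way a query trusts a constraint enforced at insert time.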


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@ozone.apache.org