Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/10/12 14:40:53 UTC

[GitHub] [hadoop-ozone] sodonnel opened a new pull request #1488: HDDS-4336. ContainerInfo does not persist BCSID (sequenceId) leading to failed replica reports

sodonnel opened a new pull request #1488:
URL: https://github.com/apache/hadoop-ozone/pull/1488


   ## What changes were proposed in this pull request?
   
   If you create a container and then close it, the BCSID is synced on the datanodes, and SCM then records the value by setting the "sequenceId" field on the ContainerInfo object for the container.
   
   If you later restart just SCM, the sequenceId is reset to zero, and container reports for the replica then fail with a stack trace like:
   
   ```
   Exception in thread "EventQueue-ContainerReportForContainerReportHandler" java.lang.AssertionError
   	at org.apache.hadoop.hdds.scm.container.ContainerInfo.updateSequenceId(ContainerInfo.java:176)
   	at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerStats(AbstractContainerReportHandler.java:108)
   	at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:83)
   	at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:162)
   	at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:130)
   	at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:50)
   	at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
   The assertion fails because it does not allow the sequenceId to be changed on a CLOSED container:
   
   ```
     public void updateSequenceId(long sequenceID) {
       assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
       sequenceId = max(sequenceID, sequenceId);
     }
   ```
   
   The issue seems to be caused by the serialisation and deserialisation of the ContainerInfo object to and from protobuf, as sequenceId is never persisted or restored.
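   
   To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It does not use the real Ozone or protobuf classes: `SequenceIdRoundTrip`, `Info`, `toProtoBuggy`, `toProtoFixed` and `fromProto` are hypothetical names, and a plain map stands in for the protobuf message. The only assumption carried over from protobuf is that an unset numeric field reads back as 0.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   /** Hypothetical model of the round trip; NOT the real ContainerInfo code. */
   public class SequenceIdRoundTrip {
   
     /** Stand-in for ContainerInfo, reduced to the fields relevant here. */
     static final class Info {
       final long containerId;
       final String state;
       final long sequenceId;
   
       Info(long containerId, String state, long sequenceId) {
         this.containerId = containerId;
         this.state = state;
         this.sequenceId = sequenceId;
       }
     }
   
     /** Serialises without sequenceId, mirroring the reported omission. */
     static Map<String, String> toProtoBuggy(Info info) {
       Map<String, String> proto = new HashMap<>();
       proto.put("containerId", Long.toString(info.containerId));
       proto.put("state", info.state);
       return proto;
     }
   
     /** Same as above, but also writes sequenceId. */
     static Map<String, String> toProtoFixed(Info info) {
       Map<String, String> proto = toProtoBuggy(info);
       proto.put("sequenceId", Long.toString(info.sequenceId));
       return proto;
     }
   
     /** Restores an Info; a missing sequenceId falls back to 0. */
     static Info fromProto(Map<String, String> proto) {
       long sequenceId = Long.parseLong(proto.getOrDefault("sequenceId", "0"));
       return new Info(
           Long.parseLong(proto.get("containerId")),
           proto.get("state"),
           sequenceId);
     }
   
     public static void main(String[] args) {
       Info closed = new Info(1L, "CLOSED", 42L);
       Info buggy = fromProto(toProtoBuggy(closed));
       Info fixed = fromProto(toProtoFixed(closed));
       System.out.println("buggy round trip sequenceId = " + buggy.sequenceId); // 0
       System.out.println("fixed round trip sequenceId = " + fixed.sequenceId); // 42
     }
   }
   ```
   
   In this model the CLOSED container comes back from the buggy round trip with sequenceId 0. A replica reporting its real BCSID would then presumably cause SCM to call updateSequenceId on a CLOSED container, which is the call that fails in the stack trace above.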
   
   However, I am also confused about how this ever worked, as this is a pretty significant problem.
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-4336
   
   ## How was this patch tested?
   
   Added a new integration test, which reproduced the issue before the fix was applied.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] sodonnel merged pull request #1488: HDDS-4336. ContainerInfo does not persist BCSID (sequenceId) leading to failed replica reports

Posted by GitBox <gi...@apache.org>.
sodonnel merged pull request #1488:
URL: https://github.com/apache/hadoop-ozone/pull/1488


   

