Posted to issues@ozone.apache.org by "Xiaoyu Yao (Jira)" <ji...@apache.org> on 2020/07/08 01:17:00 UTC

[jira] [Comment Edited] (HDDS-3918) ConcurrentModificationException in ContainerReportHandler.onMessage

    [ https://issues.apache.org/jira/browse/HDDS-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153148#comment-17153148 ] 

Xiaoyu Yao edited comment on HDDS-3918 at 7/8/20, 1:16 AM:
-----------------------------------------------------------

We have a race condition here on the container set stored in the NodeStateMap#nodeToContainer map. The ICR (incremental container report) and the CR (container report) are processed in separate executor threads.

ICR simply calls add() on the container set.
CR calls get() and set() on the container set.

HDDS-3110 has the correct root cause analysis of the race condition but did not switch to a thread-safe version of the HashSet, so the race still exists, as shown in the SCM logs.

I have attached a simple unit test, TestCME.java, to verify this, and the fix has been posted in the PR.
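
For readers without the attachment, the failure mode can be reproduced deterministically with a minimal sketch. This is not TestCME.java and not the code from the PR. The class and method names are hypothetical, the two threads are replaced by a single-threaded interleaving (an open iterator stands in for the CR copying the set, an add() stands in for the ICR), and ConcurrentHashMap.newKeySet() is only one possible thread-safe replacement, assumed here for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: simulates the ICR/CR interleaving on one thread.
public class ContainerSetRaceSketch {

    // Returns true if adding to the set while an iterator is open makes the
    // iterator throw ConcurrentModificationException on a later step.
    public static boolean failsWhenModifiedDuringIteration(Set<Long> containers) {
        for (long id = 0; id < 100; id++) {
            containers.add(id);
        }
        // "CR" side: start walking the set, as new HashSet<>(set) would.
        Iterator<Long> reader = containers.iterator();
        reader.next();
        // "ICR" side: a new container ID arrives in the middle of the walk.
        containers.add(1000L);
        try {
            while (reader.hasNext()) {
                reader.next();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashSet fails: "
            + failsWhenModifiedDuringIteration(new HashSet<>()));
        System.out.println("ConcurrentHashMap.newKeySet() fails: "
            + failsWhenModifiedDuringIteration(ConcurrentHashMap.newKeySet()));
    }
}
```

The HashSet iterator is fail-fast, so the structural modification is detected on the next step of the walk. The key set view of a ConcurrentHashMap is weakly consistent and does not throw, although it may or may not observe the element added mid-iteration.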


was (Author: xyao):
We have a race condition here on the container set stored in the NodeStateMap#nodeToContainer map. The ICR (incremental container report) and the CR (container report) are processed in separate executor threads.

ICR simply calls add() on the container set.
CR calls get() and set() on the container set.

HDDS-3110 has the correct root cause analysis of the race condition but did not switch to a thread-safe version of the HashSet, so the race still exists, as shown in the SCM logs.

I have written a simple unit test to verify this and will post the fix shortly.


> ConcurrentModificationException in ContainerReportHandler.onMessage
> -------------------------------------------------------------------
>
>                 Key: HDDS-3918
>                 URL: https://issues.apache.org/jira/browse/HDDS-3918
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Nanda kumar
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: TestCME.java
>
>
> 2020-07-03 14:51:45,489 [EventQueue-ContainerReportForContainerReportHandler] ERROR org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution message org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@8f6e7cb
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
>         at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
>         at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>         at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>         at java.util.HashSet.<init>(HashSet.java:120)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:127)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:50)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-07-03 14:51:45,648 [EventQueue-ContainerReportForContainerReportHandler] ERROR org.apache.hadoop.hdds.server.events.SingleThreadExecutor: Error on execution message org.apache.hadoop.hdds.scm.server.SCMDatanodeHeartbeatDispatcher$ContainerReportFromDatanode@49d2b84b
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
>         at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
>         at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>         at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>         at java.util.HashSet.<init>(HashSet.java:120)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:127)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:50)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)


