
Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/06 15:23:00 UTC

[jira] [Commented] (GEODE-7012) Distributed deadlock with StartupMessages if executor pools get full

    [ https://issues.apache.org/jira/browse/GEODE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901177#comment-16901177 ] 

ASF subversion and git services commented on GEODE-7012:
--------------------------------------------------------

Commit 43e02edaff74e2827d7ab0b8a765d8b7ee630521 in geode's branch refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=43e02ed ]

Merge pull request #3877 from Bill/feature/GEODE-7012

GEODE-7012: Failure in upgrade testing - upgrading 1.9 to 1.10

> Distributed deadlock with StartupMessages if executor pools get full
> --------------------------------------------------------------------
>
>                 Key: GEODE-7012
>                 URL: https://issues.apache.org/jira/browse/GEODE-7012
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Dan Smith
>            Assignee: Ernest Burghardt
>            Priority: Major
>             Fix For: 1.10.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We hit a distributed deadlock in one of our tests where two members are hung sending startup messages to each other. 
> It turns out that until a member gets a response to a StartupMessage, it is in a state where it blocks all outgoing messages. At the same time, the member is receiving and attempting to respond to other messages, but those responses get blocked. If too many messages come in before the StartupResponseMessage, this ends up filling up the ClusterDistributionManager.highPriorityPool.
> If two members are trying to start up at the same time, and they both fill up the highPriorityPool, they both will fail to process each other's StartupMessage, because that message is executed in the same pool.
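
For illustration only, the starvation pattern described above can be reproduced in a single JVM. This is a minimal sketch, not Geode code: the class PoolStarvationSketch, the method startupProcessed, the pool size of 2, and the 500 ms timeout are all hypothetical, and the two members are collapsed into one process. Worker tasks stand in for responses that are blocked until startup completes, and the task that would complete startup is submitted either to the same full pool or to a separate one. The separate pool is just one possible way to avoid the hang and is not necessarily what the commit above does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStarvationSketch {

    // Returns true if the stand-in "startup" task ran before the timeout.
    // All names and sizes here are hypothetical, chosen for the sketch.
    static boolean startupProcessed(boolean dedicatedStartupPool) throws InterruptedException {
        int poolSize = 2;
        ExecutorService highPriorityPool = Executors.newFixedThreadPool(poolSize);
        ExecutorService startupPool =
            dedicatedStartupPool ? Executors.newSingleThreadExecutor() : highPriorityPool;

        CountDownLatch startupDone = new CountDownLatch(1);
        CountDownLatch workersBlocked = new CountDownLatch(poolSize);

        try {
            // Fill every thread in the pool with a task that cannot finish
            // until startup completes (the blocked outgoing responses).
            for (int i = 0; i < poolSize; i++) {
                highPriorityPool.execute(() -> {
                    workersBlocked.countDown();
                    try {
                        startupDone.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            workersBlocked.await();

            // The task that would unblock everything. In the shared case it
            // sits in the queue behind the blocked workers and never runs.
            startupPool.execute(startupDone::countDown);

            return startupDone.await(500, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupt the blocked workers so the JVM can exit.
            highPriorityPool.shutdownNow();
            startupPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared pool:    " + startupProcessed(false));
        System.out.println("dedicated pool: " + startupProcessed(true));
    }
}
```

With the shared pool the call times out and returns false, because the only task able to release the workers is queued behind them. With the dedicated pool it returns true.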


