You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Kirk Lund (Jira)" <ji...@apache.org> on 2019/08/30 16:05:00 UTC

[jira] [Assigned] (GEODE-7152) AlertAppender recursive call causes ForcedDisconnect to hang

     [ https://issues.apache.org/jira/browse/GEODE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk Lund reassigned GEODE-7152:
--------------------------------

    Assignee: Kirk Lund

> AlertAppender recursive call causes ForcedDisconnect to hang
> ------------------------------------------------------------
>
>                 Key: GEODE-7152
>                 URL: https://issues.apache.org/jira/browse/GEODE-7152
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>            Priority: Major
>
> AlertAppender uses a ThreadLocal to prevent recursive calls from actually doing anything. However, a recent upgrade to our log4j dependencies seems to have changed the behavior such that log4j refuses to invoke {{doAppend}} if the thread is currently handling a {{sendAlert}} initiated from {{doAppend}}. To fix this bug, we will need to remove the use of our ThreadLocal from {{doAppend}} and change {{sendAlert}} to be asynchronous.
> In this run we are expecting ForcedDisconnects, but they don't occur until after the hang is declared.
> The appearance of these "Recursive call to appender" messages is new:
> {noformat}
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender ALERT
> {noformat}
> \*** Test is dropping the network
> {noformat}
> wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 2019/08/12 21:37:46.265 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] Dropping network connection from rs-FullRegression13040513a0i3large-hydra-client-46 to rs-FullRegression13040513a0i3large-hydra-client-1 and from rs-FullRegression13040513a0i3large-hydra-client-1 to rs-FullRegression13040513a0i3large-hydra-client-46
> {noformat}
> \*** Here are the appender messages
> {noformat}
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 6 ERROR Recursive call to appender ALERT
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,554 Geode Failure Detection thread 5 ERROR Recursive call to appender ALERT
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,556 Geode Failure Detection thread 5 ERROR Recursive call to appender ALERT
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,558 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT
> wan_bridgeNetworkPartition1-0812-213411/bgexec15919_14066.log: 2019-08-12 21:37:47,559 Geode Failure Detection thread 4 ERROR Recursive call to appender ALERT
> {noformat}
> \*** network is dropped
> {noformat}
> wan_bridgeNetworkPartition1-0812-213411/vm_9_locator_1_2_host2_13435.log: [info 2019/08/12 21:37:47.136 PDT <vm_9_thr_13_locator_1_2_host2_13435> tid=0x94] Dropped network connection from rs-FullRegression13040513a0i3large-hydra-client-46 to rs-FullRegression13040513a0i3large-hydra-client-1 and from rs-FullRegression13040513a0i3large-hydra-client-1 to rs-FullRegression13040513a0i3large-hydra-client-46
> {noformat}
> \*** hang is declared
> {noformat}
> wan_bridgeNetworkPartition1-0812-213411/taskmaster_15326.log: [severe 2019/08/12 21:40:44.626 PDT <master_15326> tid=0x1] Result for vm_12_thr_16_wan1Lose_host1_14084: TASK[0] splitBrain.NetworkPartitionTest.HydraTask_doEntryOperations: HANG a client exceeded max result wait sec: 180
> {noformat}
> \*** now we see ForcedDisconnects
> {noformat}
> [fatal 2019/08/12 21:40:57.826 PDT <unicast receiver,rs-FullRegression13040513a0i3large-hydra-client-46-43284> tid=0x24] Membership service failure: Membership coordinator 10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared that a network partition has occurred
> org.apache.geode.ForcedDisconnectException: Membership coordinator 10.32.110.145(gemfire3_host1_14066:14066:locator)<ec><v2>:41000 has declared that a network partition has occurred
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.forceDisconnect(GMSMembershipManager.java:2506)
>         at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1106)
>         at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1481)
>         at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1328)
>         at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1266)
>         at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
>         at org.jgroups.JChannel.up(JChannel.java:741)
>         at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
>         at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
>         at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
>         at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
>         at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
>         at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
>         at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:73)
>         at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:72)
>         at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
>         at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
>         at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
>         at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
>         at org.jgroups.protocols.TP.receive(TP.java:1714)
>         at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:152)
>         at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)