Posted to issues@geode.apache.org by "Juan José Ramos Cassella (JIRA)" <ji...@apache.org> on 2019/03/21 12:21:00 UTC

[jira] [Created] (GEODE-6551) Multiple Executions of RegionAlterFunction Leaves Partition Region Inconsistent

Juan José Ramos Cassella created GEODE-6551:
-----------------------------------------------

             Summary: Multiple Executions of RegionAlterFunction Leaves Partition Region Inconsistent
                 Key: GEODE-6551
                 URL: https://issues.apache.org/jira/browse/GEODE-6551
             Project: Geode
          Issue Type: Bug
          Components: configuration, gfsh, wan
            Reporter: Juan José Ramos Cassella


When a non-persistent parallel {{gateway-sender}} / {{async-event-queue}} is assigned to a persistent partitioned region through {{gfsh}}, the region is left in an inconsistent state in the {{cluster configuration service}} if the internal function ({{RegionAlterFunction}}) is executed more than once.
The problem is that the {{gateway-sender}} / {{async-event-queue}} is added to the internal list too early in the execution lifecycle and, if the actual addition fails afterwards, the internal list is never reverted to its original state. The second, "successful" execution then persists this invalid configuration into the cluster configuration service, so any subsequent restart of the servers fails.
The following set of steps reproduces the problem for a {{gateway-sender}}, but the logic is exactly the same for an {{async-event-queue}}:

{noformat}
gfsh -e "start locator --name=locator --port=10101"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
gfsh -e "connect --locator=localhost[10101]" -e "create disk-store --name=diskStore --dir=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create region --name=testRegion --type=PARTITION_PERSISTENT --disk-store=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create gateway-sender --id=gateway --parallel=true --remote-distributed-system-id=2 --enable-persistence=false"

# First Execution Fails
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------
server | ERROR  |  org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion

# Second Execution Succeeds
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------
server | OK     | Region testRegion altered

gfsh -e "connect --locator=localhost[10101]" -e "stop server --name=server"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
....The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in /server for full details.
Exception in thread "main" org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion
	at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:454)

# The log shows that the cluster configuration received is invalid:
[info 2019/03/21 11:52:57.606 GMT <main> tid=0x1] Received cluster configuration from the locator
[info 2019/03/21 11:52:57.638 GMT <main> tid=0x1] 
***************************************************************
Configuration for  'cluster'

Jar files to deployed
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<cache xmlns="http://geode.apache.org/schema/cache" xmlns:jdbc="http://geode.apache.org/schema/jdbc" xmlns:lucene="http://geode.apache.org/schema/lucene" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" xsi:schemaLocation="http://geode.apache.org/schema/lucene http://geode.apache.org/schema/lucene/lucene-1.0.xsd http://geode.apache.org/schema/jdbc http://geode.apache.org/schema/jdbc/jdbc-1.0.xsd http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd">
    <gateway-sender disk-synchronous="true" enable-batch-conflation="false" enable-persistence="false" id="gateway" manual-start="false" parallel="true" remote-distributed-system-id="2"/>
    <disk-store allow-force-compaction="false" auto-compact="true" compaction-threshold="50" disk-usage-critical-percentage="99" disk-usage-warning-percentage="90" max-oplog-size="1024" name="diskStore" queue-size="0" time-interval="1000" write-buffer-size="32768">
        <disk-dirs>
            <disk-dir dir-size="2147483647">diskStore</disk-dir>
        </disk-dirs>
    </disk-store>
    <region name="testRegion" refid="PARTITION_PERSISTENT">
        <region-attributes data-policy="persistent-partition" disk-store-name="diskStore" gateway-sender-ids="gateway"/>
    </region>
</cache>
{noformat}


The validations currently executed within the {{RegionAlterFunction}} are not enough: they should also include the persistence checks (currently done in {{ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR}}) or, at least, leave the internal list of {{gateway-sender}} / {{async-event-queue}} ids as it was before the failed execution.
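
The two options can be illustrated with a minimal, self-contained sketch. It does not use the Geode API: {{Region}}, {{Sender}}, {{attach}}, {{alterBuggy}} and {{alterFixed}} are hypothetical stand-ins, and the assumption that an id already present in the list skips the attachment step (which would explain why the second execution reports success) is an inference from the observed behaviour, not something confirmed against the source code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class AlterRegionSketch {

    /** Hypothetical stand-in for a partitioned region; not a Geode class. */
    static final class Region {
        final boolean persistent;
        final Set<String> senderIds = new LinkedHashSet<>();

        Region(boolean persistent) {
            this.persistent = persistent;
        }
    }

    /** Hypothetical stand-in for a parallel gateway sender or async event queue. */
    static final class Sender {
        final String id;
        final boolean persistent;

        Sender(String id, boolean persistent) {
            this.id = id;
            this.persistent = persistent;
        }
    }

    /** Simulates the late persistence check performed while attaching the queue. */
    static void attach(Region region, Sender sender) {
        if (region.persistent && !sender.persistent) {
            throw new IllegalStateException(
                "Non persistent sender " + sender.id + " can not be attached to persistent region");
        }
    }

    /** Problematic ordering: the id is recorded before the attachment is known to work. */
    static boolean alterBuggy(Region region, Sender sender) {
        if (!region.senderIds.add(sender.id)) {
            return true; // assumed: id already listed, nothing to attach, reported as success
        }
        attach(region, sender); // may throw, leaving the id behind in the list
        return true;
    }

    /** Validates before mutating and restores the list if the attachment still fails. */
    static boolean alterFixed(Region region, Sender sender) {
        if (region.senderIds.contains(sender.id)) {
            return true;
        }
        // Option 1: run the persistence check before touching the list.
        if (region.persistent && !sender.persistent) {
            throw new IllegalStateException(
                "Non persistent sender " + sender.id + " can not be attached to persistent region");
        }
        region.senderIds.add(sender.id);
        try {
            attach(region, sender);
        } catch (RuntimeException e) {
            // Option 2: revert the list so a retry starts from the original state.
            region.senderIds.remove(sender.id);
            throw e;
        }
        return true;
    }

    public static void main(String[] args) {
        Region region = new Region(true);
        Sender sender = new Sender("gateway", false);
        try {
            alterBuggy(region, sender);
        } catch (IllegalStateException e) {
            System.out.println("first execution failed: " + e.getMessage());
        }
        System.out.println("ids after failure: " + region.senderIds);
        System.out.println("second execution succeeded: " + alterBuggy(region, sender));
    }
}
```

With {{alterFixed}}, every execution against the same invalid combination fails in the same way and the id list stays empty, so nothing invalid would be left to persist.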



