Posted to issues@geode.apache.org by "Juan José Ramos Cassella (JIRA)" <ji...@apache.org> on 2019/03/21 13:11:00 UTC

[jira] [Updated] (GEODE-6551) Multiple Executions of RegionAlterFunction Leaves Partition Region Inconsistent

     [ https://issues.apache.org/jira/browse/GEODE-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juan José Ramos Cassella updated GEODE-6551:
--------------------------------------------
    Description: 
When trying to assign a non-persistent parallel {{gateway-sender}} / {{async-event-queue}} to a persistent partitioned region through {{gfsh}}, the region is left inconsistent in the {{cluster configuration service}} if the internal function is executed more than once.
The problem is that the {{gateway-sender}} / {{async-event-queue}} is added to the internal list too early within the execution lifecycle and, if the actual addition fails afterwards, the internal list is never reverted to its original state. As a result, the second execution reports success and persists this invalid configuration into the cluster configuration service, so the subsequent restart of the servers fails.
The following set of steps reproduces the problem for a {{gateway-sender}}, but the logic is exactly the same for an {{async-event-queue}}:
{noformat}
gfsh -e "start locator --name=locator --port=10101"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
gfsh -e "connect --locator=localhost[10101]" -e "create disk-store --name=diskStore --dir=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create region --name=testRegion --type=PARTITION_PERSISTENT --disk-store=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create gateway-sender --id=gateway --parallel=true --remote-distributed-system-id=2 --enable-persistence=false"

# First Execution Fails
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------
server | ERROR  |  org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion

# Second Execution Succeeds
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------
server | OK     | Region testRegion altered

gfsh -e "connect --locator=localhost[10101]" -e "stop server --name=server"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
....The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in /server for full details.
Exception in thread "main" org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion
	at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:454)

# The log shows that the received cluster configuration is invalid:
[info 2019/03/21 11:52:57.606 GMT <main> tid=0x1] Received cluster configuration from the locator
[info 2019/03/21 11:52:57.638 GMT <main> tid=0x1] 
***************************************************************
Configuration for  'cluster'

Jar files to deployed
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<cache xmlns="http://geode.apache.org/schema/cache" xmlns:jdbc="http://geode.apache.org/schema/jdbc" xmlns:lucene="http://geode.apache.org/schema/lucene" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" xsi:schemaLocation="http://geode.apache.org/schema/lucene http://geode.apache.org/schema/lucene/lucene-1.0.xsd http://geode.apache.org/schema/jdbc http://geode.apache.org/schema/jdbc/jdbc-1.0.xsd http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd">
    <gateway-sender disk-synchronous="true" enable-batch-conflation="false" enable-persistence="false" id="gateway" manual-start="false" parallel="true" remote-distributed-system-id="2"/>
    <disk-store allow-force-compaction="false" auto-compact="true" compaction-threshold="50" disk-usage-critical-percentage="99" disk-usage-warning-percentage="90" max-oplog-size="1024" name="diskStore" queue-size="0" time-interval="1000" write-buffer-size="32768">
        <disk-dirs>
            <disk-dir dir-size="2147483647">diskStore</disk-dir>
        </disk-dirs>
    </disk-store>
    <region name="testRegion" refid="PARTITION_PERSISTENT">
        <region-attributes data-policy="persistent-partition" disk-store-name="diskStore" gateway-sender-ids="gateway"/>
    </region>
</cache>
{noformat}
The validations invoked from within the {{RegionAlterFunction}} (added through GEODE-4919) should be improved to also include the persistence checks (currently done in {{ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR}}).
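
To make the ordering problem concrete, here is a minimal, self-contained sketch. It does not use Geode code: {{RegionSenderModel}}, {{Sender}}, {{attachMutateFirst}} and {{attachValidateFirst}} are hypothetical names invented for illustration, and the model assumes that a repeated call with an already-listed id is treated as a no-op, which is only one way the second execution could end up reporting success. It contrasts adding the id before the persistence check with running the check before touching the list.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of a region and its attached sender ids; not a Geode class.
final class RegionSenderModel {

  // Hypothetical stand-in for a parallel gateway-sender / async-event-queue.
  static final class Sender {
    final String id;
    final boolean persistent;

    Sender(String id, boolean persistent) {
      this.id = id;
      this.persistent = persistent;
    }
  }

  private final String path;
  private final boolean persistentRegion;
  private final Set<String> senderIds = new LinkedHashSet<>();

  RegionSenderModel(String path, boolean persistentRegion) {
    this.path = path;
    this.persistentRegion = persistentRegion;
  }

  // Mirrors the rule from the error message in the report: a non-persistent
  // sender cannot be attached to a persistent region.
  private void checkPersistence(Sender sender) {
    if (persistentRegion && !sender.persistent) {
      throw new IllegalStateException("Non persistent gateway sender " + sender.id
          + " can not be attached to persistent region " + path);
    }
  }

  // Problematic ordering: the id is recorded first and never removed on failure.
  void attachMutateFirst(Sender sender) {
    if (!senderIds.add(sender.id)) {
      return; // assumption of this model: an already-listed id is a silent no-op
    }
    checkPersistence(sender); // throws, but the id stays in the list
  }

  // Safer ordering: validate before any state is changed.
  void attachValidateFirst(Sender sender) {
    checkPersistence(sender);
    senderIds.add(sender.id);
  }

  Set<String> senderIds() {
    return Collections.unmodifiableSet(senderIds);
  }

  public static void main(String[] args) {
    Sender gateway = new Sender("gateway", false);

    RegionSenderModel buggy = new RegionSenderModel("/testRegion", true);
    try {
      buggy.attachMutateFirst(gateway);
    } catch (IllegalStateException e) {
      System.out.println("first attempt failed: " + e.getMessage());
    }
    buggy.attachMutateFirst(gateway); // the retry does not throw
    System.out.println("mutate-first state: " + buggy.senderIds());

    RegionSenderModel safe = new RegionSenderModel("/testRegion", true);
    try {
      safe.attachValidateFirst(gateway);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println("validate-first state: " + safe.senderIds());
  }
}
```

With the validate-first ordering, a rejected sender never reaches the list, so retrying the command keeps failing in the same way instead of persisting an invalid configuration. Reverting the list in a {{catch}} block would be an alternative, but validating up front avoids having to undo partial state.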

  was:
When trying to assign a non-persistent parallel {{gateway-sender}} / {{async-event-queue}} to a persistent partitioned region through {{gfsh}}, the actual region is left inconsistent in the {{cluster configuration service}} if the internal function is executed more than once.
The problem is that the {{gateway-sender}} / {{async-event-queue}} is added to the internal list too early within the execution lifecycle and, if the actual addition fails afterwards, the internal list is never reverted to its original state. This invalid configuration is persisted into the cluster configuration service afterwards (for the second, "successful execution"), so the subsequent restart of the servers will miserably fail.
The following set of steps reproduces the problem for a {{gateway-sender}}, but the logic is exactly the same for an {{async-event-queue}}:

{noformat}
gfsh -e "start locator --name=locator --port=10101"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
gfsh -e "connect --locator=localhost[10101]" -e "create disk-store --name=diskStore --dir=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create region --name=testRegion --type=PARTITION_PERSISTENT --disk-store=diskStore"
gfsh -e "connect --locator=localhost[10101]" -e "create gateway-sender --id=gateway --parallel=true --remote-distributed-system-id=2 --enable-persistence=false"

# First Execution Fails
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------
server | ERROR  |  org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion

# Second Execution Succeeds
gfsh -e "connect --locator=localhost[10101]" -e "alter region --name=testRegion --gateway-sender-id=gateway"
Member | Status | Message
------ | ------ | -------------------------
server | OK     | Region testRegion altered

gfsh -e "connect --locator=localhost[10101]" -e "stop server --name=server"
gfsh -e "start server --name=server --server-port=40404 --locators=localhost[10101]"
....The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in /server for full details.
Exception in thread "main" org.apache.geode.internal.cache.wan.GatewaySenderException: Non persistent gateway sender gateway can not be attached to persistent region /testRegion
	at org.apache.geode.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:454)

# The log shows that the received cluster configuration is invalid:
[info 2019/03/21 11:52:57.606 GMT <main> tid=0x1] Received cluster configuration from the locator
[info 2019/03/21 11:52:57.638 GMT <main> tid=0x1] 
***************************************************************
Configuration for  'cluster'

Jar files to deployed
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<cache xmlns="http://geode.apache.org/schema/cache" xmlns:jdbc="http://geode.apache.org/schema/jdbc" xmlns:lucene="http://geode.apache.org/schema/lucene" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" xsi:schemaLocation="http://geode.apache.org/schema/lucene http://geode.apache.org/schema/lucene/lucene-1.0.xsd http://geode.apache.org/schema/jdbc http://geode.apache.org/schema/jdbc/jdbc-1.0.xsd http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd">
    <gateway-sender disk-synchronous="true" enable-batch-conflation="false" enable-persistence="false" id="gateway" manual-start="false" parallel="true" remote-distributed-system-id="2"/>
    <disk-store allow-force-compaction="false" auto-compact="true" compaction-threshold="50" disk-usage-critical-percentage="99" disk-usage-warning-percentage="90" max-oplog-size="1024" name="diskStore" queue-size="0" time-interval="1000" write-buffer-size="32768">
        <disk-dirs>
            <disk-dir dir-size="2147483647">diskStore</disk-dir>
        </disk-dirs>
    </disk-store>
    <region name="testRegion" refid="PARTITION_PERSISTENT">
        <region-attributes data-policy="persistent-partition" disk-store-name="diskStore" gateway-sender-ids="gateway"/>
    </region>
</cache>
{noformat}


The current validations executed within the {{RegionAlterFunction}} are not enough and should also include the persistence checks (currently done in {{ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR}}) or, at least, leave the internal list of {{gateway-sender}}/{{async-event-queue}} as it was before.



> Multiple Executions of RegionAlterFunction Leaves Partition Region Inconsistent
> -------------------------------------------------------------------------------
>
>                 Key: GEODE-6551
>                 URL: https://issues.apache.org/jira/browse/GEODE-6551
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration, gfsh, wan
>            Reporter: Juan José Ramos Cassella
>            Assignee: Juan José Ramos Cassella
>            Priority: Major


