Posted to issues@geode.apache.org by "Anilkumar Gingade (Jira)" <ji...@apache.org> on 2021/01/06 15:12:00 UTC

[jira] [Commented] (GEODE-8248) Member hangs waiting for missing disk-stores after gfsh shutdown

    [ https://issues.apache.org/jira/browse/GEODE-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259786#comment-17259786 ] 

Anilkumar Gingade commented on GEODE-8248:
------------------------------------------

The product is behaving as expected, given the action performed by gfsh.
When shutdown is executed from gfsh, it performs a shutdown on each member individually instead of a shutdown-all.
To get the behavior described in this issue, gfsh would have to call shutdown-all.
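
To illustrate the difference between a per-member shutdown and a shutdown-all, here is a toy model. It is not Geode code: the class ShutdownSketch and all of its methods are hypothetical names invented for this sketch, and it assumes that each member simply records which peers were still running when it stopped and that members stop one after another. The real persistence bookkeeping is considerably more involved.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only (hypothetical, not Geode code). */
final class ShutdownSketch {
    /** Member name -> peers that were still running when that member stopped. */
    private final Map<String, Set<String>> onlineAtClose = new HashMap<>();
    private final Set<String> running = new HashSet<>();

    ShutdownSketch(List<String> members) {
        running.addAll(members);
    }

    /** Per-member shutdown: members stop one at a time, each recording the peers still up. */
    void shutdownEachMember(List<String> order) {
        for (String member : order) {
            running.remove(member);
            onlineAtClose.put(member, new HashSet<>(running));
        }
    }

    /** Coordinated shutdown: all members agree their data is equal before any of them stops. */
    void shutdownAll() {
        for (String member : running) {
            // No peer can hold newer data, so nothing is recorded as still online.
            onlineAtClose.put(member, new HashSet<>());
        }
        running.clear();
    }

    /** A member started on its own must wait if some peer kept running after it stopped. */
    boolean mustWaitOnStartup(String member) {
        return !onlineAtClose.getOrDefault(member, Collections.<String>emptySet()).isEmpty();
    }

    public static void main(String[] args) {
        List<String> members = List.of("server1", "server2");

        ShutdownSketch perMember = new ShutdownSketch(members);
        perMember.shutdownEachMember(members);
        System.out.println("per-member, server1 waits: " + perMember.mustWaitOnStartup("server1"));
        System.out.println("per-member, server2 waits: " + perMember.mustWaitOnStartup("server2"));

        ShutdownSketch coordinated = new ShutdownSketch(members);
        coordinated.shutdownAll();
        System.out.println("shutdown-all, server1 waits: " + coordinated.mustWaitOnStartup("server1"));
        System.out.println("shutdown-all, server2 waits: " + coordinated.mustWaitOnStartup("server2"));
    }
}
```

In this model, the member that stops first after a per-member shutdown has to wait for the one that stopped last, which matches the "potentially stale data" message in the report. After a shutdown-all, neither member has to wait.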


> Member hangs waiting for missing disk-stores after gfsh shutdown
> ----------------------------------------------------------------
>
>                 Key: GEODE-8248
>                 URL: https://issues.apache.org/jira/browse/GEODE-8248
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh, persistence
>            Reporter: Juan Ramos
>            Priority: Major
>         Attachments: temporal.zip
>
>
> Let’s say I have 2 servers with a simple {{REPLICATE_PERSISTENT}} region and I stop both using the {{gfsh shutdown}} command.
> According to the [documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html], I should be able to start either of the servers without any problems as both host the most up to date data. However, what happens in reality is that the startup hangs with the following:
> {noformat}
> (1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
> .........
> Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
> My persistent id:
>   DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
>   Name: server1
>   Location: /temporal/server1/dataStore
> Members with potentially new data:
> [
>   DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
>   Name: server2
>   Location: /temporal/server2/dataStore
> ]
> "main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
> 	- locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
> 	at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
> 	at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
> 	at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
> 	at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
> 	at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
> 	at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
> 	at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
> 	at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
> 	- locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
> 	at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
> 	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
> 	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> 	- locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> 	- locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
> 	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> 	- locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
> 	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> 	at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> 	at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> 	at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> 	at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> 	at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
> {noformat}
> We should either fix the problem and make sure the members fully synchronise their data during the {{shutdown}} process so they don't have to wait on each other or, if this is the expected behaviour, update the documentation accordingly.
> The attached {{zip}} file contains a simple script to reproduce the issue; the only thing that needs to be changed after downloading and uncompressing the file is the {{GEMFIRE}} environment variable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)