Posted to issues@geode.apache.org by "Juan Ramos (Jira)" <ji...@apache.org> on 2020/06/15 11:29:00 UTC

[jira] [Updated] (GEODE-8248) Member hangs waiting for missing disk-stores after gfsh shutdown

     [ https://issues.apache.org/jira/browse/GEODE-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juan Ramos updated GEODE-8248:
------------------------------
    Attachment: temporal.zip

> Member hangs waiting for missing disk-stores after gfsh shutdown
> ----------------------------------------------------------------
>
>                 Key: GEODE-8248
>                 URL: https://issues.apache.org/jira/browse/GEODE-8248
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh, persistence
>            Reporter: Juan Ramos
>            Priority: Major
>         Attachments: temporal.zip
>
>
> Let’s say I have 2 servers with a simple {{REPLICATE_PERSISTENT}} region and I stop both using the {{gfsh shutdown}} command.
> According to the [documentation|https://geode.apache.org/docs/guide/112/managing/disk_storage/starting_system_with_disk_stores.html], I should be able to start either of the servers without any problems, as both host the most up-to-date data. However, in practice the startup hangs with the following output:
> {noformat}
> (1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
> .........
> Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
> My persistent id:
>   DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
>   Name: server1
>   Location: /temporal/server1/dataStore
> Members with potentially new data:
> [
>   DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
>   Name: server2
>   Location: /temporal/server2/dataStore
> ]
> "main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
> 	- locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
> 	at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
> 	at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
> 	at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
> 	at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
> 	at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
> 	at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
> 	at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
> 	at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
> 	- locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
> 	at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
> 	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
> 	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
> 	- locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> 	- locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
> 	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
> 	- locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
> 	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
> 	at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> 	at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
> 	at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
> 	at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
> 	at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
> {noformat}
> We should either fix the problem and make sure the members fully synchronise their data during the {{shutdown}} process so they don't have to wait on each other or, if this is the expected behaviour, update the documentation accordingly.
> The attached {{zip}} file contains a simple script to reproduce the issue. After downloading and uncompressing the file, the only thing that needs to be changed is the {{GEMFIRE}} environment variable.
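> For readers without access to the attachment, the scenario described above can be approximated with the gfsh command sequence below. This is a sketch, not the contents of {{temporal.zip}}. It assumes gfsh is launched from {{/temporal}} (so each server's working directory becomes {{/temporal/<name>}}, matching the disk-store locations in the log) and that {{/temporal/cache.xml}} defines {{/TestRegion}} as {{REPLICATE_PERSISTENT}}. The locator name {{locator1}}, the {{server2}} port {{40402}} and the {{key1}}/{{value1}} entry are hypothetical placeholders. The commands can be typed interactively or saved to a file and executed with {{gfsh run --file=<file>}}.
> {noformat}
> start locator --name=locator1 --port=10334
> start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
> start server --name=server2 --locators=localhost[10334] --server-port=40402 --cache-xml-file=/temporal/cache.xml
> put --region=/TestRegion --key=key1 --value=value1
> shutdown
> start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
> {noformat}
> {{shutdown}} stops both servers but leaves the locator running, because {{--include-locators}} defaults to {{false}}. The final command restarts only {{server1}}. Per the documentation it should come up on its own, but it is the step that blocks with the "potentially stale data" message shown above.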



--
This message was sent by Atlassian Jira
(v8.3.4#803005)