Posted to issues@geode.apache.org by "Bruce Schuchardt (JIRA)" <ji...@apache.org> on 2016/02/09 23:20:18 UTC
[jira] [Updated] (GEODE-950) split brain in wanAdminLocatorsPeerHAP2P
[ https://issues.apache.org/jira/browse/GEODE-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce Schuchardt updated GEODE-950:
-----------------------------------
Description:
This test starts locators simultaneously, with each configured to know about the other. In the run below, two locators each created their own membership view, forming a split brain at startup instead of a single distributed system.
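The report does not say why the two locators ended up with separate views. One common way for a simultaneous start to produce this outcome is a check-then-act race: each process looks for an existing coordinator, finds none, and then forms its own view. The toy sketch below only illustrates that general pattern. It is not Geode's membership protocol, and every name in it (LocatorStartupRaceSketch, startTwoLocators, the shared coordinator slot) is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; none of these names come from Geode.
class LocatorStartupRaceSketch {

    /** Starts two "locators" at once and returns how many distinct views they formed. */
    static int startTwoLocators(boolean atomicElection) {
        AtomicReference<String> coordinator = new AtomicReference<>(); // shared "who leads?" slot
        Set<String> views = ConcurrentHashMap.newKeySet();             // views actually formed
        CyclicBarrier bothHaveLooked = new CyclicBarrier(2);           // forces the unlucky interleaving

        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            String name = "locator-" + i;
            threads[i] = new Thread(() -> {
                try {
                    if (atomicElection) {
                        // Only one caller can win the compare-and-set; the other joins the winner.
                        coordinator.compareAndSet(null, name);
                        views.add(coordinator.get());
                    } else {
                        // Check-then-act: both look before either has announced itself.
                        boolean sawNobody = coordinator.get() == null;
                        bothHaveLooked.await();
                        if (sawNobody) {
                            coordinator.set(name);
                            views.add(name); // each locator forms its own view
                        } else {
                            views.add(coordinator.get());
                        }
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return views.size();
    }

    public static void main(String[] args) {
        System.out.println("check-then-act views: " + startTwoLocators(false));
        System.out.println("atomic election views: " + startTwoLocators(true));
    }
}
```

The barrier is there only to make the bad interleaving deterministic in the sketch. In a real system the same window opens whenever both processes look for a coordinator before either one has announced itself.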
Host name: w2-2013-lin-12
OS name: Linux
Architecture: amd64
OS version: 3.10.0-229.el7.x86_64
Java version: 1.8.0_66
Java vm name: Java HotSpot(TM) 64-Bit Server VM
Java vendor: Oracle Corporation
Java home: /export/gcm/where/jdk/1.8.0_66/x86_64.linux/jre
#####################################################
GemFire Version 9.0.0-SNAPSHOT
Source Date: 2016-02-03 16:09:18 -0800
Source Revision: 3f7070f117dbd8f2e5fb436b6aed3469e9fca673
Source Repository: develop
Build Id: bruces 020416
Build Date: 2016-02-04 16:02:44 -0800
Build Version: 9.0.0-SNAPSHOT bruces 020416 2016-02-04 16:02:44 -0800 javac 1.8.0_66
Build JDK: Java 1.8.0_66
Build Platform: Linux 2.6.32-122.el6.x86_64 amd64
#####################################################
Test was run from /export/frodo2/users/bruce/devel/gfasf/closed/gemfire-test/build/resources/test/newWan/discovery/newWanDiscovery.bt
Test:
parReg/newWan/parallel/discovery/wanAdminLocatorsPeerHAP2P.conf
locatorHostsPerSite=4
locatorThreadsPerVM=1
locatorVMsPerHost=1
maxOps=300
peerHostsPerSite=2
peerMem=256m
peerThreadsPerVM=10
peerVMsPerHost=2
redundantCopies=1
resultWaitSec=600
wanSites=3
Run with local.conf:
hydra.HostPrms-hostNames = w2-2013-lin-12 w1-gst-dev03;
//randomSeed extracted from test:
hydra.Prms-randomSeed=1454836695339;
*** Test failed with this error:
CLIENT vm_17_thr_64_peer_2_1_w1-gst-dev03_3365
INITTASK[2] newWan.WANTest.HydraTask_initPeerTask
HANG a client exceeded max result wait sec: 600
*** Last client logging by hung thread
[info 2016/02/07 01:30:48.650 PST <vm_17_thr_64_peer_2_1_w1-gst-dev03_3365> tid=0x1e] Configured disk store factory: com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl@16cf1ca8
*** Test declared hung 595996 ms after last client logging
[severe 2016/02/07 01:40:44.646 PST <vm_17_thr_68_peer_2_1_w1-gst-dev03_2152 Dynamic Client VM Stopper> tid=0x274] Result for vm_17_thr_64_peer_2_1_w1-gst-dev03_3365: INITTASK[2] newWan.WANTest.HydraTask_initPeerTask: HANG a client exceeded max result wait sec: 600
*** Hung thread
"vm_17_thr_64_peer_2_1_w1-gst-dev03_3365" #30 daemon prio=5 os_prio=0 tid=0x00007f0ca0026000 nid=0xdd3 waiting on condition [0x00007f0cafffd000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f7429b60> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:363)
at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:633)
at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1821)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:1073)
- locked <0x00000000f567aa10> (a com.gemstone.gemfire.internal.cache.PartitionedRegion)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1193)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3171)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3063)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:3052)
at hydra.RegionHelper.createRegion(RegionHelper.java:129)
- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:93)
- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:80)
- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
at newWan.WANTest.initDatastoreRegion(WANTest.java:439)
at newWan.WANTest.HydraTask_initPeerTask(WANTest.java:797)
- locked <0x00000000f58842e8> (a java.lang.Class for newWan.WANTest)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at hydra.MethExecutor.execute(MethExecutor.java:198)
at hydra.MethExecutor.execute(MethExecutor.java:162)
at hydra.TestTask.execute(TestTask.java:195)
at hydra.RemoteTestModule$1.run(RemoteTestModule.java:216)
Stack for hung thread vm_17_thr_64_peer_2_1_w1-gst-dev03_3365 was found 3 times and was unchanging.
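The thread dump above shows the peer parked in an untimed CountDownLatch.await() under BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery. The sketch below reproduces only that generic JDK behavior: a thread waiting on a latch that nobody counts down stays in WAITING (parking), while a timed await returns false when the timeout expires. The class and method names are hypothetical, and the sketch is not a proposed fix for Geode.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the wait pattern seen in the thread dump; not Geode code.
class LatchHangSketch {

    /** Returns the state of a thread blocked in an untimed await() on a latch nobody counts down. */
    static Thread.State stateWhileWaiting() {
        CountDownLatch recovered = new CountDownLatch(1);
        Thread waiter = new Thread(() -> {
            try {
                recovered.await(); // parks until countDown() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        try {
            // Poll until the waiter has parked, giving up after five seconds.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (waiter.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = waiter.getState();
            recovered.countDown(); // release the waiter so the sketch itself does not hang
            waiter.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** A timed await() on a latch nobody counts down returns false once the timeout expires. */
    static boolean timedAwait(long millis) {
        try {
            return new CountDownLatch(1).await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("untimed waiter state: " + stateWhileWaiting());
        System.out.println("timed await returned: " + timedAwait(50));
    }
}
```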
was:
This test starts locators simultaneously, with each configured to know about the other. In the run below, two locators each created their own membership view, forming a split brain at startup instead of a single distributed system.
Host name: w2-2013-lin-12
OS name: Linux
Architecture: amd64
OS version: 3.10.0-229.el7.x86_64
Java version: 1.8.0_66
Java vm name: Java HotSpot(TM) 64-Bit Server VM
Java vendor: Oracle Corporation
Java home: /export/gcm/where/jdk/1.8.0_66/x86_64.linux/jre
#####################################################
GemFire Version 9.0.0-SNAPSHOT
Source Date: 2016-02-03 16:09:18 -0800
Source Revision: 3f7070f117dbd8f2e5fb436b6aed3469e9fca673
Source Repository: develop
Build Id: bruces 020416
Build Date: 2016-02-04 16:02:44 -0800
Build Version: 9.0.0-SNAPSHOT bruces 020416 2016-02-04 16:02:44 -0800 javac 1.8.0_66
Build JDK: Java 1.8.0_66
Build Platform: Linux 2.6.32-122.el6.x86_64 amd64
#####################################################
Test was run from /export/frodo2/users/bruce/devel/gfasf/closed/gemfire-test/build/resources/test/newWan/discovery/newWanDiscovery.bt
Test:
parReg/newWan/parallel/discovery/wanAdminLocatorsPeerHAP2P.conf
locatorHostsPerSite=4
locatorThreadsPerVM=1
locatorVMsPerHost=1
maxOps=300
peerHostsPerSite=2
peerMem=256m
peerThreadsPerVM=10
peerVMsPerHost=2
redundantCopies=1
resultWaitSec=600
wanSites=3
Run with local.conf:
hydra.HostPrms-hostNames = w2-2013-lin-12 w1-gst-dev03;
//randomSeed extracted from test:
hydra.Prms-randomSeed=1454836695339;
*** Test failed with this error:
CLIENT vm_17_thr_64_peer_2_1_w1-gst-dev03_3365
INITTASK[2] newWan.WANTest.HydraTask_initPeerTask
HANG a client exceeded max result wait sec: 600
*** Last client logging by hung thread
[info 2016/02/07 01:30:48.650 PST <vm_17_thr_64_peer_2_1_w1-gst-dev03_3365> tid=0x1e] Configured disk store factory: com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl@16cf1ca8
*** Test declared hung 595996 ms after last client logging
[severe 2016/02/07 01:40:44.646 PST <vm_17_thr_68_peer_2_1_w1-gst-dev03_2152 Dynamic Client VM Stopper> tid=0x274] Result for vm_17_thr_64_peer_2_1_w1-gst-dev03_3365: INITTASK[2] newWan.WANTest.HydraTask_initPeerTask: HANG a client exceeded max result wait sec: 600
*** Hung thread
"vm_17_thr_64_peer_2_1_w1-gst-dev03_3365" #30 daemon prio=5 os_prio=0 tid=0x00007f0ca0026000 nid=0xdd3 waiting on condition [0x00007f0cafffd000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f7429b60> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:363)
at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:633)
at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1821)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:1073)
- locked <0x00000000f567aa10> (a com.gemstone.gemfire.internal.cache.PartitionedRegion)
at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1193)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3171)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3063)
at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:3052)
at hydra.RegionHelper.createRegion(RegionHelper.java:129)
- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:93)
- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
at hydra.RegionHelper.createRegion(RegionHelper.java:80)
- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
at newWan.WANTest.initDatastoreRegion(WANTest.java:439)
at newWan.WANTest.HydraTask_initPeerTask(WANTest.java:797)
- locked <0x00000000f58842e8> (a java.lang.Class for newWan.WANTest)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at hydra.MethExecutor.execute(MethExecutor.java:198)
at hydra.MethExecutor.execute(MethExecutor.java:162)
at hydra.TestTask.execute(TestTask.java:195)
at hydra.RemoteTestModule$1.run(RemoteTestModule.java:216)
Stack for hung thread vm_17_thr_64_peer_2_1_w1-gst-dev03_3365 was found 3 times and was unchanging.
See http://hydradb.gemstone.com/hdb/testresult/920073
> split brain in wanAdminLocatorsPeerHAP2P
> ----------------------------------------
>
> Key: GEODE-950
> URL: https://issues.apache.org/jira/browse/GEODE-950
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Bruce Schuchardt