Posted to issues@geode.apache.org by "Bruce Schuchardt (JIRA)" <ji...@apache.org> on 2016/02/09 23:20:18 UTC

[jira] [Created] (GEODE-950) split brain in wanAdminLocatorsPeerHAP2P

Bruce Schuchardt created GEODE-950:
--------------------------------------

             Summary: split brain in wanAdminLocatorsPeerHAP2P
                 Key: GEODE-950
                 URL: https://issues.apache.org/jira/browse/GEODE-950
             Project: Geode
          Issue Type: Bug
          Components: membership
            Reporter: Bruce Schuchardt


This test starts locators simultaneously, and both are configured to know about the other.  In the run below, two locators each created their own membership view, producing a split brain at startup instead of forming a single distributed system.
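
For illustration only, the condition described above can be sketched as a comparison of two membership views. This is not Geode code: the class name, the method name, and the member names are all hypothetical, and treating "no members in common" as the split-brain condition is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not part of Geode or the hydra test framework.
public class SplitBrainCheck {

    /**
     * Returns true when the two views have no member in common,
     * i.e. each side has formed its own separate system.
     * (Simplifying assumption: disjoint views == split brain.)
     */
    static boolean isSplitBrain(Set<String> viewA, Set<String> viewB) {
        return Collections.disjoint(viewA, viewB);
    }

    public static void main(String[] args) {
        // Member names below are made up for the example.
        Set<String> viewOfLocator1 = new HashSet<>(Arrays.asList("locator1", "peerA"));
        Set<String> viewOfLocator2 = new HashSet<>(Arrays.asList("locator2", "peerB"));

        // Each locator only sees itself and the peers that joined through it.
        System.out.println("split brain: " + isSplitBrain(viewOfLocator1, viewOfLocator2));
    }
}
```

In a healthy startup both locators would end up in the same view, so the two sets would overlap and the check would return false.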

Host name: w2-2013-lin-12
OS name: Linux
Architecture: amd64
OS version: 3.10.0-229.el7.x86_64
Java version: 1.8.0_66
Java vm name: Java HotSpot(TM) 64-Bit Server VM
Java vendor: Oracle Corporation
Java home: /export/gcm/where/jdk/1.8.0_66/x86_64.linux/jre

  #####################################################
  
  GemFire Version 9.0.0-SNAPSHOT
  Source Date: 2016-02-03 16:09:18 -0800
  Source Revision: 3f7070f117dbd8f2e5fb436b6aed3469e9fca673
  Source Repository: develop
  
  Build Id: bruces 020416
  Build Date: 2016-02-04 16:02:44 -0800
  Build Version: 9.0.0-SNAPSHOT bruces 020416 2016-02-04 16:02:44 -0800 javac 1.8.0_66
  Build JDK: Java 1.8.0_66
  Build Platform: Linux 2.6.32-122.el6.x86_64 amd64
  
  #####################################################


Test was run from /export/frodo2/users/bruce/devel/gfasf/closed/gemfire-test/build/resources/test/newWan/discovery/newWanDiscovery.bt

Test:
parReg/newWan/parallel/discovery/wanAdminLocatorsPeerHAP2P.conf
   locatorHostsPerSite=4
   locatorThreadsPerVM=1
   locatorVMsPerHost=1
   maxOps=300
   peerHostsPerSite=2
   peerMem=256m
   peerThreadsPerVM=10
   peerVMsPerHost=2
   redundantCopies=1
   resultWaitSec=600
   wanSites=3

Run with local.conf:

hydra.HostPrms-hostNames = w2-2013-lin-12 w1-gst-dev03;

//randomSeed extracted from test:
hydra.Prms-randomSeed=1454836695339;

*** Test failed with this error:
CLIENT vm_17_thr_64_peer_2_1_w1-gst-dev03_3365
INITTASK[2] newWan.WANTest.HydraTask_initPeerTask
HANG a client exceeded max result wait sec: 600

*** Last client logging by hung thread
[info 2016/02/07 01:30:48.650 PST <vm_17_thr_64_peer_2_1_w1-gst-dev03_3365> tid=0x1e] Configured disk store factory: com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl@16cf1ca8

*** Test declared hung 595996 ms after last client logging
[severe 2016/02/07 01:40:44.646 PST <vm_17_thr_68_peer_2_1_w1-gst-dev03_2152 Dynamic Client VM Stopper> tid=0x274] Result for vm_17_thr_64_peer_2_1_w1-gst-dev03_3365: INITTASK[2] newWan.WANTest.HydraTask_initPeerTask: HANG a client exceeded max result wait sec: 600

*** Hung thread
"vm_17_thr_64_peer_2_1_w1-gst-dev03_3365" #30 daemon prio=5 os_prio=0 tid=0x00007f0ca0026000 nid=0xdd3 waiting on condition [0x00007f0cafffd000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000f7429b60> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:363)
	at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:633)
	at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1821)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:1073)
	- locked <0x00000000f567aa10> (a com.gemstone.gemfire.internal.cache.PartitionedRegion)
	at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1193)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3171)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3063)
	at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:3052)
	at hydra.RegionHelper.createRegion(RegionHelper.java:129)
	- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
	at hydra.RegionHelper.createRegion(RegionHelper.java:93)
	- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
	at hydra.RegionHelper.createRegion(RegionHelper.java:80)
	- locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
	at newWan.WANTest.initDatastoreRegion(WANTest.java:439)
	at newWan.WANTest.HydraTask_initPeerTask(WANTest.java:797)
	- locked <0x00000000f58842e8> (a java.lang.Class for newWan.WANTest)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at hydra.MethExecutor.execute(MethExecutor.java:198)
	at hydra.MethExecutor.execute(MethExecutor.java:162)
	at hydra.TestTask.execute(TestTask.java:195)
	at hydra.RemoteTestModule$1.run(RemoteTestModule.java:216)

Stack for hung thread vm_17_thr_64_peer_2_1_w1-gst-dev03_3365 was found 3 times and was unchanging.
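
The stack above shows the thread parked in an untimed CountDownLatch.await().  The sketch below is not Geode code; it only illustrates the general java.util.concurrent behavior that such a wait does not return until some other thread counts the latch down.  The class and method names are hypothetical, and the timed await is used only so the demonstration terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not taken from Geode.
public class LatchWaitDemo {

    /**
     * Waits up to timeoutMillis for the latch to reach zero.
     * Returns false if no thread counted it down in time.
     */
    static boolean awaitWithTimeout(CountDownLatch latch, long timeoutMillis)
            throws InterruptedException {
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch neverReleased = new CountDownLatch(1);

        // An untimed neverReleased.await() here would park indefinitely,
        // which is the WAITING (parking) state seen in the thread dump.
        boolean released = awaitWithTimeout(neverReleased, 200);
        System.out.println("released=" + released);
    }
}
```

Once countDown() brings the count to zero, the same call returns true immediately.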

See http://hydradb.gemstone.com/hdb/testresult/920073


