Posted to dev@curator.apache.org by Cameron McKenzie <mc...@gmail.com> on 2016/06/01 00:08:58 UTC

Re: CURATOR-3.0 tests

Maybe we need to look at some way of providing a hook for tests to wait
reliably for asynch tasks to finish?
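
A minimal sketch of what such a hook could look like, assuming a test only
needs to poll a condition (for example "no watchers remain") until it holds
or a deadline passes. The class and method names here are hypothetical and
are not part of Curator:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not an existing Curator class.
final class AwaitCondition
{
    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition held before the deadline, false otherwise.
    static boolean await(BooleanSupplier condition, long timeout, TimeUnit unit, long pollMs) throws InterruptedException
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while ( !condition.getAsBoolean() )
        {
            if ( System.nanoTime() - deadline >= 0 )
            {
                return false;   // gave up: condition never became true in time
            }
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

A test would pass in whatever check it cares about and fail only when the
helper returns false, rather than asserting immediately after close().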

The latest round of tests ran OK. One test failed on an unrelated thing
(ConnectionLoss), but this appears to be a transient thing as it's worked
ok the next time around.

I will start getting a release together. Thanks for your help with the
updated tests.
cheers

On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> The problem is in-flight watchers and async background calls. There’s no
> way to cancel these and they can take time to occur - even after a recipe
> instance is closed.
>
> -Jordan
>
> > On May 31, 2016, at 5:11 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Ok, running it again now.
> >
> > Is the problem that the watcher clean up for the recipes is done
> > asynchronously after they are closed?
> >
> > On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> OK - please try now. I added a loop in the “no watchers” checker. If
> there
> >> are remaining watchers, it will sleep a bit and try again.
> >>
> >> -Jordan
> >>
> >>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> Looks like these failures are intermittent. Running them directly in
> >>> Eclipse they seem to be passing. I will run the whole thing again in
> the
> >>> morning and see how it goes.
> >>>
> >>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>> wrote:
> >>>
> >>>> There are still 2 tests failing for me:
> >>>>
> >>>> FAILURE! - in
> >>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>
> >>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>> java.lang.AssertionError: One or more child watchers are still
> >> registered:
> >>>> [/test]
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>
> >>>> FAILURE! - in
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>
> >>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>> java.lang.AssertionError: expected [true] but found [false]
> >>>> at org.testng.Assert.fail(Assert.java:94)
> >>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>> at
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>
> >>>> Failed tests:
> >>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
> >>>> watchers are still registered: [/test]
> >>>> Run 2: PASS
> >>>>
> >>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
> >>>> found [false]
> >>>>
> >>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com
> >>>>> wrote:
> >>>>
> >>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that,
> >> and
> >>>>> if it's all good will merge into CURATOR-3.0
> >>>>> cheers
> >>>>>
> >>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>
> >>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
> >>>>>> made/pushed my changes in CURATOR-332
> >>>>>>
> >>>>>> -jordan
> >>>>>>
> >>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
> >>>>>> after
> >>>>>>> merging your fix:
> >>>>>>>
> >>>>>>> Failed tests:
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
> >> watchers
> >>>>>>> are still registered: [/test]
> >>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
> >> watchers
> >>>>>>> are still registered: [/test]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1:
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>> Run 2:
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
> >>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
> >>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>
> >>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
> but
> >>>>>>> found [false]
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>> Run 1: PASS
> >>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
> data
> >>>>>>> watchers are still registered: [/count]
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are
> >>>>>> still
> >>>>>>> registered: [/count]
> >>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are
> >>>>>> still
> >>>>>>> registered: [/count]
> >>>>>>>
> >>>>>>>
> >>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>
> >>>>>>>> I see the problem. The fix is not simple though so I’ll spend some
> >>>>>> time on
> >>>>>>>> it. The TL;DR is that exists watchers are still supposed to get
> set
> >>>>>> when
> >>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it.
> >> But,
> >>>>>>>> while I was looking at the code I realized there are some
> >> significant
> >>>>>>>> additional problems. Curator, here, is trying to mirror what
> >>>>>> ZooKeeper does
> >>>>>>>> internally which is insanely complicated. In hindsight, the whole
> ZK
> >>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs.
> >>>>>> But, of
> >>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>
> >>>>>>>> -Jordan
> >>>>>>>>
> >>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>> mckenzie.cam@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Scott,
> >>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>
> >>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the
> >> 3.0
> >>>>>>>>> branch. It appears that this is actually potentially a bug in the
> >>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
> quick
> >>>>>> look
> >>>>>>>>> through, but I haven't dived in in any detail. It's the
> >>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you
> >>>>>> have a
> >>>>>>>>> look? If not, let me know and I'll do some more digging.
> >>>>>>>>>
> >>>>>>>>> cheers
> >>>>>>>>>
> >>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Scott.
> >>>>>>>>>>
> >>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>
> >>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus.
> >>>>>>>>>> cheers
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >> dragonsinth@gmail.com
> >>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
> master
> >>>>>> and
> >>>>>>>> 3.0.
> >>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing
> >>>>>> there.
> >>>>>>>>>>>> cheers
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
> >>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
> >>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times
> >> but
> >>>>>> no
> >>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
> Given
> >>>>>> that
> >>>>>>>>>>>>> these
> >>>>>>>>>>>>>>> may take some messing about to fix up, do we just want to
> >> vote
> >>>>>> on
> >>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
> >>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation
> >>>>>> stuff.
> >>>>>>>>>>> It
> >>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because
> the
> >>>>>> unit
> >>>>>>>>>>>> test
> >>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>   List<ACL> aclList = acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>   String returnPath = null;
> >>>>>>>>>>>>>>>>>>   if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>   {
> >>>>>>>>>>>>>>>>>>       pathInBackground(adjustedPath, data, givenPath);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
> >> failure
> >>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
> >>>>>>>>>>> expectation is
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
> maybe
> >>>>>>>>>>>> something
> >>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I
> get
> >>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the
> >>>>>> master
> >>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>> There's a test
> >> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>> that
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try
> and
> >>>>>>>>>>> provoke
> >>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >> CreateBuilderImpl
> >>>>>>>>>>> prior
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception
> that
> >>>>>> it
> >>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
> throws
> >>>>>> an
> >>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up
> >> the
> >>>>>>>>>>> stack
> >>>>>>>>>>>> at
> >>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
> >>>>>> understand
> >>>>>>>>>>> how
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
OK - I’ve found the real bug…

If the code successfully creates a ZNode on line 355, but fails to getChildren on line 363 due to a network issue, the just-created ZNode is orphaned and will cause a deadlock.
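
A rough sketch of the failure mode and one way to avoid leaving the node
behind, assuming the create and the getChildren are separate calls. The
NodeStore interface is a hypothetical stand-in for the ZooKeeper operations,
not a Curator API, and this is not necessarily the fix that will go into
InterProcessSemaphoreV2:

```java
import java.util.List;

final class OrphanCleanupSketch
{
    // Hypothetical stand-in for the create/getChildren/delete calls involved.
    interface NodeStore
    {
        String create(String path) throws Exception;
        List<String> getChildren(String parent) throws Exception;
        void delete(String path) throws Exception;
    }

    // Creates a node, then lists the parent's children. If the listing fails,
    // the node that was just created is deleted (best effort) before the
    // original exception propagates, so it does not linger as an orphan.
    static List<String> createThenList(NodeStore store, String parent, String name) throws Exception
    {
        String created = store.create(parent + "/" + name);
        boolean success = false;
        try
        {
            List<String> children = store.getChildren(parent);
            success = true;
            return children;
        }
        finally
        {
            if ( !success )
            {
                try
                {
                    store.delete(created);
                }
                catch ( Exception ignore )
                {
                    // Best effort only: during an outage the delete can fail too.
                }
            }
        }
    }
}
```

Because the delete can also fail while the connection is down, something like
Curator's delete().guaranteed() may be a better fit in the real code than the
plain delete used in this sketch.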

That said, I’m no longer sure I’m right about #2 below. Thoughts?

-Jordan


> On Jun 2, 2016, at 3:28 PM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> I believe there are two things going on:
> 
> 1) This test uses the infinite versions of the APIs. For some reason, either the internal lock or the semaphore code is getting stuck in wait() when there’s a network outage and never wakes up. I have some theories I’m working on.
> 
> 2) This is in the category of “How Did it Ever Work”. I’m cc’ing Ben Bangert because it was his algorithm I used for InterProcessSemaphoreV2 and I want to run this past him. In the current implementation (https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/InterProcessSemaphoreV2.java, lines 363-371), it seems to me that if there are more waiters on semaphores than there are available semaphores, it will wait infinitely. My solution is to sort the ZNode children and if the index of the acquiring client is less than the number of configured max leases, give that client the lease and be done. E.g.
> List<String> children = LockInternals.getSortedChildren(...);
> int ourIndex = children.indexOf(nodeName);
> 	...
> if ( ourIndex < maxLeases )
> {
>     break;
> }
> Thoughts?
> 
> -Jordan
> 
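
The index check proposed in the message above can be sketched in isolation
as follows. This is only an illustration: it sorts child names
lexicographically, which assumes every child shares one prefix followed by a
zero-padded sequence number, whereas the real code uses
LockInternals.getSortedChildren(...). The class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LeaseIndexSketch
{
    // Returns true when nodeName is one of the first maxLeases children
    // in sorted order, i.e. when this client should be given a lease.
    static boolean holdsLease(List<String> children, String nodeName, int maxLeases)
    {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);   // assumption: lexicographic order matches sequence order
        int ourIndex = sorted.indexOf(nodeName);
        // A missing node (index -1) never holds a lease.
        return (ourIndex >= 0) && (ourIndex < maxLeases);
    }
}
```

With three children and maxLeases of 2, the two lowest-sequence nodes get a
lease and the third keeps waiting.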
>> On Jun 2, 2016, at 12:04 AM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
>> 
>> Yeah, I'm still getting failures too. I will have more of a look if I get
>> time tonight.
>> cheers
>> 
>> On Thu, Jun 2, 2016 at 3:01 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
>>> wrote:
>> 
>>> Hmm - I’m still getting failures - maybe I’m wrong. It’s late and I’m off
>>> to bed. I’ll look at this more tomorrow.
>>> 
>>> -Jordan
>>> 
>>>> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>> wrote:
>>>> 
>>>> The counter is just being used to check if semaphores are still being
>>>> acquired. Essentially it just runs in a loop acquiring semaphores (and
>>>> incrementing the counter when they are acquired).
>>>> 
>>>> Then it shuts down the server, waits until the session is lost, then
>>>> restarts the server and then checks that semaphores are being acquired
>>>> correctly again (by checking that the counter is being incremented).
>>>> 
>>>> This is just a simplified version of the test that is failing.
>>>> 
>>>> When the test fails, all of the threads are attempting to get a lease on
>>>> the semaphore, but none of them get it, then the test times out while
>>>> waiting.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com
>>>>> wrote:
>>>> 
>>>>> I also had to add:
>>>>> 
>>>>> while(!lost.get() && (counter.get() > 0))
>>>>> {
>>>>>   Thread.sleep(1000);
>>>>> }
>>>>> Which seems more correct to me.
>>>>> 
>>>>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> I have just pushed an interprocess_mutex_issue branch. The test case is
>>>>> in
>>>>>> TestInterprocessMutexNotReconnecting
>>>>>> 
>>>>>> For me it's failing around 20% of the time.
>>>>>> cheers
>>>>>> 
>>>>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Yep, just let me confirm that it's actually getting the same problem.
>>>>> I'm
>>>>>>> sure it was before, but I've just run it a bunch of times and
>>>>> everything's
>>>>>>> been fine.
>>>>>>> 
>>>>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>> 
>>>>>>>> Can you push your unit test somewhere?
>>>>>>>> 
>>>>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
>>>>> though.
>>>>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>>>>> attempting to grab a lease on the semaphore. When I shutdown and
>>>>>>>> restart ZK
>>>>>>>>> about 25% of the time, none of the clients can reacquire the
>>>>> semaphore.
>>>>>>>>> 
>>>>>>>>> Still trying to work out what's going on, but I'm probably not going
>>>>> to
>>>>>>>>> have a lot of time today to look at it.
>>>>>>>>> cheers
>>>>>>>>> 
>>>>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>>>>> 
>>>>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> It looks like under some circumstances (which I haven't worked out
>>>>>>>> yet)
>>>>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>>>>> 
>>>>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>>>>>>> missing
>>>>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>>>>>>> throws
>>>>>>>>>> an
>>>>>>>>>>> exception if they return true. As far as I can work out, this
>>> means
>>>>>>>> that
>>>>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>>>>>>> that
>>>>>>>>>>> there are Multiple acquirers.
>>>>>>>>>>> 
>>>>>>>>>>> This test is failing fairly consistently. It seems to be the
>>>>> remaining
>>>>>>>>>> test
>>>>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>>>>> cheers
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being
>>> thrown
>>>>> on
>>>>>>>>>>>> success as well, and the problem is not in the cluster restart.
>>>>> Will
>>>>>>>>>> keep
>>>>>>>>>>>> digging.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
>>>>>>>> (assertion
>>>>>>>>>> at
>>>>>>>>>>>>> line 294). Again, it seems like some sort of race condition with
>>>>> the
>>>>>>>>>>>>> watcher removal.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When
>>> it
>>>>>>>> fails
>>>>>>>>>>>>> it seems that it's got something to do with watcher removal.
>>> When
>>>>>>>> the
>>>>>>>>>> test
>>>>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>>>>>>>>>> KeeperErrorCode
>>>>>>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>>>>>>> at
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>>>>> at
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>>>>> at
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>>>>> at
>>>>> org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>>>>> at
>>> org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>>>>> at
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>>>>> at
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>>>>> at
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>> at
>>>>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is it possible it's something to do with the way that the
>>> cluster
>>>>> is
>>>>>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new
>>> one
>>>>> is
>>>>>>>>>> just
>>>>>>>>>>>>> created.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for
>>> tests
>>>>> to
>>>>>>>>>> wait
>>>>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
>>>>> unrelated
>>>>>>>>>> thing
>>>>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
>>>>> it's
>>>>>>>>>>>>>> worked
>>>>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I will start getting a release together. Thanks for your help
>>>>> with
>>>>>>>> the
>>>>>>>>>>>>>>> updated tests.
>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>>>>> There’s
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>> way to cancel these and they can take time to occur - even
>>>>> after
>>>>>>>> a
>>>>>>>>>>>>>> recipe
>>>>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>>>>> done
>>>>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>>>>> checker.
>>>>>>>>>> If
>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>>>>> directly
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole
>>> thing
>>>>>>>> again
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>>>>>>> still
>>>>>>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found
>>> [false]
>>>>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>>>>>>> more
>>>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>>>>> [true]
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>>>>> against
>>>>>>>>>>>>>> that,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>>>>> merged
>>>>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff after
>>>>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend some time on
>>>>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to get set when
>>>>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it. But,
>>>>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are some significant
>>>>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror what ZooKeeper does
>>>>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight, the whole ZK
>>>>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs. But, of
>>>>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the 3.0
>>>>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug in the
>>>>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a quick look
>>>>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you have a
>>>>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and
>>> 3.2
>>>>>>>> onto
>>>>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied
>>> to
>>>>>>>> both
>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie
>>> <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>>>>> are
>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan
>>> Zimmerman
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie
>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've
>>> tried a
>>>>>>>> few
>>>>>>>>>>>>>> times
>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>>>>> morning.
>>>>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>>>>> just
>>>>>>>>>> want
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>>>>> Zimmerman
>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation stuff. It now does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because the unit test uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a failure in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the expectation is that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>>>>> McKenzie
>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>>>>> there,
>>>>>>>>>> so
>>>>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>>>>> know
>>>>>>>>>> if
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you
>>> compared
>>>>>>>> it to
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test TestFrameworkBackground:testErrorListener that is failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and provoke an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl prior to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that it throws happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up the stack at which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't understand how it ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers


Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Sorry - I’ve been tied up. I’ll take a look tomorrow.

-Jordan

> On Jun 8, 2016, at 5:33 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Does anyone have any thoughts on this?
> 
> Either,
> -I'm misinterpreting something, and we just need to increase the wait time
> / decrease the session timeout on the CURATOR-335 test.
> or
> -The way that the new connection state manager works is incorrect and needs
> to be fixed.
> 
> I'm leaning to the second one. My expectation would be that a LOST event
> would be generated after session timeout MS had passed with no connection
> to ZK. Not 4/3 of the session timeout MS.
> 
> 
> 
> On Tue, Jun 7, 2016 at 5:13 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> 
>> I think that the problem is that the processEvents() loop in the
>> ConnectionStateManager class checks for timeouts while in a suspended state
>> only every 2/3 of the negotiated session length.
>> 
>> I think that the test in CURATOR-335 is perhaps a bit strange in that it
>> uses the same session timeout as the amount of time it will wait for the
>> session timeout to occur (+ a sleepForABitCall()), so there's not much
>> room for error. That's why it has exposed this issue.
>> 
>> The session timeout is set to 50 seconds.
>> 
>> The initial SUSPEND event occurs.
>> Then 50 * (2/3) seconds later the processEvents() loop wakes up again
>> after no events have occurred, so checks timeouts. It still hasn't timed
>> out, so it polls again for another 50 * (2/3) seconds. There are no
>> additional events in this time, which means that the next timeout check
>> doesn't occur until 66 seconds (100 * (2/3)) instead of after 50 seconds as
>> you would expect.
>> 
>> If I set the assertions to wait for up to 70 seconds for the appropriate
>> state to be achieved, then the test passes fine.
>> 
>> The poll call in the processEvents loop must take into account how much
>> time has already been spent in a suspended state.
>> 
>> Jordan, do you remember what the rationale behind only waiting for 2/3 of
>> the session timeout is? It's something to do with the way that ZK itself
>> handles session timeouts isn't it? Does ZK timeout the session if it hasn't
>> received a heartbeat for 2/3 of the session timeout? I can't remember.
>> cheers
>> 
>> 
>> 
>> 
>> On Tue, Jun 7, 2016 at 10:41 AM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>> 
>>> Seems like I have uncovered another problem on the 3.0 branch.
>>> 
>>> It looks like the new (ish) connection handling stuff doesn't seem to be
>>> working correctly for long session timeouts. Specifically, the test for
>>> CURATOR-335 fails on the 3.0 branch when run with the new connection
>>> handling, but works with the 'classic' connection handling.
>>> 
>>> It fails when asserting that the LOST event occurs after the server is
>>> stopped.
>>> 
>>> I'm not going to have time to do much more digging for at least today,
>>> but I have made a more targeted test case:
>>> 
>>> TestFramework:testSessionLossWithLongTimeout on
>>> the long_session_timeout_issue branch.
>>> 
>>> if anyone has time to look before I do.
>>> 
>>> I think that this needs to be resolved before 3.0 can be released.
>>> cheers
>>> 
>>> 
>>> On Mon, Jun 6, 2016 at 9:49 AM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com> wrote:
>>> 
>>>> :D
>>>> 
>>>>> Is it worth holding up the build to merge CURATOR-331?
>>>> No, let’s go with what we have.
>>>> 
>>>>> On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Ah, must still be recovering, I'm sure I saw it was being applied to
>>>> the
>>>>> 3.0 branch.
>>>>> 
>>>>> I will merge it into master and 3.0.
>>>>> 
>>>>> Is it worth holding up the build to merge CURATOR-331? I have asked
>>>> Scott
>>>>> what his opinion is since it's the TreeCache stuff. It looks ok to me
>>>> though.
>>>>> cheers
>>>>> 
>>>>> On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com
>>>>>> wrote:
>>>>> 
>>>>>> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
>>>>>> you’re OK with it.
>>>>>> 
>>>>>> -Jordan
>>>>>> 
>>>>>>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>>>>> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> hey Jordan,
>>>>>>> The fix for CURATOR-335 looks good to me, but I'm wondering if it
>>>> should
>>>>>>> actually be applied against master and then merged into 3.0?
>>>>>>> 
>>>>>>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>> 
>>>>>>>> no worries - get well.
>>>>>>>> 
>>>>>>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't
>>>> get
>>>>>>>>> around to looking at it, but I will try over the weekend or really
>>>> next
>>>>>>>> week
>>>>>>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <
>>>> jordan@jordanzimmerman.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>>> It sounds like curator is using a different algorithm since it
>>>> has
>>>>>>>>>>> nodes sorting their position to determine if they have a lease or
>>>>>> not.
>>>>>>>>>> 
>>>>>>>>>> No - I just added that as I thought there was a bug. But, now I
>>>>>> realize
>>>>>>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
>>>>>>>>>> 
>>>>>>>>>> -Jordan
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Does anyone have any thoughts on this?

Either,
-I'm misinterpreting something, and we just need to increase the wait time
/ decrease the session timeout on the CURATOR-335 test.
or
-The way that the new connection state manager works is incorrect and needs
to be fixed.

I'm leaning to the second one. My expectation would be that a LOST event
would be generated once the session timeout (in ms) had passed with no
connection to ZK, not after 4/3 of the session timeout.



On Tue, Jun 7, 2016 at 5:13 PM, Cameron McKenzie <mc...@gmail.com>
wrote:

> I think that the problem is that the processEvents() loop in the
> ConnectionStateManager class checks for timeouts while in a suspended state
> only every 2/3 of the negotiated session length.
>
> I think that the test in CURATOR-335 is perhaps a bit strange in that it
> uses the same session timeout as the amount of time it will wait for the
> session timeout to occur (+ a sleepForABitCall()), so there's not much
> room for error. That's why it has exposed this issue.
>
> The session timeout is set to 50 seconds.
>
> The initial SUSPEND event occurs.
> Then 50 * (2/3) seconds later the processEvents() loop wakes up again
> after no events have occurred, so checks timeouts. It still hasn't timed
> out, so it polls again for another 50 * (2/3) seconds. There are no
> additional events in this time, which means that the next timeout check
> doesn't occur until 66 seconds (100 * (2/3)) instead of after 50 seconds as
> you would expect.
>
> If I set the assertions to wait for up to 70 seconds for the appropriate
> state to be achieved, then the test passes fine.
>
> The poll call in the processEvents loop must take into account how much
> time has already been spent in a suspended state.
>
> Jordan, do you remember what the rationale behind only waiting for 2/3 of
> the session timeout is? It's something to do with the way that ZK itself
> handles session timeouts isn't it? Does ZK timeout the session if it hasn't
> received a heartbeat for 2/3 of the session timeout? I can't remember.
> cheers
>
>
>
>
> On Tue, Jun 7, 2016 at 10:41 AM, Cameron McKenzie <mc...@gmail.com>
> wrote:
>
>> Seems like I have uncovered another problem on the 3.0 branch.
>>
>> It looks like the new (ish) connection handling stuff doesn't seem to be
>> working correctly for long session timeouts. Specifically, the test for
>> CURATOR-335 fails on the 3.0 branch when run with the new connection
>> handling, but works with the 'classic' connection handling.
>>
>> It fails when asserting that the LOST event occurs after the server is
>> stopped.
>>
>> I'm not going to have time to do much more digging for at least today,
>> but I have made a more targeted test case:
>>
>> TestFramework:testSessionLossWithLongTimeout on
>> the long_session_timeout_issue branch.
>>
>> if anyone has time to look before I do.
>>
>> I think that this needs to be resolved before 3.0 can be released.
>> cheers
>>
>>
>> On Mon, Jun 6, 2016 at 9:49 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>>
>>> :D
>>>
>>> > Is it worth holding up the build to merge CURATOR-331?
>>> No, let’s go with what we have.
>>>
>>> > On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com>
>>> wrote:
>>> >
>>> > Ah, must still be recovering, I'm sure I saw it was being applied to
>>> the
>>> > 3.0 branch.
>>> >
>>> > I will merge it into master and 3.0.
>>> >
>>> > Is it worth holding up the build to merge CURATOR-331? I have asked
>>> Scott
>>> > what his opinion is since it's the TreeCache stuff. It looks ok to me
>>> though.
>>> > cheers
>>> >
>>> > On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com
>>> >> wrote:
>>> >
>>> >> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
>>> >> you’re OK with it.
>>> >>
>>> >> -Jordan
>>> >>
>>> >>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>>> >
>>> >> wrote:
>>> >>>
>>> >>> hey Jordan,
>>> >>> The fix for CURATOR-335 looks good to me, but I'm wondering if it
>>> should
>>> >>> actually be applied against master and then merged into 3.0?
>>> >>>
>>> >>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
>>> >>> jordan@jordanzimmerman.com> wrote:
>>> >>>
>>> >>>> no worries - get well.
>>> >>>>
>>> >>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't
>>> get
>>> >>>>> around to looking at it, but I will try over the weekend or really
>>> next
>>> >>>> week
>>> >>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <
>>> jordan@jordanzimmerman.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>>> It sounds like curator is using a different algorithm since it
>>> has
>>> >>>>>>> nodes sorting their position to determine if they have a lease or
>>> >> not.
>>> >>>>>>
>>> >>>>>> No - I just added that as I thought there was a bug. But, now I
>>> >> realize
>>> >>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
>>> >>>>>>
>>> >>>>>> -Jordan
>>> >>>>
>>> >>>>
>>> >>
>>> >>
>>>
>>>
>>
>

Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Done, have created a PR.

On Mon, Jun 13, 2016 at 1:22 PM, Cameron McKenzie <mc...@gmail.com>
wrote:

> Yep, I will try and do it tomorrow
> On 13 Jun 2016 1:19 PM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
> wrote:
>
>> Sure - you OK doing it?
>>
>> > On Jun 12, 2016, at 10:09 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>> >
>> > It could probably just be the remaining amount of time until session
>> > timeout couldn't it? We should know at what time the last event was as
>> we
>> > use this to calculate when the LOST event occurs
>> > On 13 Jun 2016 12:54 PM, "Jordan Zimmerman" <jordan@jordanzimmerman.com
>> >
>> > wrote:
>> >
>> >> Yeah - that was my thinking but, really, it could be done more often.
>> >> Maybe 1/3 instead of 2/3? It wouldn’t do any harm really. Just as long
>> as
>> >> it doesn’t turn into a spin loop.
>> >>
>> >> -Jordan
>> >>
>> >>> On Jun 7, 2016, at 2:13 AM, Cameron McKenzie <mc...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Jordan, do you remember what the rationale behind only waiting for
>> 2/3 of
>> >>> the session timeout is? It's something to do with the way that ZK
>> itself
>> >>> handles session timeouts isn't it? Does ZK timeout the session if it
>> >> hasn't
>> >>> received a heartbeat for 2/3 of the session timeout? I can't remember.
>> >>> cheers
>> >>
>> >>
>>
>>

Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Yep, I will try and do it tomorrow
On 13 Jun 2016 1:19 PM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
wrote:

> Sure - you OK doing it?
>
> > On Jun 12, 2016, at 10:09 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > It could probably just be the remaining amount of time until session
> > timeout couldn't it? We should know at what time the last event was as we
> > use this to calculate when the LOST event occurs
> > On 13 Jun 2016 12:54 PM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
> > wrote:
> >
> >> Yeah - that was my thinking but, really, it could be done more often.
> >> Maybe 1/3 instead of 2/3? It wouldn’t do any harm really. Just as long
> as
> >> it doesn’t turn into a spin loop.
> >>
> >> -Jordan
> >>
> >>> On Jun 7, 2016, at 2:13 AM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> Jordan, do you remember what the rationale behind only waiting for 2/3
> of
> >>> the session timeout is? It's something to do with the way that ZK
> itself
> >>> handles session timeouts isn't it? Does ZK timeout the session if it
> >> hasn't
> >>> received a heartbeat for 2/3 of the session timeout? I can't remember.
> >>> cheers
> >>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Sure - you OK doing it?

> On Jun 12, 2016, at 10:09 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> It could probably just be the remaining amount of time until session
> timeout couldn't it? We should know at what time the last event was as we
> use this to calculate when the LOST event occurs
> On 13 Jun 2016 12:54 PM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
> wrote:
> 
>> Yeah - that was my thinking but, really, it could be done more often.
>> Maybe 1/3 instead of 2/3? It wouldn’t do any harm really. Just as long as
>> it doesn’t turn into a spin loop.
>> 
>> -Jordan
>> 
>>> On Jun 7, 2016, at 2:13 AM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> Jordan, do you remember what the rationale behind only waiting for 2/3 of
>>> the session timeout is? It's something to do with the way that ZK itself
>>> handles session timeouts isn't it? Does ZK timeout the session if it
>> hasn't
>>> received a heartbeat for 2/3 of the session timeout? I can't remember.
>>> cheers
>> 
>> 


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
It could probably just be the remaining amount of time until the session
timeout, couldn't it? We should know at what time the last event was, as we
use this to calculate when the LOST event occurs.
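
One way to sketch that idea in Java: compute each poll duration from the time
already spent suspended, rather than always waiting a fixed fraction of the
session timeout. This is not Curator code. The names below (PollTimeoutSketch,
nextPollMs, MIN_POLL_MS) are hypothetical, the 2/3 cap mirrors the interval
discussed in this thread, and the 1 ms floor is an assumed guard against the
loop turning into a spin loop.

```java
final class PollTimeoutSketch {
    // Assumed lower bound so a loop that keeps polling never busy-spins.
    static final long MIN_POLL_MS = 1;

    /**
     * How long the next poll should wait while suspended.
     *
     * @param suspendedAtMs    clock reading when the suspended state began
     * @param nowMs            current clock reading
     * @param sessionTimeoutMs negotiated session timeout
     */
    static long nextPollMs(long suspendedAtMs, long nowMs, long sessionTimeoutMs) {
        // Never wait longer than the existing fixed interval (2/3 of the session).
        long capMs = sessionTimeoutMs * 2 / 3;
        // Time left before the session should be considered lost.
        long remainingMs = sessionTimeoutMs - (nowMs - suspendedAtMs);
        return Math.min(capMs, Math.max(MIN_POLL_MS, remainingMs));
    }

    public static void main(String[] args) {
        long timeoutMs = 50_000;
        // First poll after suspension: capped at 2/3 of the session timeout.
        System.out.println(nextPollMs(0, 0, timeoutMs));
        // Second poll: only the time remaining until the session timeout.
        System.out.println(nextPollMs(0, 33_333, timeoutMs));
        // Already past the timeout: the floor value, the caller should raise LOST.
        System.out.println(nextPollMs(0, 60_000, timeoutMs));
    }
}
```

With a 50 second session timeout, the second wake-up would then land at 50
seconds rather than at roughly 67 seconds.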
On 13 Jun 2016 12:54 PM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
wrote:

> Yeah - that was my thinking but, really, it could be done more often.
> Maybe 1/3 instead of 2/3? It wouldn’t do any harm really. Just as long as
> it doesn’t turn into a spin loop.
>
> -Jordan
>
> > On Jun 7, 2016, at 2:13 AM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Jordan, do you remember what the rationale behind only waiting for 2/3 of
> > the session timeout is? It's something to do with the way that ZK itself
> > handles session timeouts isn't it? Does ZK timeout the session if it
> hasn't
> > received a heartbeat for 2/3 of the session timeout? I can't remember.
> > cheers
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Yeah - that was my thinking but, really, it could be done more often. Maybe 1/3 instead of 2/3? It wouldn’t do any harm really. Just as long as it doesn’t turn into a spin loop.

-Jordan

> On Jun 7, 2016, at 2:13 AM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Jordan, do you remember what the rationale behind only waiting for 2/3 of
> the session timeout is? It's something to do with the way that ZK itself
> handles session timeouts isn't it? Does ZK timeout the session if it hasn't
> received a heartbeat for 2/3 of the session timeout? I can't remember.
> cheers


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
I think that the problem is that the processEvents() loop in the
ConnectionStateManager class checks for timeouts while in a suspended state
only every 2/3 of the negotiated session length.

I think that the test in CURATOR-335 is perhaps a bit strange in that the
amount of time it waits for the session timeout to occur is the same as the
session timeout itself (+ a sleepForABitCall()), so there's not much room
for error. That's why it has exposed this issue.

The session timeout is set to 50 seconds.

The initial SUSPEND event occurs.
Then 50 * (2/3) seconds (about 33 seconds) later the processEvents() loop
wakes up again because no events have occurred, so it checks for timeouts.
The session still hasn't timed out, so it polls again for another
50 * (2/3) seconds. There are no additional events in this time, which means
that the next timeout check doesn't occur until about 66 seconds
(100 * (2/3)) after the SUSPEND event, instead of after 50 seconds as you
would expect.

If I set the assertions to wait for up to 70 seconds for the appropriate
state to be achieved, then the test passes fine.

The poll call in the processEvents loop must take into account how much
time has already been spent in a suspended state.
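
A minimal sketch of what I mean, not the actual ConnectionStateManager code.
The names pollTimeoutMs and suspendedAtMs are made up for illustration, and
it assumes we record the time at which we entered the suspended state:

```java
import java.util.concurrent.TimeUnit;

public class SuspendedPollSketch {
    // Hypothetical helper: how long to block waiting for the next state event
    // while suspended, given when the suspension started.
    public static long pollTimeoutMs(long sessionTimeoutMs, long suspendedAtMs, long nowMs) {
        long elapsedMs = nowMs - suspendedAtMs;
        long remainingMs = sessionTimeoutMs - elapsedMs;
        // Never return a negative wait; 0 means "check for LOST right now".
        return Math.max(0L, remainingMs);
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = TimeUnit.SECONDS.toMillis(50);
        long suspendedAtMs = 0L;
        // The point at which the fixed 2/3 poll would first wake up.
        long firstWakeMs = sessionTimeoutMs * 2 / 3;

        System.out.println(pollTimeoutMs(sessionTimeoutMs, suspendedAtMs, suspendedAtMs)); // 50000
        System.out.println(pollTimeoutMs(sessionTimeoutMs, suspendedAtMs, firstWakeMs));   // 16667
        System.out.println(pollTimeoutMs(sessionTimeoutMs, suspendedAtMs, 60_000L));       // 0
    }
}
```

With something like this, a wake-up at 33 seconds would poll for the
remaining 17 seconds or so rather than another 33, so the next check lands
at roughly the session timeout. A result of 0 should lead to the LOST check
rather than another poll, so it shouldn't become a spin loop.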

Jordan, do you remember what the rationale behind only waiting for 2/3 of
the session timeout is? It's something to do with the way that ZK itself
handles session timeouts isn't it? Does ZK timeout the session if it hasn't
received a heartbeat for 2/3 of the session timeout? I can't remember.
cheers




On Tue, Jun 7, 2016 at 10:41 AM, Cameron McKenzie <mc...@gmail.com>
wrote:

> Seems like I have uncovered another problem on the 3.0 branch.
>
> It looks like the new (ish) connection handling stuff doesn't seem to be
> working correctly for long session timeouts. Specifically, the test for
> CURATOR-335 fails on the 3.0 branch when run with the new connection
> handling, but works with the 'classic' connection handling.
>
> It fails when asserting that the LOST event occurs after the server is
> stopped.
>
> I'm not going to have time to do much more digging for at least today, but
> I have made a more targeted test case:
>
> TestFramework:testSessionLossWithLongTimeout on
> the long_session_timeout_issue branch.
>
> if anyone has time to look before I do.
>
> I think that this needs to be resolved before 3.0 can be released.
> cheers
>
>
> On Mon, Jun 6, 2016 at 9:49 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> :D
>>
>> > Is it worth holding up the build to merge CURATOR-331?
>> No, let’s go with what we have.
>>
>> > On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>> >
>> > Ah, must still be recovering, I'm sure I saw it was being applied to the
>> > 3.0 branch.
>> >
>> > I will merge it into master and 3.0.
>> >
>> > Is it worth holding up the build to merge CURATOR-331? I have asked
>> Scott
>> > what his opinion is since its the TreeCache stuff. It looks ok to me
>> though.
>> > cheers
>> >
>> > On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com
>> >> wrote:
>> >
>> >> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
>> >> you’re OK with it.
>> >>
>> >> -Jordan
>> >>
>> >>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mc...@gmail.com>
>> >> wrote:
>> >>>
>> >>> hey Jordan,
>> >>> The fix for CURATOR-335 looks good to me, but I'm wondering if it
>> should
>> >>> actually be applied against master and then merged into 3.0?
>> >>>
>> >>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
>> >>> jordan@jordanzimmerman.com> wrote:
>> >>>
>> >>>> no worries - get well.
>> >>>>
>> >>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't
>> get
>> >>>>> around to looking at it, but I will try over the weekend or really
>> next
>> >>>> week
>> >>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <
>> jordan@jordanzimmerman.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>>> It sounds like curator is using a different algorithm since it has
>> >>>>>>> nodes sorting their position to determine if they have a lease or
>> >> not.
>> >>>>>>
>> >>>>>> No - I just added that as I thought there was a bug. But, now I
>> >> realize
>> >>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
>> >>>>>>
>> >>>>>> -Jordan
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>

Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
That was my assessment too, but the amount of time to get a LOST event
seems incorrect to me. It takes 4/3 of the session timeout, whereas I think
it should take the session timeout itself.
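
To show where the 4/3 comes from, here is a rough model of a loop that only
checks for a timeout at fixed intervals while no other events arrive. It is
a sketch for the arithmetic only; firstExpiredCheckMs is a made-up name and
none of this is taken from the Curator source:

```java
public class FixedIntervalCheckSketch {
    // Time (ms after the suspension) of the first check that can see the
    // session as expired, when checks happen only every checkIntervalMs.
    public static long firstExpiredCheckMs(long sessionTimeoutMs, long checkIntervalMs) {
        if (checkIntervalMs <= 0) {
            throw new IllegalArgumentException("checkIntervalMs must be positive");
        }
        long checkAtMs = checkIntervalMs;
        while (checkAtMs < sessionTimeoutMs) {
            // Not expired yet at this check, so wait a full interval again.
            checkAtMs += checkIntervalMs;
        }
        return checkAtMs;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 50_000L;
        long twoThirdsMs = sessionTimeoutMs * 2 / 3; // 33333 with integer division

        System.out.println(firstExpiredCheckMs(sessionTimeoutMs, twoThirdsMs)); // 66666
    }
}
```

The first check at 33333 ms is too early, and the second does not happen
until 66666 ms, which is about 4/3 of the 50 second timeout.
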
On 13 Jun 2016 12:02 PM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
wrote:

> Sorry it’s taken me so long to get to this…
>
> I don’t see what the problem is with testSessionLossWithLongTimeout(). The
> session timeout is being set to timing.forWaiting().milliseconds(). The
> test on line 116 only waits timing.forWaiting().milliseconds() for timeout
> and will almost always fail. If I change this line to:
>
>         timing.multiple(2).forWaiting().milliseconds()
>
> The test succeeds. This seems correct to me.
>
> -Jordan
>
> > On Jun 6, 2016, at 7:41 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Seems like I have uncovered another problem on the 3.0 branch.
> >
> > It looks like the new (ish) connection handling stuff doesn't seem to be
> > working correctly for long session timeouts. Specifically, the test for
> > CURATOR-335 fails on the 3.0 branch when run with the new connection
> > handling, but works with the 'classic' connection handling.
> >
> > It fails when asserting that the LOST event occurs after the server is
> > stopped.
> >
> > I'm not going to have time to do much more digging for at least today,
> but
> > I have made a more targeted test case:
> >
> > TestFramework:testSessionLossWithLongTimeout on
> > the long_session_timeout_issue branch.
> >
> > if anyone has time to look before I do.
> >
> > I think that this needs to be resolved before 3.0 can be released.
> > cheers
> >
> >
> > On Mon, Jun 6, 2016 at 9:49 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> :D
> >>
> >>> Is it worth holding up the build to merge CURATOR-331?
> >> No, let’s go with what we have.
> >>
> >>> On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> Ah, must still be recovering, I'm sure I saw it was being applied to
> the
> >>> 3.0 branch.
> >>>
> >>> I will merge it into master and 3.0.
> >>>
> >>> Is it worth holding up the build to merge CURATOR-331? I have asked
> Scott
> >>> what his opinion is since its the TreeCache stuff. It looks ok to me
> >> though.
> >>> cheers
> >>>
> >>> On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <
> >> jordan@jordanzimmerman.com
> >>>> wrote:
> >>>
> >>>> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
> >>>> you’re OK with it.
> >>>>
> >>>> -Jordan
> >>>>
> >>>>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mckenzie.cam@gmail.com
> >
> >>>> wrote:
> >>>>>
> >>>>> hey Jordan,
> >>>>> The fix for CURATOR-335 looks good to me, but I'm wondering if it
> >> should
> >>>>> actually be applied against master and then merged into 3.0?
> >>>>>
> >>>>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
> >>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>
> >>>>>> no worries - get well.
> >>>>>>
> >>>>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <
> mckenzie.cam@gmail.com
> >>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't
> >> get
> >>>>>>> around to looking at it, but I will try over the weekend or really
> >> next
> >>>>>> week
> >>>>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <
> >> jordan@jordanzimmerman.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>>> It sounds like curator is using a different algorithm since it
> has
> >>>>>>>>> nodes sorting their position to determine if they have a lease or
> >>>> not.
> >>>>>>>>
> >>>>>>>> No - I just added that as I thought there was a bug. But, now I
> >>>> realize
> >>>>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
> >>>>>>>>
> >>>>>>>> -Jordan
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Sorry it’s taken me so long to get to this…

I don’t see what the problem is with testSessionLossWithLongTimeout(). The session timeout is being set to timing.forWaiting().milliseconds(). The test on line 116 only waits timing.forWaiting().milliseconds() for timeout and will almost always fail. If I change this line to:

	timing.multiple(2).forWaiting().milliseconds()

The test succeeds. This seems correct to me.

-Jordan

> On Jun 6, 2016, at 7:41 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Seems like I have uncovered another problem on the 3.0 branch.
> 
> It looks like the new (ish) connection handling stuff doesn't seem to be
> working correctly for long session timeouts. Specifically, the test for
> CURATOR-335 fails on the 3.0 branch when run with the new connection
> handling, but works with the 'classic' connection handling.
> 
> It fails when asserting that the LOST event occurs after the server is
> stopped.
> 
> I'm not going to have time to do much more digging for at least today, but
> I have made a more targeted test case:
> 
> TestFramework:testSessionLossWithLongTimeout on
> the long_session_timeout_issue branch.
> 
> if anyone has time to look before I do.
> 
> I think that this needs to be resolved before 3.0 can be released.
> cheers
> 
> 
> On Mon, Jun 6, 2016 at 9:49 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> :D
>> 
>>> Is it worth holding up the build to merge CURATOR-331?
>> No, let’s go with what we have.
>> 
>>> On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> Ah, must still be recovering, I'm sure I saw it was being applied to the
>>> 3.0 branch.
>>> 
>>> I will merge it into master and 3.0.
>>> 
>>> Is it worth holding up the build to merge CURATOR-331? I have asked Scott
>>> what his opinion is since its the TreeCache stuff. It looks ok to me
>> though.
>>> cheers
>>> 
>>> On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com
>>>> wrote:
>>> 
>>>> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
>>>> you’re OK with it.
>>>> 
>>>> -Jordan
>>>> 
>>>>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mc...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> hey Jordan,
>>>>> The fix for CURATOR-335 looks good to me, but I'm wondering if it
>> should
>>>>> actually be applied against master and then merged into 3.0?
>>>>> 
>>>>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
>>>>> jordan@jordanzimmerman.com> wrote:
>>>>> 
>>>>>> no worries - get well.
>>>>>> 
>>>>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>>> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't
>> get
>>>>>>> around to looking at it, but I will try over the weekend or really
>> next
>>>>>> week
>>>>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <
>> jordan@jordanzimmerman.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>>> It sounds like curator is using a different algorithm since it has
>>>>>>>>> nodes sorting their position to determine if they have a lease or
>>>> not.
>>>>>>>> 
>>>>>>>> No - I just added that as I thought there was a bug. But, now I
>>>> realize
>>>>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
>>>>>>>> 
>>>>>>>> -Jordan
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Seems like I have uncovered another problem on the 3.0 branch.

It looks like the new(ish) connection handling isn't working correctly for
long session timeouts. Specifically, the test for
CURATOR-335 fails on the 3.0 branch when run with the new connection
handling, but works with the 'classic' connection handling.

It fails when asserting that the LOST event occurs after the server is
stopped.

I'm not going to have time to do much more digging for at least today, but
I have made a more targeted test case:

TestFramework:testSessionLossWithLongTimeout on
the long_session_timeout_issue branch.

if anyone has time to look before I do.

I think that this needs to be resolved before 3.0 can be released.
cheers


On Mon, Jun 6, 2016 at 9:49 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> :D
>
> > Is it worth holding up the build to merge CURATOR-331?
> No, let’s go with what we have.
>
> > On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Ah, must still be recovering, I'm sure I saw it was being applied to the
> > 3.0 branch.
> >
> > I will merge it into master and 3.0.
> >
> > Is it worth holding up the build to merge CURATOR-331? I have asked Scott
> > what his opinion is since its the TreeCache stuff. It looks ok to me
> though.
> > cheers
> >
> > On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
> >> you’re OK with it.
> >>
> >> -Jordan
> >>
> >>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> hey Jordan,
> >>> The fix for CURATOR-335 looks good to me, but I'm wondering if it
> should
> >>> actually be applied against master and then merged into 3.0?
> >>>
> >>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
> >>> jordan@jordanzimmerman.com> wrote:
> >>>
> >>>> no worries - get well.
> >>>>
> >>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mckenzie.cam@gmail.com
> >
> >>>> wrote:
> >>>>>
> >>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't
> get
> >>>>> around to looking at it, but I will try over the weekend or really
> next
> >>>> week
> >>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <
> jordan@jordanzimmerman.com>
> >>>>> wrote:
> >>>>>
> >>>>>>> It sounds like curator is using a different algorithm since it has
> >>>>>>> nodes sorting their position to determine if they have a lease or
> >> not.
> >>>>>>
> >>>>>> No - I just added that as I thought there was a bug. But, now I
> >> realize
> >>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
> >>>>>>
> >>>>>> -Jordan
> >>>>
> >>>>
> >>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
:D

> Is it worth holding up the build to merge CURATOR-331?
No, let’s go with what we have.

> On Jun 5, 2016, at 6:48 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Ah, must still be recovering, I'm sure I saw it was being applied to the
> 3.0 branch.
> 
> I will merge it into master and 3.0.
> 
> Is it worth holding up the build to merge CURATOR-331? I have asked Scott
> what his opinion is since its the TreeCache stuff. It looks ok to me though.
> cheers
> 
> On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
>> you’re OK with it.
>> 
>> -Jordan
>> 
>>> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> hey Jordan,
>>> The fix for CURATOR-335 looks good to me, but I'm wondering if it should
>>> actually be applied against master and then merged into 3.0?
>>> 
>>> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com> wrote:
>>> 
>>>> no worries - get well.
>>>> 
>>>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mc...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't get
>>>>> around to looking at it, but I will try over the weekend or really next
>>>> week
>>>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
>>>>> wrote:
>>>>> 
>>>>>>> It sounds like curator is using a different algorithm since it has
>>>>>>> nodes sorting their position to determine if they have a lease or
>> not.
>>>>>> 
>>>>>> No - I just added that as I thought there was a bug. But, now I
>> realize
>>>>>> I’m wrong. So, it was correct all along. Thanks Ben.
>>>>>> 
>>>>>> -Jordan
>>>> 
>>>> 
>> 
>> 


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Ah, must still be recovering, I'm sure I saw it was being applied to the
3.0 branch.

I will merge it into master and 3.0.

Is it worth holding up the build to merge CURATOR-331? I have asked Scott
what his opinion is since it's the TreeCache stuff. It looks ok to me though.
cheers

On Mon, Jun 6, 2016 at 9:44 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> Yes, that’s correct. It’s a patch against master. I’ll do the merge if
> you’re OK with it.
>
> -Jordan
>
> > On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > hey Jordan,
> > The fix for CURATOR-335 looks good to me, but I'm wondering if it should
> > actually be applied against master and then merged into 3.0?
> >
> > On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
> > jordan@jordanzimmerman.com> wrote:
> >
> >> no worries - get well.
> >>
> >>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> Thanks for sorting this out Jordan. I'm pretty sick today so won't get
> >>> around to looking at it, but I will try over the weekend or really next
> >> week
> >>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
> >>> wrote:
> >>>
> >>>>> It sounds like curator is using a different algorithm since it has
> >>>>> nodes sorting their position to determine if they have a lease or
> not.
> >>>>
> >>>> No - I just added that as I thought there was a bug. But, now I
> realize
> >>>> I’m wrong. So, it was correct all along. Thanks Ben.
> >>>>
> >>>> -Jordan
> >>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Yes, that’s correct. It’s a patch against master. I’ll do the merge if you’re OK with it.

-Jordan

> On Jun 5, 2016, at 6:42 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> hey Jordan,
> The fix for CURATOR-335 looks good to me, but I'm wondering if it should
> actually be applied against master and then merged into 3.0?
> 
> On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
> 
>> no worries - get well.
>> 
>>> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> Thanks for sorting this out Jordan. I'm pretty sick today so won't get
>>> around to looking at it, but I will try over the weekend or really next
>> week
>>> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
>>> wrote:
>>> 
>>>>> It sounds like curator is using a different algorithm since it has
>>>>> nodes sorting their position to determine if they have a lease or not.
>>>> 
>>>> No - I just added that as I thought there was a bug. But, now I realize
>>>> I’m wrong. So, it was correct all along. Thanks Ben.
>>>> 
>>>> -Jordan
>> 
>> 


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
hey Jordan,
The fix for CURATOR-335 looks good to me, but I'm wondering if it should
actually be applied against master and then merged into 3.0?

On Fri, Jun 3, 2016 at 12:21 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> no worries - get well.
>
> > On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Thanks for sorting this out Jordan. I'm pretty sick today so won't get
> > around to looking at it, but I will try over the weekend or really next
> week
> > On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
> > wrote:
> >
> >>> It sounds like curator is using a different algorithm since it has
> >>> nodes sorting their position to determine if they have a lease or not.
> >>
> >> No - I just added that as I thought there was a bug. But, now I realize
> >> I’m wrong. So, it was correct all along. Thanks Ben.
> >>
> >> -Jordan
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
no worries - get well.

> On Jun 2, 2016, at 9:20 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Thanks for sorting this out Jordan. I'm pretty sick today so won't get
> around to looking at it, but I will try over the weekend or really next week
> On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
> wrote:
> 
>>> It sounds like curator is using a different algorithm since it has
>>> nodes sorting their position to determine if they have a lease or not.
>> 
>> No - I just added that as I thought there was a bug. But, now I realize
>> I’m wrong. So, it was correct all along. Thanks Ben.
>> 
>> -Jordan


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Thanks for sorting this out Jordan. I'm pretty sick today so won't get
around to looking at it, but I will try over the weekend or early next week.
On 3 Jun 2016 7:05 AM, "Jordan Zimmerman" <jo...@jordanzimmerman.com>
wrote:

> > It sounds like curator is using a different algorithm since it has
> > nodes sorting their position to determine if they have a lease or not.
>
> No - I just added that as I thought there was a bug. But, now I realize
> I’m wrong. So, it was correct all along. Thanks Ben.
>
> -Jordan

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
> It sounds like curator is using a different algorithm since it has
> nodes sorting their position to determine if they have a lease or not.

No - I just added that as I thought there was a bug. But, now I realize I’m wrong. So, it was correct all along. Thanks Ben.

-Jordan

Re: CURATOR-3.0 tests

Posted by Ben Bangert <be...@groovie.org>.
On Thu, Jun 2, 2016 at 1:28 PM, Jordan Zimmerman
<jo...@jordanzimmerman.com> wrote:
> I believe there are two things going on:
>
> 1) This test uses the infinite versions of the APIs. For some reason, either
> the internal lock or the semaphore code is getting stuck in wait() when
> there’s a network outage and never wakes up. I have some theories I’m
> working on.
>
> 2) This is in the category of “How Did it Ever Work”. I’m cc’ing Ben Bangert
> because it was his algorithm I used for InterProcessSemaphoreV2 and I want
> to run this past him. In the current implementation
> (https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/InterProcessSemaphoreV2.java
> 363-371), it seems to me that if there are more waiters on semaphores than
> there are available semaphores, it will wait infinitely. My solution is to
> sort the ZNode children and if the index of the acquiring client is less
> than the number of configured max leases, give that client the lease and be
> done. E.g.

I'm not sure how the Curator version works; I can only go over how the
Python Kazoo client works, and it's been a while so I had to refresh my
memory from the code.

In Kazoo, there's a lock node for a given semaphore, and a lease pool
node, which has a child ephemeral node per lease holder. The only
client allowed to add its ephemeral node to the lease pool node is the
lock holder. Clients that already acquired a lease may delete their
node at any time to release their lease.

The lock works per the standard lock recipe, so all lock waiters are
in line, and will wake per the standard lock recipe for lease
acquisition fairness.

The client that acquires the lock gets to create a lease node, unless
there are already as many lease child nodes as the lease pool node
allows. In that case, it sets a watch on the lease pool node to wait
for a lease child to go away (this was a crucial difference from
curator which had nodes watching specific lease holding nodes in a
sorted line of some sort resulting in possible lease starvation afaik).

There should be no indefinite waiting since as soon as a lease node is
deleted, the lock holder wakes and gets to create its node (and in my
tests does so).
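
As a rough illustration of that flow, here is a single-process model in
Java. It does not use ZooKeeper or the Kazoo API; the class and method
names are invented, the lock is reduced to a FIFO queue, and the watch on
the lease pool node is reduced to re-checking whenever a lease is released:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

public class LeasePoolSketch {
    private final int maxLeases;
    // Stand-in for the lock recipe: waiters are served strictly in order.
    private final Deque<String> lockWaiters = new ArrayDeque<>();
    // Stand-in for the ephemeral children of the lease pool node.
    private final Set<String> leaseHolders = new LinkedHashSet<>();

    public LeasePoolSketch(int maxLeases) {
        this.maxLeases = maxLeases;
    }

    public void requestLease(String client) {
        lockWaiters.addLast(client);
        grantWhilePossible();
    }

    public void releaseLease(String client) {
        // Deleting a lease node "wakes" the current lock holder.
        if (leaseHolders.remove(client)) {
            grantWhilePossible();
        }
    }

    public boolean holdsLease(String client) {
        return leaseHolders.contains(client);
    }

    private void grantWhilePossible() {
        // Only the head of the queue (the lock holder) may add a lease node,
        // and only while the pool is below its limit.
        while (!lockWaiters.isEmpty() && leaseHolders.size() < maxLeases) {
            leaseHolders.add(lockWaiters.removeFirst());
        }
    }

    public static void main(String[] args) {
        LeasePoolSketch pool = new LeasePoolSketch(2);
        pool.requestLease("a");
        pool.requestLease("b");
        pool.requestLease("c");
        System.out.println(pool.holdsLease("c")); // false
        pool.releaseLease("a");
        System.out.println(pool.holdsLease("c")); // true
    }
}
```

The point of the model is only that a waiter is blocked on the pool as a
whole, so any released lease lets the next client in line proceed.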

It sounds like curator is using a different algorithm since it has
nodes sorting their position to determine if they have a lease or not.

Cheers,
Ben

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I believe there are two things going on:

1) This test uses the infinite versions of the APIs. For some reason, either the internal lock or the semaphore code is getting stuck in wait() when there’s a network outage and never wakes up. I have some theories I’m working on.

2) This is in the category of “How Did it Ever Work”. I’m cc’ing Ben Bangert because it was his algorithm I used for InterProcessSemaphoreV2 and I want to run this past him. In the current implementation (https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/InterProcessSemaphoreV2.java 363-371), it seems to me that if there are more waiters on semaphores than there are available semaphores, it will wait infinitely. My solution is to sort the ZNode children and if the index of the acquiring client is less than the number of configured max leases, give that client the lease and be done. E.g.
List<String> children = LockInternals.getSortedChildren(...);
int ourIndex = children.indexOf(nodeName);
	...
if ( ourIndex < maxLeases )
{
    break;
}
Thoughts?
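
For anyone who wants to poke at the idea without a ZooKeeper instance, the
check can be modelled on plain lists. This is only a sketch of the proposal:
wouldGetLease is a made-up helper, the node names are invented, and a plain
lexical sort of zero-padded names stands in for the sequence-number ordering
that the real sorted-children call would provide:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedLeaseCheckSketch {
    // Returns true if nodeName sorts within the first maxLeases children.
    public static boolean wouldGetLease(List<String> children, String nodeName, int maxLeases) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int ourIndex = sorted.indexOf(nodeName);
        // indexOf returns -1 if our node is missing; never grant in that case.
        return ourIndex >= 0 && ourIndex < maxLeases;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
            "lease-0000000003", "lease-0000000001", "lease-0000000002");

        System.out.println(wouldGetLease(children, "lease-0000000002", 2)); // true
        System.out.println(wouldGetLease(children, "lease-0000000003", 2)); // false
    }
}
```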

-Jordan

> On Jun 2, 2016, at 12:04 AM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Yeah, I'm still getting failures too. I will have more of a look if I get
> time tonight.
> cheers
> 
> On Thu, Jun 2, 2016 at 3:01 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> Hmm - I’m still getting failures - maybe I’m wrong. It’s late and I’m off
>> to bed. I’ll look at this more tomorrow.
>> 
>> -Jordan
>> 
>>> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> The counter is just being used to check if semaphores are still being
>>> acquired. Essentially it just runs in a loop acquiring semaphores (and
>>> incrementing the counter when they are acquired).
>>> 
>>> Then it shuts down the server, waits until the session is lost, then
>>> restarts the server and then checks that semaphores are being acquired
>>> correctly again (by checking that the counter is being incremented).
>>> 
>>> This is just a simplified version of the test that is failing.
>>> 
>>> When the test fails, all of the threads are attempting to get a lease on
>>> the semaphore, but none of them get it, then the test times out while
>>> waiting.
>>> 
>>> 
>>> 
>>> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com
>>>> wrote:
>>> 
>>>> I also had to add:
>>>> 
>>>> while(!lost.get() && (counter.get() > 0))
>>>> {
>>>>   Thread.sleep(1000);
>>>> }
>>>> Which seems more correct to me.
>>>> 
>>>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> I have just pushed an interprocess_mutex_issue branch. The test case is
>>>> in
>>>>> TestInterprocessMutexNotReconnecting
>>>>> 
>>>>> For me it's failing around 20% of the time.
>>>>> cheers
>>>>> 
>>>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Yep, just let me confirm that it's actually getting the same problem.
>>>> I'm
>>>>>> sure it was before, but I've just run it a bunch of times and
>>>> everything's
>>>>>> been fine.
>>>>>> 
>>>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>> 
>>>>>>> Can you push your unit test somewhere?
>>>>>>> 
>>>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
>>>> though.
>>>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>>>> attempting to grab a lease on the semaphore. When I shutdown and
>>>>>>> restart ZK
>>>>>>>> about 25% of the time, none of the clients can reacquire the
>>>> semaphore.
>>>>>>>> 
>>>>>>>> Still trying to work out what's going on, but I'm probably not going
>>>> to
>>>>>>>> have a lot of time today to look at it.
>>>>>>>> cheers
>>>>>>>> 
>>>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>> 
>>>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>>>> 
>>>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> It looks like under some circumstances (which I haven't worked out
>>>>>>> yet)
>>>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>>>> 
>>>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>>>>>> missing
>>>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>>>>>> throws
>>>>>>>>> an
>>>>>>>>>> exception if they return true. As far as I can work out, this
>> means
>>>>>>> that
>>>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>>>>>> that
>>>>>>>>>> there are Multiple acquirers.
>>>>>>>>>> 
>>>>>>>>>> This test is failing fairly consistently. It seems to be the
>>>> remaining
>>>>>>>>> test
>>>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being
>> thrown
>>>> on
>>>>>>>>>>> success as well, and the problem is not in the cluster restart.
>>>> Will
>>>>>>>>> keep
>>>>>>>>>>> digging.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
>>>>>>> (assertion
>>>>>>>>> at
>>>>>>>>>>>> line 294). Again, it seems like some sort of race condition with
>>>> the
>>>>>>>>>>>> watcher removal.
>>>>>>>>>>>> 
>>>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When
>> it
>>>>>>> fails
>>>>>>>>>>>> it seems that it's got something to do with watcher removal.
>> When
>>>>>>> the
>>>>>>>>> test
>>>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>>>> 
>>>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
>>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>>>> 
>>>>>>>>>>>> Is it possible it's something to do with the way that the
>> cluster
>>>> is
>>>>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new
>> one
>>>> is
>>>>>>>>> just
>>>>>>>>>>>> created.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for
>> tests
>>>> to
>>>>>>>>> wait
>>>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
>>>> unrelated
>>>>>>>>> thing
>>>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
>>>> it's
>>>>>>>>>>>>> worked
>>>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I will start getting a release together. Thanks for your help
>>>> with
>>>>>>> the
>>>>>>>>>>>>>> updated tests.
>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>>>> There’s
>>>>>>>>>>>>> no
>>>>>>>>>>>>>>> way to cancel these and they can take time to occur - even
>>>> after
>>>>>>> a
>>>>>>>>>>>>> recipe
>>>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>>>> done
>>>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>>>> checker.
>>>>>>>>> If
>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>>>> directly
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole
>> thing
>>>>>>> again
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>>>> 
>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>>>>>> still
>>>>>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found
>> [false]
>>>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>>>>>> more
>>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>>>> [true]
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>>>> against
>>>>>>>>>>>>> that,
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>>>> merged
>>>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to
>> the
>>>>>>> same
>>>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or
>> more
>>>>>>> child
>>>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or
>> more
>>>>>>> child
>>>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>>>>>>>>> 
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered:
>> [/test]
>>>>>>>>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>>>>>>>>> 
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered:
>> [/test]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>>>>>>> more
>>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>>>>>>> more
>>>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294
>>>> expected
>>>>>>>>>>>>> [true]
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256
>> One
>>>> or
>>>>>>>>> more
>>>>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so
>> I’ll
>>>>>>>>> spend
>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
>>>> supposed
>>>>>>> to
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>> set
>>>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>>>>>> handling
>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
>>>> some
>>>>>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to
>> mirror
>>>>>>> what
>>>>>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In
>> hindsight,
>>>>>>> the
>>>>>>>>>>>>> whole
>>>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>>>>>> mutator
>>>>>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>>>>>> consistently
>>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually
>> potentially a
>>>>>>> bug
>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
>>>> I've
>>>>>>>>> had a
>>>>>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
>>>> time,
>>>>>>>>> can
>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
>>>> digging.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and
>> 3.2
>>>>>>> onto
>>>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied
>> to
>>>>>>> both
>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie
>> <
>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>>>> are
>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan
>> Zimmerman
>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie
>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've
>> tried a
>>>>>>> few
>>>>>>>>>>>>> times
>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>>>> morning.
>>>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>>>> just
>>>>>>>>> want
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>>>> Zimmerman
>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the
>> schema
>>>>>>>>>>>>> validation
>>>>>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding
>> call.
>>>>>>>>>>>>> Because
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
>>>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test
>> to
>>>>>>>>> force a
>>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
>>>> UnhandledErrorListener,
>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>>>>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>>>> McKenzie
>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>>>> there,
>>>>>>>>> so
>>>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>>>> know
>>>>>>>>> if
>>>>>>>>>>>>> I
>>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you
>> compared
>>>>>>> it to
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>>>>>> seems to
>>>>>>>>>>>>> try
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>>>>>> exception
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
>>>> it
>>>>>>>>> just
>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>>>>>> propagated up
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I
>> just
>>>>>>>>> don't
>>>>>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Yeah, I'm still getting failures too. I will have more of a look if I get
time tonight.
cheers

On Thu, Jun 2, 2016 at 3:01 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> Hmm - I’m still getting failures - maybe I’m wrong. It’s late and I’m off
> to bed. I’ll look at this more tomorrow.
>
> -Jordan
>
> > On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > The counter is just being used to check if semaphores are still being
> > acquired. Essentially it just runs in a loop acquiring semaphores (and
> > incrementing the counter when they are acquired).
> >
> > Then it shuts down the server, waits until the session is lost, then
> > restarts the server and then checks that semaphores are being acquired
> > correctly again (by checking that the counter is being incremented).
> >
> > This is just a simplified version of the test that is failing.
> >
> > When the test fails, all of the threads are attempting to get a lease on
> > the semaphore, but none of them get it, then the test times out while
> > waiting.
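
A minimal sketch of the counter-based liveness check described above. It is not the actual test on the interprocess_mutex_issue branch: a plain java.util.concurrent.Semaphore stands in for the Curator semaphore, there is no ZooKeeper server or restart, and the names CounterLivenessSketch, counterAdvances and runDemo are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CounterLivenessSketch {
    // Returns true if the counter moves past its starting value before the timeout.
    static boolean counterAdvances(AtomicInteger counter, long timeoutMs) {
        int start = counter.get();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (counter.get() > start) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return counter.get() > start;
    }

    // Workers loop acquiring a lease and bumping the counter; the checker
    // only asks whether the counter is still being incremented.
    static boolean runDemo() {
        Semaphore leases = new Semaphore(1); // stand-in for the distributed semaphore (assumption)
        AtomicInteger counter = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                while (running.get()) {
                    try {
                        if (leases.tryAcquire(50, TimeUnit.MILLISECONDS)) {
                            try {
                                counter.incrementAndGet();
                            } finally {
                                leases.release();
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            workers[i].start();
        }
        boolean advanced = counterAdvances(counter, 2000);
        running.set(false);
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return advanced;
    }
}
```

In the failing case described above, the equivalent of counterAdvances would time out, because every worker is blocked waiting for a lease and the counter never moves.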
> >
> >
> >
> > On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> I also had to add:
> >>
> >> while(!lost.get() && (counter.get() > 0))
> >> {
> >>    Thread.sleep(1000);
> >> }
> >> Which seems more correct to me.
> >>
> >>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> I have just pushed an interprocess_mutex_issue branch. The test case is
> >> in
> >>> TestInterprocessMutexNotReconnecting
> >>>
> >>> For me it's failing around 20% of the time.
> >>> cheers
> >>>
> >>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>> wrote:
> >>>
> >>>> Yep, just let me confirm that it's actually getting the same problem.
> >> I'm
> >>>> sure it was before, but I've just run it a bunch of times and
> >> everything's
> >>>> been fine.
> >>>>
> >>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
> >>>> jordan@jordanzimmerman.com> wrote:
> >>>>
> >>>>> Can you push your unit test somewhere?
> >>>>>
> >>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
> >> though.
> >>>>>> I've written a simplified unit test that just has a bunch of clients
> >>>>>> attempting to grab a lease on the semaphore. When I shutdown and
> >>>>> restart ZK
> >>>>>> about 25% of the time, none of the clients can reacquire the
> >> semaphore.
> >>>>>>
> >>>>>> Still trying to work out what's going on, but I'm probably not going
> >> to
> >>>>>> have a lot of time today to look at it.
> >>>>>> cheers
> >>>>>>
> >>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
> >>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>
> >>>>>>> Odd - SemaphoreClient does seem wrong.
> >>>>>>>
> >>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> It looks like, under some circumstances (which I haven't worked out yet),
> >>>>>>>> the InterprocessMutex acquire() is not working correctly when
> >>>>>>>> reconnecting to ZK. Still digging into why this is.
> >>>>>>>>
> >>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
> >>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but throws an
> >>>>>>>> exception if they return true. As far as I can work out, this means that
> >>>>>>>> whenever the lock is acquired, an exception gets thrown indicating that
> >>>>>>>> there are Multiple acquirers.
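
For reference, a small sketch of the AtomicBoolean.compareAndSet() semantics the paragraph above relies on. This is not the SemaphoreClient code; the class and field names (CompareAndSetGuardSketch, hasAcquired) are hypothetical. compareAndSet(expected, update) returns true when the swap succeeded, so a "multiple acquirers" guard would normally throw on a false return, not a true one.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CompareAndSetGuardSketch {
    // Hypothetical flag tracking whether some client currently holds the lease.
    private final AtomicBoolean hasAcquired = new AtomicBoolean(false);

    void markAcquired() {
        // true  -> flag was false and is now true: the normal first acquisition
        // false -> flag was already true: a second holder exists
        if (!hasAcquired.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    void markReleased() {
        // Symmetric check: releasing without a prior acquisition is an error.
        if (!hasAcquired.compareAndSet(true, false)) {
            throw new IllegalStateException("Released without acquire");
        }
    }
}
```

With the condition inverted (throwing when compareAndSet returns true), the first, legitimate acquisition would raise the exception, which matches the behaviour described above.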
> >>>>>>>>
> >>>>>>>> This test is failing fairly consistently. It seems to be the
> >> remaining
> >>>>>>> test
> >>>>>>>> that keeps failing in the Jenkins build also
> >>>>>>>> cheers
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
> >>>>> mckenzie.cam@gmail.com
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Looks like I was incorrect. The NoWatcherException is being
> thrown
> >> on
> >>>>>>>>> success as well, and the problem is not in the cluster restart.
> >> Will
> >>>>>>> keep
> >>>>>>>>> digging.
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
> >>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
> >>>>> (assertion
> >>>>>>> at
> >>>>>>>>>> line 294). Again, it seems like some sort of race condition with
> >> the
> >>>>>>>>>> watcher removal.
> >>>>>>>>>>
> >>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When
> it
> >>>>> fails
> >>>>>>>>>> it seems that it's got something to do with watcher removal.
> When
> >>>>> the
> >>>>>>> test
> >>>>>>>>>> passes, this error is not logged.
> >>>>>>>>>>
> >>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
> >>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
> >>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
> >>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
> >>>>>>>>>>
> >>>>>>>>>> Is it possible it's something to do with the way that the
> cluster
> >> is
> >>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new
> one
> >> is
> >>>>>>> just
> >>>>>>>>>> created.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
> >>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I’ll try to address this as part of CURATOR-333
> >>>>>>>>>>>
> >>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
> >>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Maybe we need to look at some way of providing a hook for
> tests
> >> to
> >>>>>>> wait
> >>>>>>>>>>>> reliably for asynch tasks to finish?
> >>>>>>>>>>>>
> >>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
> >> unrelated
> >>>>>>> thing
> >>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
> >> it's
> >>>>>>>>>>> worked
> >>>>>>>>>>>> ok the next time around.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I will start getting a release together. Thanks for your help
> >> with
> >>>>> the
> >>>>>>>>>>>> updated tests.
> >>>>>>>>>>>> cheers
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
> >>>>>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
> >>>>>>> There’s
> >>>>>>>>>>> no
> >>>>>>>>>>>>> way to cancel these and they can take time to occur - even
> >> after
> >>>>> a
> >>>>>>>>>>> recipe
> >>>>>>>>>>>>> instance is closed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
> >>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Ok, running it again now.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
> >> done
> >>>>>>>>>>>>>> asynchronously after they are closed?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> >>>>>>>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
> >>>>> checker.
> >>>>>>> If
> >>>>>>>>>>>>> there
> >>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
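
The "sleep a bit and try again" loop mentioned above could look roughly like the sketch below. It is only an illustration of the polling pattern, not the code that was pushed: RetryUntilCleanSketch and waitUntil are hypothetical names, and the condition is a generic BooleanSupplier where the real checker would ask whether any watchers are still registered.

```java
import java.util.function.BooleanSupplier;

class RetryUntilCleanSketch {
    // Polls the condition up to maxAttempts times, sleeping between attempts,
    // then checks once more. Returns true as soon as the condition holds.
    static boolean waitUntil(BooleanSupplier condition, int maxAttempts, long sleepMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }
}
```

A bounded retry like this tolerates in-flight watcher removals that complete shortly after a recipe is closed, while still failing the test if the watchers never go away.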
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
> >>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
> >>>>> directly
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole
> thing
> >>>>> again
> >>>>>>>>>>> in
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> morning and see how it goes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> FAILURE! - in
> >>>>>>>>>>>>>>>>>
> >>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
> >>>>> still
> >>>>>>>>>>>>>>> registered:
> >>>>>>>>>>>>>>>>> [/test]
> >>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> FAILURE! - in
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found
> [false]
> >>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
> >>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
> >>>>> more
> >>>>>>>>>>> child
> >>>>>>>>>>>>>>>>> watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>> Run 2: PASS
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
> >>>>> [true]
> >>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> found [false]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
> >>>>> against
> >>>>>>>>>>> that,
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
> >> merged
> >>>>>>>>>>> yet. I
> >>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -jordan
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to
> the
> >>>>> same
> >>>>>>>>>>> stuff
> >>>>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>> merging your fix:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or
> more
> >>>>> child
> >>>>>>>>>>>>>>> watchers
> >>>>>>>>>>>>>>>>>>>> are still registered: [/test]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or
> more
> >>>>> child
> >>>>>>>>>>>>>>> watchers
> >>>>>>>>>>>>>>>>>>>> are still registered: [/test]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>>>> Run 1:
> >>>>>>>>>>>>>>>
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered:
> [/test]
> >>>>>>>>>>>>>>>>>>>> Run 2:
> >>>>>>>>>>>>>>>
> >> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered:
> [/test]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
> >>>>> more
> >>>>>>>>>>> child
> >>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
> >>>>> more
> >>>>>>>>>>> child
> >>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294
> >> expected
> >>>>>>>>>>> [true]
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>> found [false]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>>>>>>> Run 1: PASS
> >>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256
> One
> >> or
> >>>>>>> more
> >>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>
> >>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
> >>>>>>>>>>> watchers are
> >>>>>>>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>>>>> registered: [/count]
> >>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
> >>>>>>>>>>> watchers are
> >>>>>>>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>>>>> registered: [/count]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so
> I’ll
> >>>>>>> spend
> >>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>> time on
> >>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
> >> supposed
> >>>>> to
> >>>>>>>>>>> get
> >>>>>>>>>>>>> set
> >>>>>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
> >>>>>>> handling
> >>>>>>>>>>> it.
> >>>>>>>>>>>>>>> But,
> >>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
> >> some
> >>>>>>>>>>>>>>> significant
> >>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to
> mirror
> >>>>> what
> >>>>>>>>>>>>>>>>>>> ZooKeeper does
> >>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In
> hindsight,
> >>>>> the
> >>>>>>>>>>> whole
> >>>>>>>>>>>>> ZK
> >>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
> >>>>> mutator
> >>>>>>>>>>> APIs.
> >>>>>>>>>>>>>>>>>>> But, of
> >>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
> >>>>> consistently
> >>>>>>>>>>> on the
> >>>>>>>>>>>>>>> 3.0
> >>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually
> potentially a
> >>>>> bug
> >>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
> >> I've
> >>>>>>> had a
> >>>>>>>>>>>>> quick
> >>>>>>>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
> >> the
> >>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
> >> time,
> >>>>>>> can
> >>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>> have a
> >>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
> >> digging.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and
> 3.2
> >>>>> onto
> >>>>>>>>>>> Nexus.
> >>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >>>>>>>>>>>>>>> dragonsinth@gmail.com
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied
> to
> >>>>> both
> >>>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>> 3.0.
> >>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie
> <
> >>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
> >> are
> >>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>> there.
> >>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan
> Zimmerman
> >> <
> >>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie
> <
> >>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've
> tried a
> >>>>> few
> >>>>>>>>>>> times
> >>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
> >>>>>>> morning.
> >>>>>>>>>>>>> Given
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
> >> just
> >>>>>>> want
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>> vote
> >>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
> >> Zimmerman
> >>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
> >> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
> >>>>> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the
> schema
> >>>>>>>>>>> validation
> >>>>>>>>>>>>>>>>>>> stuff.
> >>>>>>>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding
> call.
> >>>>>>>>>>> Because
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> unit
> >>>>>>>>>>>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
> >> exception
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> adjustPath(client.fixForNamespace(givenPath,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
> >>>>>>>>>>> acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>
> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> data,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  pathInBackground(adjustedPath, data,
> >>>>>>>>>>> givenPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test
> to
> >>>>>>> force a
> >>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
> >> UnhandledErrorListener,
> >>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> expectation is
> >>>>>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
> >>>>> operations?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
> >>>>> McKenzie
> >>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
> >>>>> there,
> >>>>>>> so
> >>>>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
> >>>>> know
> >>>>>>> if
> >>>>>>>>>>> I
> >>>>>>>>>>>>> get
> >>>>>>>>>>>>>>>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
> >>>>>>> Zimmerman <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you
> compared
> >>>>> it to
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
> >>>>>>> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
> >>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
> >>>>> seems to
> >>>>>>>>>>> try
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> provoke
> >>>>>>>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >>>>>>>>>>>>>>> CreateBuilderImpl
> >>>>>>>>>>>>>>>>>>>>>>>> prior
> >>>>>>>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
> >>>>>>>>>>> exception
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
> >> it
> >>>>>>> just
> >>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
> >>>>>>>>>>> propagated up
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> stack
> >>>>>>>>>>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I
> just
> >>>>>>> don't
> >>>>>>>>>>>>>>>>>>> understand
> >>>>>>>>>>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Hmm - I’m still getting failures - maybe I’m wrong. It’s late and I’m off to bed. I’ll look at this more tomorrow.

-Jordan

> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> The counter is just being used to check if semaphores are still being
> acquired. Essentially it just runs in a loop acquiring semaphores (and
> incrementing the counter when they are acquired).
> 
> Then it shuts down the server, waits until the session is lost, then
> restarts the server and then checks that semaphores are being acquired
> correctly again (by checking that the counter is being incremented).
> 
> This is just a simplified version of the test that is failing.
> 
> When the test fails, all of the threads are attempting to get a lease on
> the semaphore, but none of them get it, then the test times out while
> waiting.
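
A minimal sketch of the "is the counter still being incremented" check described above. It uses only java.util.concurrent, with a local fair Semaphore standing in for the Curator recipe and a held permit standing in for the server outage; the class and method names are hypothetical and are not taken from the Curator test suite.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterProgressSketch {
    // Returns true if the counter grows beyond its starting value before the timeout expires.
    static boolean counterAdvances(AtomicInteger counter, long timeoutMs) throws InterruptedException {
        int start = counter.get();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (counter.get() > start) {
                return true;
            }
            Thread.sleep(10);
        }
        return counter.get() > start;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a fair in-process semaphore replaces the distributed one.
        Semaphore leases = new Semaphore(1, true);
        AtomicInteger counter = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);

        // Worker: acquire a lease in a loop and count each successful acquisition.
        Thread worker = new Thread(() -> {
            while (running.get()) {
                try {
                    if (leases.tryAcquire(50, TimeUnit.MILLISECONDS)) {
                        try {
                            counter.incrementAndGet();
                        } finally {
                            leases.release();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.start();

        System.out.println("progress while available: " + counterAdvances(counter, 2000));

        leases.acquire(); // simulated outage: the worker can no longer get a lease
        System.out.println("progress during outage: " + counterAdvances(counter, 300));

        leases.release(); // simulated restart
        System.out.println("progress after restart: " + counterAdvances(counter, 2000));

        running.set(false);
        worker.join();
    }
}
```

In the failing runs described here, the last check would be the one that times out: the workers keep asking for a lease but the counter never moves again.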
> 
> 
> 
> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> I also had to add:
>> 
>> while(!lost.get() && (counter.get() > 0))
>> {
>>    Thread.sleep(1000);
>> }
>> Which seems more correct to me.
>> 
>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> I have just pushed an interprocess_mutex_issue branch. The test case is
>> in
>>> TestInterprocessMutexNotReconnecting
>>> 
>>> For me it's failing around 20% of the time.
>>> cheers
>>> 
>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>> wrote:
>>> 
>>>> Yep, just let me confirm that it's actually getting the same problem.
>> I'm
>>>> sure it was before, but I've just run it a bunch of times and
>> everything's
>>>> been fine.
>>>> 
>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> Can you push your unit test somewhere?
>>>>> 
>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
>> though.
>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>> attempting to grab a lease on the semaphore. When I shut down and
>>>>> restart ZK
>>>>>> about 25% of the time, none of the clients can reacquire the
>> semaphore.
>>>>>> 
>>>>>> Still trying to work out what's going on, but I'm probably not going
>> to
>>>>>> have a lot of time today to look at it.
>>>>>> cheers
>>>>>> 
>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>> 
>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>> 
>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> It looks like under some circumstances (which I haven't worked out
>>>>> yet)
>>>>>>>> that the InterProcessMutex acquire() is not working correctly when
>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>> 
>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>>>> missing
>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>>>> throws
>>>>>>> an
>>>>>>>> exception if they return true. As far as I can work out, this means
>>>>> that
>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>>>> that
>>>>>>>> there are Multiple acquirers.
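
A small sketch of why throwing when compareAndSet() returns true would fire on every successful acquisition. AtomicBoolean.compareAndSet(expected, update) returns true exactly when it performed the update. The real SemaphoreClient code is not shown in this thread, so the field, the arguments and both method names below are assumptions made purely for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CompareAndSetSketch {
    // The pattern as described in the thread: throw when compareAndSet returns true.
    // Since true means "this caller just set the flag", the uncontended case throws.
    static void claimAsDescribed(AtomicBoolean held) {
        if (held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Hypothetical inverted check: throw only when the flag was already set by someone else.
    static void claimInverted(AtomicBoolean held) {
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    public static void main(String[] args) {
        AtomicBoolean first = new AtomicBoolean(false);
        try {
            claimAsDescribed(first);
            System.out.println("described pattern: no exception");
        } catch (IllegalStateException e) {
            System.out.println("described pattern threw for a single acquirer: " + e.getMessage());
        }

        AtomicBoolean second = new AtomicBoolean(false);
        claimInverted(second); // single acquirer passes
        try {
            claimInverted(second); // a second claim on the same flag is rejected
        } catch (IllegalStateException e) {
            System.out.println("inverted pattern threw for a second acquirer: " + e.getMessage());
        }
    }
}
```

Whether the inverted form matches what SemaphoreClient intends depends on code not quoted here.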
>>>>>>>> 
>>>>>>>> This test is failing fairly consistently. It seems to be the
>> remaining
>>>>>>> test
>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>> cheers
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com
>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown
>> on
>>>>>>>>> success as well, and the problem is not in the cluster restart.
>> Will
>>>>>>> keep
>>>>>>>>> digging.
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
>>>>> (assertion
>>>>>>> at
>>>>>>>>>> line 294). Again, it seems like some sort of race condition with
>> the
>>>>>>>>>> watcher removal.
>>>>>>>>>> 
>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
>>>>> fails
>>>>>>>>>> it seems that it's got something to do with watcher removal. When
>>>>> the
>>>>>>> test
>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>> 
>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>>>>>>> KeeperErrorCode
>>>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>> at
>> org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>> at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>> 
>>>>>>>>>> Is it possible it's something to do with the way that the cluster
>> is
>>>>>>>>>> restarted at line 282? The old cluster is not shut down, a new one
>> is
>>>>>>> just
>>>>>>>>>> created.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>> 
>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests
>> to
>>>>>>> wait
>>>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>>>> 
>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an
>> unrelated
>>>>>>> thing
>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
>> it's
>>>>>>>>>>> worked
>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>> 
>>>>>>>>>>>> I will start getting a release together. Thanks for your help
>> with
>>>>> the
>>>>>>>>>>>> updated tests.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>> There’s
>>>>>>>>>>> no
>>>>>>>>>>>>> way to cancel these and they can take time to occur - even
>> after
>>>>> a
>>>>>>>>>>> recipe
>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>> done
>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>> checker.
>>>>>>> If
>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
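
The "sleep a bit and try again" loop can be sketched roughly as below. This is not the actual TestCleanState change (that code is not quoted in this thread); the method name, the Supplier-based signature and the attempt and sleep parameters are all assumptions. The supplier stands in for whatever reports the paths that still have watchers registered.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryUntilCleanSketch {
    // Polls the supplier until it reports no remaining watchers or the attempts run out.
    // Returns the last observed list, so the caller can assert on it and report what is left.
    static List<String> waitUntilEmpty(Supplier<List<String>> remainingWatchers,
                                       int maxAttempts, long sleepMs) throws InterruptedException {
        List<String> left = remainingWatchers.get();
        for (int attempt = 1; attempt < maxAttempts && !left.isEmpty(); attempt++) {
            Thread.sleep(sleepMs); // give in-flight watcher removals time to complete
            left = remainingWatchers.get();
        }
        return left;
    }

    public static void main(String[] args) throws Exception {
        // Simulated checker: reports a leftover watcher on /test for the first two polls only.
        AtomicInteger polls = new AtomicInteger();
        Supplier<List<String>> checker = () -> polls.incrementAndGet() <= 2
            ? Collections.singletonList("/test")
            : Collections.<String>emptyList();

        List<String> left = waitUntilEmpty(checker, 5, 10);
        System.out.println("remaining watchers: " + left + " after " + polls.get() + " polls");
    }
}
```

A bounded retry like this only hides the race for the test's purposes; it does not cancel the in-flight watchers or background calls.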
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>> directly
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>>>> again
>>>>>>>>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>> 
>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>>>> still
>>>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>> [true]
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>> against
>>>>>>>>>>> that,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>> merged
>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff after
>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend some time on
>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to get set when
>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it. But,
>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are some significant
>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror what ZooKeeper does
>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight, the whole ZK
>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs. But, of
>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the 3.0
>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug in the
>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a quick look
>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you have a
>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>>>> onto
>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>>>> both
>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>> are
>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>>>> few
>>>>>>>>>>> times
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>> morning.
>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>> just
>>>>>>> want
>>>>>>>>>>> to
>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>> Zimmerman
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation stuff. It now does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because the unit test uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a failure in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the expectation is that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>> McKenzie
>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>> there,
>>>>>>> so
>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>> know
>>>>>>> if
>>>>>>>>>>> I
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>>>> it to
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test TestFrameworkBackground:testErrorListener that is failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and provoke an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl prior to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that it throws happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up the stack at which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't understand how it ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers


Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
OK - I got a failure even with the line commented out. However, I found a similar line in LockInternals. I’m going to comment that out too and retest.

> On Jun 1, 2016, at 11:55 PM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> My current testing suggests that the problem is the call to:
> 
> 	client.removeWatchers();
> 
> in InterProcessSemaphoreV2
> 
> If I comment out that line, your test has yet to fail for me. Maybe you can verify. I’ll also look at why this is causing the failure.
> 
>> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mc...@gmail.com> wrote:
>> 
>> The counter is just being used to check if semaphores are still being
>> acquired. Essentially it just runs in a loop acquiring semaphores (and
>> incrementing the counter when they are acquired).
>> 
>> Then it shuts down the server, waits until the session is lost, then
>> restarts the server and then checks that semaphores are being acquired
>> correctly again (by checking that the counter is being incremented).
>> 
>> This is just a simplified version of the test that is failing.
>> 
>> When the test fails, all of the threads are attempting to get a lease on
>> the semaphore, but none of them get it, then the test times out while
>> waiting.
>> 
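A minimal sketch of the counter-based progress check described above, for anyone following along without the branch checked out. It is not the Curator test itself: a plain java.util.concurrent.Semaphore stands in for InterProcessSemaphoreV2, draining the permits stands in for the server outage, and the class and method names (CounterProgressSketch, startWorkers, counterAdvances) are made up for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CounterProgressSketch
{
    // Start worker threads that loop acquiring a permit and bump the counter on each success.
    static void startWorkers(int workers, Semaphore semaphore, AtomicInteger counter, AtomicBoolean running)
    {
        for ( int i = 0; i < workers; ++i )
        {
            Thread t = new Thread(() -> {
                while ( running.get() )
                {
                    try
                    {
                        if ( semaphore.tryAcquire(20, TimeUnit.MILLISECONDS) )
                        {
                            counter.incrementAndGet();  // counted while the permit is held
                            semaphore.release();
                        }
                    }
                    catch ( InterruptedException e )
                    {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    // True if the counter moves past its current value before the timeout expires.
    static boolean counterAdvances(AtomicInteger counter, long timeoutMs)
    {
        int start = counter.get();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while ( System.nanoTime() < deadline )
        {
            if ( counter.get() > start )
            {
                return true;
            }
            try
            {
                Thread.sleep(5);
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return counter.get() > start;
    }

    public static void main(String[] args)
    {
        Semaphore semaphore = new Semaphore(2, true);   // fair, so the "outage" acquire is not starved
        AtomicInteger counter = new AtomicInteger();
        AtomicBoolean running = new AtomicBoolean(true);
        startWorkers(4, semaphore, counter, running);

        System.out.println("before outage: " + counterAdvances(counter, 2000));
        semaphore.acquireUninterruptibly(2);            // stand-in for shutting the server down
        System.out.println("during outage: " + counterAdvances(counter, 200));
        semaphore.release(2);                           // stand-in for restarting the server
        System.out.println("after restart: " + counterAdvances(counter, 2000));
        running.set(false);
    }
}
```

The failing case in the real test corresponds to the last check never seeing the counter move.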
>> 
>> 
>> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
>>> wrote:
>> 
>>> I also had to add:
>>> 
>>> while(!lost.get() && (counter.get() > 0))
>>> {
>>>   Thread.sleep(1000);
>>> }
>>> Which seems more correct to me.
>>> 
>>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com>
>>> wrote:
>>>> 
>>>> I have just pushed an interprocess_mutex_issue branch. The test case is in
>>>> TestInterprocessMutexNotReconnecting
>>>> 
>>>> For me it's failing around 20% of the time.
>>>> cheers
>>>> 
>>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com>
>>>> wrote:
>>>> 
>>>>> Yep, just let me confirm that it's actually getting the same problem.
>>> I'm
>>>>> sure it was before, but I've just run it a bunch of times and
>>> everything's
>>>>> been fine.
>>>>> 
>>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>>> jordan@jordanzimmerman.com> wrote:
>>>>> 
>>>>>> Can you push your unit test somewhere?
>>>>>> 
>>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>>> attempting to grab a lease on the semaphore. When I shutdown and restart ZK
>>>>>>> about 25% of the time, none of the clients can reacquire the semaphore.
>>>>>>> 
>>>>>>> Still trying to work out what's going on, but I'm probably not going to
>>>>>>> have a lot of time today to look at it.
>>>>>>> cheers
>>>>>>> 
>>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>> 
>>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>>> 
>>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> It looks like under some circumstances (which I haven't worked out yet)
>>>>>>>>> that the InterprocessMutex acquire() is not working correctly when
>>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>>> 
>>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but throws an
>>>>>>>>> exception if they return true. As far as I can work out, this means that
>>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating that
>>>>>>>>> there are Multiple acquirers.
>>>>>>>>> 
>>>>>>>>> This test is failing fairly consistently. It seems to be the remaining test
>>>>>>>>> that keeps failing in the Jenkins build also
>>>>>>>>> cheers
>>>>>>>>> 
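To illustrate the compareAndSet() point above: AtomicBoolean.compareAndSet(expected, update) returns true when the swap succeeded, so a "multiple acquirers" guard would normally throw when it returns false. The sketch below is a hypothetical stand-alone guard (SingleAcquirerGuard and its methods are invented names), not the actual SemaphoreClient code, and it assumes the flag is meant to flip from false to true on acquire.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SingleAcquirerGuard
{
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Call right after the lock/lease has been acquired.
    void onAcquired()
    {
        // compareAndSet returns true only if the flag was false and is now true,
        // i.e. nobody else was marked as the holder. A false result is the error case.
        if ( !held.compareAndSet(false, true) )
        {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Call right before the lock/lease is released.
    void onReleased()
    {
        if ( !held.compareAndSet(true, false) )
        {
            throw new IllegalStateException("Released without being held");
        }
    }

    boolean isHeld()
    {
        return held.get();
    }
}
```

With the check inverted (throwing when compareAndSet returns true), the very first successful acquire would raise the exception, which matches the behaviour described in the message.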
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown
>>> on
>>>>>>>>>> success as well, and the problem is not in the cluster restart.
>>> Will
>>>>>>>> keep
>>>>>>>>>> digging.
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>>>>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>>>>> watcher removal.
>>>>>>>>>>> 
>>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails
>>>>>>>>>>> it seems that it's got something to do with watcher removal. When the test
>>>>>>>>>>> passes, this error is not logged.
>>>>>>>>>>> 
>>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>>> 
>>>>>>>>>>> Is it possible it's something to do with the way that the cluster is
>>>>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new one is just
>>>>>>>>>>> created.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests to wait
>>>>>>>>>>>>> reliably for async tasks to finish?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an unrelated thing
>>>>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's worked
>>>>>>>>>>>>> ok the next time around.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I will start getting a release together. Thanks for your help with the
>>>>>>>>>>>>> updated tests.
>>>>>>>>>>>>> cheers
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The problem is in-flight watchers and async background calls. There’s no
>>>>>>>>>>>>>> way to cancel these and they can take time to occur - even after a recipe
>>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>>> done
>>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers” checker. If there
>>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>> 
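The "sleep a bit and try again" approach mentioned above can be sketched roughly as follows. This is a generic polling helper, not the actual checker in TestCleanState; the class name RetryUntilClean, the method awaitTrue and its parameters are assumptions for illustration, and the condition passed in would be whatever "no watchers remain" check the test has available.

```java
import java.util.function.BooleanSupplier;

class RetryUntilClean
{
    // Polls the condition up to maxAttempts times, sleeping sleepMs between attempts.
    // Returns true as soon as the condition holds, false if it never does.
    static boolean awaitTrue(BooleanSupplier condition, int maxAttempts, long sleepMs)
    {
        for ( int attempt = 0; attempt < maxAttempts; ++attempt )
        {
            if ( condition.getAsBoolean() )
            {
                return true;
            }
            try
            {
                Thread.sleep(sleepMs);
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last look after the final sleep.
        return condition.getAsBoolean();
    }
}
```

A bounded retry like this only papers over in-flight watcher removal; it does not cancel anything, so the attempt count and sleep have to be generous enough for slow build machines.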
>>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>>> directly
>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>>>>> again
>>>>>>>>>>>> in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> at org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>>> against
>>>>>>>>>>>> that,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>>> merged
>>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff after
>>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>>>>> spend
>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
>>> supposed
>>>>>> to
>>>>>>>>>>>> get
>>>>>>>>>>>>>> set
>>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>>>>> handling
>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
>>> some
>>>>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>>>>> what
>>>>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>>>>> the
>>>>>>>>>>>> whole
>>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>>>>> mutator
>>>>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>>>>> consistently
>>>>>>>>>>>> on the
>>>>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>>>>> bug
>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
>>> I've
>>>>>>>> had a
>>>>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
>>> the
>>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
>>> time,
>>>>>>>> can
>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
>>> digging.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>>>>> onto
>>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>>>>> both
>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>>> are
>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>>>>> few
>>>>>>>>>>>> times
>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>>> morning.
>>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>>> just
>>>>>>>> want
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>>> Zimmerman
>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>>>>> validation
>>>>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>>>>> Because
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
>>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>>>>>>>>>>>> acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pathInBackground(adjustedPath, data,
>>>>>>>>>>>> givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>>>>> force a
>>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
>>> UnhandledErrorListener,
>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>>>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>>> McKenzie
>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>>> there,
>>>>>>>> so
>>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>>> know
>>>>>>>> if
>>>>>>>>>>>> I
>>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>>>>> it to
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test, TestFrameworkBackground:testErrorListener, that is failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and provoke an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl prior to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that it throws happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up the stack, at which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't understand how it ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> 
> 


Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
My current testing suggests that the problem is the call to:

	client.removeWatchers();

in InterProcessSemaphoreV2.

If I comment out that line, your test has yet to fail for me. Maybe you can verify. I’ll also look at why this is causing the failure.
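
For readers following along, the shape of the test described in the quoted message below (threads looping on a lease and bumping a counter, with the test checking that the counter keeps moving) can be sketched roughly as follows. This is only an illustration: it uses a local java.util.concurrent.Semaphore as a stand-in for InterProcessSemaphoreV2, there is no ZooKeeper server to restart, and the names SemaphoreProgressCheck, startWorkers and awaitProgress are made up for this sketch rather than taken from the Curator test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the test discussed in this thread; not Curator code.
class SemaphoreProgressCheck
{
    // Returns true if the counter increases within timeoutMs, false otherwise.
    static boolean awaitProgress(AtomicInteger counter, long timeoutMs) throws InterruptedException
    {
        int start = counter.get();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while ( System.nanoTime() < deadline )
        {
            if ( counter.get() > start )
            {
                return true;
            }
            Thread.sleep(10);
        }
        return counter.get() > start;
    }

    // Starts worker threads that loop: acquire a lease, bump the counter, release.
    // A local Semaphore is assumed here in place of the distributed one.
    static ExecutorService startWorkers(int workers, Semaphore semaphore, AtomicInteger counter, AtomicBoolean stop)
    {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for ( int i = 0; i < workers; ++i )
        {
            pool.submit(() -> {
                while ( !stop.get() )
                {
                    try
                    {
                        // Bounded wait so a worker can notice the stop flag.
                        if ( semaphore.tryAcquire(50, TimeUnit.MILLISECONDS) )
                        {
                            try
                            {
                                counter.incrementAndGet();
                            }
                            finally
                            {
                                semaphore.release();
                            }
                        }
                    }
                    catch ( InterruptedException e )
                    {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }
        return pool;
    }

    public static void main(String[] args) throws Exception
    {
        AtomicInteger counter = new AtomicInteger();
        AtomicBoolean stop = new AtomicBoolean(false);
        ExecutorService pool = startWorkers(4, new Semaphore(2), counter, stop);
        try
        {
            System.out.println("leases still being acquired: " + awaitProgress(counter, 5000));
        }
        finally
        {
            stop.set(true);
            pool.shutdownNow();
        }
    }
}
```

In the real test the same progress check would presumably run twice, once before the server is shut down and once after it is restarted; the failure being discussed corresponds to the second check timing out.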

> On Jun 1, 2016, at 10:59 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> The counter is just being used to check if semaphores are still being
> acquired. Essentially it just runs in a loop acquiring semaphores (and
> incrementing the counter when they are acquired).
> 
> Then it shuts down the server, waits until the session is lost, then
> restarts the server and then checks that semaphores are being acquired
> correctly again (by checking that the counter is being incremented).
> 
> This is just a simplified version of the test that is failing.
> 
> When the test fails, all of the threads are attempting to get a lease on
> the semaphore, but none of them get it, then the test times out while
> waiting.
> 
> 
> 
> On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> 
>> I also had to add:
>> 
>> while(!lost.get() && (counter.get() > 0))
>> {
>>    Thread.sleep(1000);
>> }
>> Which seems more correct to me.
>> 
>>> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com> wrote:
>>> 
>>> I have just pushed an interprocess_mutex_issue branch. The test case is in
>>> TestInterprocessMutexNotReconnecting
>>> 
>>> For me it's failing around 20% of the time.
>>> cheers
>>> 
>>> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
>>> 
>>>> Yep, just let me confirm that it's actually getting the same problem. I'm
>>>> sure it was before, but I've just run it a bunch of times and everything's
>>>> been fine.
>>>> 
>>>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> Can you push your unit test somewhere?
>>>>> 
>>>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com> wrote:
>>>>>> 
>>>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>>>>>> I've written a simplified unit test that just has a bunch of clients
>>>>>> attempting to grab a lease on the semaphore. When I shut down and restart
>>>>>> ZK, about 25% of the time none of the clients can reacquire the semaphore.
>>>>>> 
>>>>>> Still trying to work out what's going on, but I'm probably not going to
>>>>>> have a lot of time today to look at it.
>>>>>> cheers
>>>>>> 
>>>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>> 
>>>>>>> Odd - SemaphoreClient does seem wrong.
>>>>>>> 
>>>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> It looks like under some circumstances (which I haven't worked out yet)
>>>>>>>> the InterprocessMutex acquire() is not working correctly when
>>>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>>>> 
>>>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>>>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but throws an
>>>>>>>> exception if they return true. As far as I can work out, this means that
>>>>>>>> whenever the lock is acquired, an exception gets thrown indicating that
>>>>>>>> there are Multiple acquirers.
>>>>>>>> 
>>>>>>>> This test is failing fairly consistently. It seems to be the remaining test
>>>>>>>> that keeps failing in the Jenkins build also.
>>>>>>>> cheers
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>>>>>>>>> success as well, and the problem is not in the cluster restart. Will keep
>>>>>>>>> digging.
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>>>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>>>> watcher removal.
>>>>>>>>>> 
>>>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails,
>>>>>>>>>> it seems to have something to do with watcher removal: the error below is
>>>>>>>>>> logged. When the test passes, this error is not logged.
>>>>>>>>>> 
>>>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>>>> 
>>>>>>>>>> Is it possible it's something to do with the way that the cluster is
>>>>>>>>>> restarted at line 282? The old cluster is not shut down; a new one is just
>>>>>>>>>> created.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>>>> 
>>>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <mckenzie.cam@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests to wait
>>>>>>>>>>>> reliably for async tasks to finish?
>>>>>>>>>>>> 
>>>>>>>>>>>> The latest round of tests ran OK. One test failed on an unrelated thing
>>>>>>>>>>>> (ConnectionLoss), but this appears to be transient, as it worked OK the
>>>>>>>>>>>> next time around.
>>>>>>>>>>>> 
>>>>>>>>>>>> I will start getting a release together. Thanks for your help with the
>>>>>>>>>>>> updated tests.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> The problem is in-flight watchers and async background calls. There’s no
>>>>>>>>>>>>> way to cancel these and they can take time to occur - even after a recipe
>>>>>>>>>>>>> instance is closed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
>> done
>>>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>>>> checker.
>>>>>>> If
>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>>>> directly
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>>>> again
>>>>>>>>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>> 
>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>>>> still
>>>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>> [true]
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>>>> against
>>>>>>>>>>> that,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
>> merged
>>>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>>>>> same
>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
>>>>> child
>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
>>>>> child
>>>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>>>>>>> 
>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>>>>>>> 
>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>>>>> more
>>>>>>>>>>> child
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294
>> expected
>>>>>>>>>>> [true]
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One
>> or
>>>>>>> more
>>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>> 
>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>>>> spend
>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
>> supposed
>>>>> to
>>>>>>>>>>> get
>>>>>>>>>>>>> set
>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>>>> handling
>>>>>>>>>>> it.
>>>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
>> some
>>>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>>>> what
>>>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>>>> the
>>>>>>>>>>> whole
>>>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>>>> mutator
>>>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>>>> consistently
>>>>>>>>>>> on the
>>>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>>>> bug
>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
>> I've
>>>>>>> had a
>>>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
>> the
>>>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
>> time,
>>>>>>> can
>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
>> digging.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>>>> onto
>>>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>>>> both
>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
>> are
>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>>>> few
>>>>>>>>>>> times
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 actual 6 expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>>>> morning.
>>>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
>> just
>>>>>>> want
>>>>>>>>>>> to
>>>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
>> Zimmerman
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>>>> validation
>>>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>>>> Because
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>>>> force a
>>>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the
>> UnhandledErrorListener,
>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>>>> McKenzie
>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>>>> there,
>>>>>>> so
>>>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>>>> know
>>>>>>> if
>>>>>>>>>>> I
>>>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>>>> it to
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>>>> seems to
>>>>>>>>>>> try
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>>>> exception
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So,
>> it
>>>>>>> just
>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>>>> propagated up
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>>>>>>> don't
>>>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
The counter is just being used to check if semaphores are still being
acquired. Essentially it just runs in a loop acquiring semaphores (and
incrementing the counter when they are acquired).

Then it shuts down the server, waits until the session is lost, then
restarts the server and then checks that semaphores are being acquired
correctly again (by checking that the counter is being incremented).

This is just a simplified version of the test that is failing.

When the test fails, all of the threads are attempting to get a lease on
the semaphore, but none of them get it, then the test times out while
waiting.
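
The counter pattern described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: it uses a plain
java.util.concurrent.Semaphore as a stand-in for the Curator semaphore, it
simulates the outage by holding the only permit for a short time rather than
restarting a ZooKeeper server, and the class and method names
(SemaphoreProgressSketch, waitForProgress, run) are hypothetical and are not
taken from the actual test.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: NOT the real Curator test, and no ZooKeeper involved.
public class SemaphoreProgressSketch
{
    // Resets the counter, then polls until it rises above zero or the timeout expires.
    static boolean waitForProgress(AtomicInteger counter, long timeoutMs)
    {
        counter.set(0);
        long deadline = System.currentTimeMillis() + timeoutMs;
        try
        {
            while ( (counter.get() == 0) && (System.currentTimeMillis() < deadline) )
            {
                Thread.sleep(10);
            }
        }
        catch ( InterruptedException e )
        {
            Thread.currentThread().interrupt();
        }
        return counter.get() > 0;
    }

    // Returns true if acquisitions are observed both before and after the simulated outage.
    static boolean run(int clientQty, long timeoutMs)
    {
        // Assumption: a fair local semaphore stands in for the distributed one.
        final Semaphore semaphore = new Semaphore(1, true);
        final AtomicInteger counter = new AtomicInteger();
        final AtomicBoolean running = new AtomicBoolean(true);
        ExecutorService executor = Executors.newFixedThreadPool(clientQty);
        try
        {
            for ( int i = 0; i < clientQty; ++i )
            {
                executor.execute(() -> {
                    // Each "client" loops, acquiring a lease and bumping the counter.
                    while ( running.get() )
                    {
                        try
                        {
                            if ( semaphore.tryAcquire(100, TimeUnit.MILLISECONDS) )
                            {
                                try
                                {
                                    counter.incrementAndGet();
                                }
                                finally
                                {
                                    semaphore.release();
                                }
                            }
                        }
                        catch ( InterruptedException e )
                        {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                });
            }

            boolean before = waitForProgress(counter, timeoutMs);

            // Simulated outage: nobody can get a lease while the permit is held here.
            semaphore.acquireUninterruptibly();
            try
            {
                Thread.sleep(200);
            }
            catch ( InterruptedException e )
            {
                Thread.currentThread().interrupt();
            }
            finally
            {
                semaphore.release();
            }

            // After the "restart", the counter should start moving again.
            boolean after = waitForProgress(counter, timeoutMs);
            return before && after;
        }
        finally
        {
            running.set(false);
            executor.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("progress before and after outage: " + run(4, 5000));
    }
}
```

In the failing runs, the equivalent of the second waitForProgress() call is
what times out: the counter never moves again after the restart.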



On Thu, Jun 2, 2016 at 1:29 PM, Jordan Zimmerman <jordan@jordanzimmerman.com
> wrote:

> I also had to add:
>
> while(!lost.get() && (counter.get() > 0))
> {
>     Thread.sleep(1000);
> }
> Which seems more correct to me.
>
> > On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > I have just pushed an interprocess_mutex_issue branch. The test case is
> in
> > TestInterprocessMutexNotReconnecting
> >
> > For me it's failing around 20% of the time.
> > cheers
> >
> > On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> > wrote:
> >
> >> Yep, just let me confirm that it's actually getting the same problem.
> I'm
> >> sure it was before, but I've just run it a bunch of times and
> everything's
> >> been fine.
> >>
> >> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
> >> jordan@jordanzimmerman.com> wrote:
> >>
> >>> Can you push your unit test somewhere?
> >>>
> >>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2
> though.
> >>>> I've written a simplified unit test that just has a bunch of clients
> >>>> attempting to grab a lease on the semaphore. When I shutdown and
> >>> restart ZK
> >>>> about 25% of the time, none of the clients can reacquire the
> semaphore.
> >>>>
> >>>> Still trying to work out what's going on, but I'm probably not going
> to
> >>>> have a lot of time today to look at it.
> >>>> cheers
> >>>>
> >>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
> >>>> jordan@jordanzimmerman.com> wrote:
> >>>>
> >>>>> Odd - SemaphoreClient does seem wrong.
> >>>>>
> >>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> It looks like under some circumstances (which I haven't worked out
> >>> yet)
> >>>>>> that the InterprocessMutex acquire() is not working correctly when
> >>>>>> reconnecting to ZK. Still digging into why this is.
> >>>>>>
> >>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
> >>> missing
> >>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
> >>> throws
> >>>>> an
> >>>>>> exception if they return true. As far as I can work out, this means
> >>> that
> >>>>>> whenever the lock is acquired, an exception gets thrown indicating
> >>> that
> >>>>>> there are Multiple acquirers.
> >>>>>>
> >>>>>> This test is failing fairly consistently. It seems to be the
> remaining
> >>>>> test
> >>>>>> that keeps failing in the Jenkins build also
> >>>>>> cheers
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
> >>> mckenzie.cam@gmail.com
> >>>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown
> on
> >>>>>>> success as well, and the problem is not in the cluster restart.
> Will
> >>>>> keep
> >>>>>>> digging.
> >>>>>>>
> >>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
> >>>>> mckenzie.cam@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
> >>> (assertion
> >>>>> at
> >>>>>>>> line 294). Again, it seems like some sort of race condition with
> the
> >>>>>>>> watcher removal.
> >>>>>>>>
> >>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
> >>> fails
> >>>>>>>> it seems that it's got something to do with watcher removal. When
> >>> the
> >>>>> test
> >>>>>>>> passes, this error is not logged.
> >>>>>>>>
> >>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
> >>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
> >>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
> >>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
> >>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
> >>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
> >>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
> >>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
> >>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
> >>>>>>>>
> >>>>>>>> Is it possible it's something to do with the way that the cluster
> is
> >>>>>>>> restarted at line 282? The old cluster is not shutdown, a new one
> is
> >>>>> just
> >>>>>>>> created.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
> >>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>
> >>>>>>>>> I’ll try to address this as part of CURATOR-333
> >>>>>>>>>
> >>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
> >>>>> mckenzie.cam@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Maybe we need to look at some way of providing a hook for tests
> to
> >>>>> wait
> >>>>>>>>>> reliably for asynch tasks to finish?
> >>>>>>>>>>
> >>>>>>>>>> The latest round of tests ran OK. One test failed on an
> unrelated
> >>>>> thing
> >>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as
> it's
> >>>>>>>>> worked
> >>>>>>>>>> ok the next time around.
> >>>>>>>>>>
> >>>>>>>>>> I will start getting a release together. Thanks for your help with
> >>>>>>>>>> the updated tests.
> >>>>>>>>>> cheers
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
> >>>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The problem is in-flight watchers and async background calls.
> >>>>> There’s
> >>>>>>>>> no
> >>>>>>>>>>> way to cancel these and they can take time to occur - even
> after
> >>> a
> >>>>>>>>> recipe
> >>>>>>>>>>> instance is closed.
> >>>>>>>>>>>
> >>>>>>>>>>> -Jordan
> >>>>>>>>>>>
> >>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
> >>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ok, running it again now.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is
> done
> >>>>>>>>>>>> asynchronously after they are closed?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> >>>>>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
> >>> checker.
> >>>>> If
> >>>>>>>>>>> there
> >>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
> >>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
> >>> directly
> >>>>>>>>> in
> >>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
> >>> again
> >>>>>>>>> in
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> morning and see how it goes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There are still 2 tests failing for me:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>>>>>>>>>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>> at org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>>>>>>>>>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
> >>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
> >>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>> Run 2: PASS
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
> >>> against
> >>>>>>>>> that,
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
> >>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is
> merged
> >>>>>>>>> yet. I
> >>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -jordan
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
> >>> same
> >>>>>>>>> stuff
> >>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>> merging your fix:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>>>>> Run 1: PASS
> >>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
> >>>>> spend
> >>>>>>>>> some
> >>>>>>>>>>>>>>>>> time on
> >>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still
> supposed
> >>> to
> >>>>>>>>> get
> >>>>>>>>>>> set
> >>>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
> >>>>> handling
> >>>>>>>>> it.
> >>>>>>>>>>>>> But,
> >>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are
> some
> >>>>>>>>>>>>> significant
> >>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
> >>> what
> >>>>>>>>>>>>>>>>> ZooKeeper does
> >>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
> >>> the
> >>>>>>>>> whole
> >>>>>>>>>>> ZK
> >>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
> >>> mutator
> >>>>>>>>> APIs.
> >>>>>>>>>>>>>>>>> But, of
> >>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
> >>> consistently
> >>>>>>>>> on the
> >>>>>>>>>>>>> 3.0
> >>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
> >>> bug
> >>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference.
> I've
> >>>>> had a
> >>>>>>>>>>> quick
> >>>>>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's
> the
> >>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got
> time,
> >>>>> can
> >>>>>>>>> you
> >>>>>>>>>>>>>>>>> have a
> >>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more
> digging.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks Scott.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
> >>> onto
> >>>>>>>>> Nexus.
> >>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >>>>>>>>>>>>> dragonsinth@gmail.com
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
> >>> both
> >>>>>>>>>>> master
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> 3.0.
> >>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they
> are
> >>>>>>>>> failing
> >>>>>>>>>>>>>>>>> there.
> >>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman
> <
> >>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
> >>> few
> >>>>>>>>> times
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 actual 6 expected -31:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
> >>>>> morning.
> >>>>>>>>>>> Given
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we
> just
> >>>>> want
> >>>>>>>>> to
> >>>>>>>>>>>>> vote
> >>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan
> Zimmerman
> >>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron
> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
> >>> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
> >>>>>>>>> validation
> >>>>>>>>>>>>>>>>> stuff.
> >>>>>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
> >>>>>>>>> Because
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> unit
> >>>>>>>>>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an
> exception
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data, givenPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a failure in a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the expectation is that
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
> >>> McKenzie
> >>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
> >>> there,
> >>>>> so
> >>>>>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
> >>> know
> >>>>> if
> >>>>>>>>> I
> >>>>>>>>>>> get
> >>>>>>>>>>>>>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
> >>>>> Zimmerman <
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
> >>> it to
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
> >>>>> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test TestFrameworkBackground:testErrorListener that is failing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and provoke an error
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl prior to the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that it throws happens
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up the stack at which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't understand how it ever
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>
>
>

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I also had to add:

while(!lost.get() && (counter.get() > 0))
{
    Thread.sleep(1000);
}
Which seems more correct to me.
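
For reference, a minimal sketch of the same wait loop with an upper bound,
so that a test cannot hang forever if neither condition ever changes. The
names lost and counter are taken from the snippet above; the class name,
the timeout and the poll interval are assumptions made up for illustration
and are not from the actual test.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Curator or its tests.
final class BoundedWait
{
    // Polls until the connection is flagged as lost or the counter drains to
    // zero. Returns true if either happened, false if maxWaitMs elapsed first.
    static boolean awaitLostOrDrained(AtomicBoolean lost, AtomicInteger counter, long maxWaitMs, long pollMs) throws InterruptedException
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while ( !lost.get() && (counter.get() > 0) )
        {
            if ( System.nanoTime() - deadline >= 0 )
            {
                return false;   // timed out with work still outstanding
            }
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

A test could then assert on the return value instead of sleeping blindly,
which makes a stuck wait show up as a normal assertion failure.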

> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> I have just pushed an interprocess_mutex_issue branch. The test case is in
> TestInterprocessMutexNotReconnecting
> 
> For me it's failing around 20% of the time.
> cheers
> 
> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> 
>> Yep, just let me confirm that it's actually getting the same problem. I'm
>> sure it was before, but I've just run it a bunch of times and everything's
>> been fine.
>> 
>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>> 
>>> Can you push your unit test somewhere?
>>> 
>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
>>> wrote:
>>>> 
>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>>>> I've written a simplified unit test that just has a bunch of clients
>>>> attempting to grab a lease on the semaphore. When I shut down and restart
>>>> ZK, about 25% of the time none of the clients can reacquire the semaphore.
>>>> 
>>>> Still trying to work out what's going on, but I'm probably not going to
>>>> have a lot of time today to look at it.
>>>> cheers
>>>> 
>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> Odd - SemaphoreClient does seem wrong.
>>>>> 
>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> It looks like under some circumstances (which I haven't worked out yet)
>>>>>> the InterProcessMutex acquire() is not working correctly when
>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>> 
>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but throws
>>>>>> an exception if they return true. As far as I can work out, this means
>>>>>> that whenever the lock is acquired, an exception gets thrown indicating
>>>>>> that there are Multiple acquirers.
>>>>>> 
>>>>>> This test is failing fairly consistently. It also seems to be the
>>>>>> remaining test that keeps failing in the Jenkins build.
>>>>>> cheers
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com
>>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>>>>>>> success as well, and the problem is not in the cluster restart. Will
>>>>>>> keep digging.
>>>>>>> 
>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>> watcher removal.
>>>>>>>> 
>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails,
>>>>>>>> it seems that it's got something to do with watcher removal. When the
>>>>>>>> test passes, this error is not logged.
>>>>>>>> 
>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>> 
>>>>>>>> Is it possible it's something to do with the way that the cluster is
>>>>>>>> restarted at line 282? The old cluster is not shut down, a new one is
>>>>>>>> just created.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>> 
>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>> 
>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests to
>>>>>>>>>> wait reliably for async tasks to finish?
>>>>>>>>>> 
>>>>>>>>>> The latest round of tests ran OK. One test failed on an unrelated
>>>>>>>>>> thing (ConnectionLoss), but this appears to be transient, as it
>>>>>>>>>> worked OK the next time around.
>>>>>>>>>> 
>>>>>>>>>> I will start getting a release together. Thanks for your help with
>>>>>>>>>> the updated tests.
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>>>>>>>> There’s no way to cancel these and they can take time to occur -
>>>>>>>>>>> even after a recipe instance is closed.
>>>>>>>>>>> 
>>>>>>>>>>> -Jordan
>>>>>>>>>>> 
>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is done
>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers” checker.
>>>>>>>>>>>>> If there are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them directly
>>>>>>>>>>>>>> in Eclipse they seem to be passing. I will run the whole thing
>>>>>>>>>>>>>> again in the morning and see how it goes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>> at org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FAILURE! - in org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>> at org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>> against
>>>>>>>>> that,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
>>>>>>>>>>>>>>>>>> after merging your fix:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934 One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but found [false]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend some time on
>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to get set when
>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling it. But,
>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are some significant
>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror what ZooKeeper does
>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight, the whole ZK
>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator APIs. But, of
>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the 3.0
>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug in the
>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a quick look
>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you have a
>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>> onto
>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>> both
>>>>>>>>>>> master
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>>>>>>>>> failing
>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times but no love:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151 actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning. Given that these
>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just want to vote on 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman
>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation stuff. It now does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because the unit test uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a failure in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the expectation is that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>> McKenzie
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>> there,
>>>>> so
>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>> know
>>>>> if
>>>>>>>>> I
>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>> it to
>>>>>>>>> the
>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test TestFrameworkBackground:testErrorListener that is failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and provoke an error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl prior to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception that it throws happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just throws an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up the stack at which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't understand how it ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
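
On the compareAndSet() point quoted above: AtomicBoolean.compareAndSet(expect,
update) returns true when the swap succeeded, that is, when the calling thread
is the one that flipped the flag. A guard against multiple holders would
therefore normally throw on a false result, not a true one. The sketch below
illustrates that reading. It is not the actual SemaphoreClient code; the
class, field and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only, not the SemaphoreClient from the tests.
final class SingleHolderGuard
{
    private final AtomicBoolean held = new AtomicBoolean(false);

    // compareAndSet(false, true) returns true only for the caller that
    // actually flipped the flag, so a false result signals a second holder.
    void markAcquired()
    {
        if ( !held.compareAndSet(false, true) )
        {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // A false result here means the guard was released without being held.
    void markReleased()
    {
        if ( !held.compareAndSet(true, false) )
        {
            throw new IllegalStateException("Released without being held");
        }
    }

    boolean isHeld()
    {
        return held.get();
    }
}
```

With the check inverted (throwing when compareAndSet() returns true), the
first and only acquirer would trip the exception, which matches the
behaviour described in the quoted message.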


Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Thanks - I’ll take a look.

> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> I have just pushed an interprocess_mutex_issue branch. The test case is in
> TestInterprocessMutexNotReconnecting
> 
> For me it's failing around 20% of the time.
> cheers
> 
> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> 
>> Yep, just let me confirm that it's actually getting the same problem. I'm
>> sure it was before, but I've just run it a bunch of times and everything's
>> been fine.
>> 
>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>> 
>>> Can you push your unit test somewhere?
>>> 
>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
>>> wrote:
>>>> 
>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>>>> I've written a simplified unit test that just has a bunch of clients
>>>> attempting to grab a lease on the semaphore. When I shut down and restart
>>>> ZK, about 25% of the time none of the clients can reacquire the semaphore.
>>>> 
>>>> Still trying to work out what's going on, but I'm probably not going to
>>>> have a lot of time today to look at it.
>>>> cheers
>>>> 
>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> Odd - SemaphoreClient does seem wrong.
>>>>> 
>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> It looks like under some circumstances (which I haven't worked out yet)
>>>>>> the InterProcessMutex acquire() is not working correctly when
>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>> 
>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but throws
>>>>>> an exception if they return true. As far as I can work out, this means
>>>>>> that whenever the lock is acquired, an exception gets thrown indicating
>>>>>> that there are Multiple acquirers.
>>>>>> 
>>>>>> This test is failing fairly consistently. It also seems to be the
>>>>>> remaining test that keeps failing in the Jenkins build.
>>>>>> cheers
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com
>>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>>>>>>> success as well, and the problem is not in the cluster restart. Will
>>>>> keep
>>>>>>> digging.
>>>>>>> 
>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
>>> (assertion
>>>>> at
>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>> watcher removal.
>>>>>>>> 
>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
>>> fails
>>>>>>>> it seems that it's got something to do with watcher removal. When
>>> the
>>>>> test
>>>>>>>> passes, this error is not logged.
>>>>>>>> 
>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>>>>> KeeperErrorCode
>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>> at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>> 
>>>>>>>> Is it possible it's something to do with the way that the cluster is
>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new one is
>>>>> just
>>>>>>>> created.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>> 
>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>> 
>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests to
>>>>> wait
>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>> 
>>>>>>>>>> The latest round of tests ran OK. One test failed on an unrelated
>>>>> thing
>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
>>>>>>>>> worked
>>>>>>>>>> ok the next time around.
>>>>>>>>>> 
>>>>>>>>>> I will start getting a release together. Thanks for your help with
>>> the
>>>>>>>>>> updated tests.
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>> There’s
>>>>>>>>> no
>>>>>>>>>>> way to cancel these and they can take time to occur - even after
>>> a
>>>>>>>>> recipe
>>>>>>>>>>> instance is closed.
>>>>>>>>>>> 
>>>>>>>>>>> -Jordan
>>>>>>>>>>> 
>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is done
>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>> checker.
>>>>> If
>>>>>>>>>>> there
>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>> directly
>>>>>>>>> in
>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>> again
>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>> still
>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>> more
>>>>>>>>> child
>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>> [true]
>>>>>>>>> but
>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>> against
>>>>>>>>> that,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>>> same
>>>>>>>>> stuff
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
>>> child
>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
>>> child
>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>>> more
>>>>>>>>> child
>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>>> more
>>>>>>>>> child
>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>>>>>> [true]
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or
>>>>> more
>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>> spend
>>>>>>>>> some
>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed
>>> to
>>>>>>>>> get
>>>>>>>>>>> set
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>> handling
>>>>>>>>> it.
>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are some
>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>> what
>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>> the
>>>>>>>>> whole
>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>> mutator
>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>> consistently
>>>>>>>>> on the
>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>> bug
>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
>>>>> had a
>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
>>>>> can
>>>>>>>>> you
>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>> onto
>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>> both
>>>>>>>>>>> master
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>>>>>>>>> failing
>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>> few
>>>>>>>>> times
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>> morning.
>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just
>>>>> want
>>>>>>>>> to
>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman
>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>> validation
>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>> Because
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>>>>>>>>> acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   pathInBackground(adjustedPath, data,
>>>>>>>>> givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>> force a
>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
>>>>> the
>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>> McKenzie
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>> there,
>>>>> so
>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>> know
>>>>> if
>>>>>>>>> I
>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>> it to
>>>>>>>>> the
>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>> seems to
>>>>>>>>> try
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>> exception
>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it
>>>>> just
>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>> propagated up
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>>>>> don't
>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 


Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I don’t fully understand the test. MAX_SEMAPHORES is 1, so the counter can only ever be 1. Also, it seems to me that the finally block in the thread should decrement the counter when the lease is released:

if(lease != null) {
    counter.decrementAndGet();
    semaphore.returnLease(lease);
}

Once I add this the test never fails for me. Am I misunderstanding?
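
For reference, a minimal sketch of the accounting described above. It is not the test on the branch: it assumes java.util.concurrent.Semaphore as a local stand-in for InterProcessSemaphoreV2, so it does not touch ZooKeeper or exercise a reconnect. The names LeaseCounterSketch, run and maxSeen are hypothetical. It only shows that when the counter is incremented after a successful acquire and decremented in the finally block before the lease is returned, the number of holders observed should not exceed MAX_SEMAPHORES.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LeaseCounterSketch {
    static final int MAX_SEMAPHORES = 1;

    // Runs `threads` workers, each doing `iterations` acquire/release cycles,
    // and returns the highest number of simultaneous holders that was observed.
    static int run(int threads, int iterations) throws InterruptedException {
        // Assumption: a JVM-local semaphore stands in for the Curator recipe.
        Semaphore semaphore = new Semaphore(MAX_SEMAPHORES);
        AtomicInteger counter = new AtomicInteger(0);
        AtomicInteger maxSeen = new AtomicInteger(0);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    boolean acquired = false;
                    try {
                        semaphore.acquire();
                        acquired = true;
                        // Count this holder and remember the peak.
                        int holders = counter.incrementAndGet();
                        maxSeen.accumulateAndGet(holders, Math::max);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } finally {
                        if (acquired) {
                            // Decrement before giving the lease back, so the
                            // counter never includes a holder that has released.
                            counter.decrementAndGet();
                            semaphore.release();
                        }
                    }
                }
            });
        }

        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return maxSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max simultaneous holders: " + run(4, 200));
    }
}
```

If the decrement is left out of the finally block, the counter only ever grows, so any assertion comparing it against MAX_SEMAPHORES would trip regardless of what the semaphore itself is doing.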

-Jordan

> On Jun 1, 2016, at 9:07 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> I have just pushed an interprocess_mutex_issue branch. The test case is in
> TestInterprocessMutexNotReconnecting
> 
> For me it's failing around 20% of the time.
> cheers
> 
> On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> 
>> Yep, just let me confirm that it's actually getting the same problem. I'm
>> sure it was before, but I've just run it a bunch of times and everything's
>> been fine.
>> 
>> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>> 
>>> Can you push your unit test somewhere?
>>> 
>>>> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
>>> wrote:
>>>> 
>>>> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>>>> I've written a simplified unit test that just has a bunch of clients
>>>> attempting to grab a lease on the semaphore. When I shutdown and
>>> restart ZK
>>>> about 25% of the time, none of the clients can reacquire the semaphore.
>>>> 
>>>> Still trying to work out what's going on, but I'm probably not going to
>>>> have a lot of time today to look at it.
>>>> cheers
>>>> 
>>>> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>> 
>>>>> Odd - SemaphoreClient does seem wrong.
>>>>> 
>>>>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> It looks like under some circumstances (which I haven't worked out
>>> yet)
>>>>>> that the InterProcessMutex acquire() is not working correctly when
>>>>>> reconnecting to ZK. Still digging into why this is.
>>>>>> 
>>>>>> There also seems to be a bug in the SemaphoreClient, unless I'm
>>> missing
>>>>>> something. At lines 126 and 140 it does compareAndSet() calls but
>>> throws
>>>>> an
>>>>>> exception if they return true. As far as I can work out, this means
>>> that
>>>>>> whenever the lock is acquired, an exception gets thrown indicating
>>> that
>>>>>> there are Multiple acquirers.
>>>>>> 
>>>>>> This test is failing fairly consistently. It seems to be the remaining
>>>>> test
>>>>>> that keeps failing in the Jenkins build also
>>>>>> cheers
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com
>>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>>>>>>> success as well, and the problem is not in the cluster restart. Will
>>>>> keep
>>>>>>> digging.
>>>>>>> 
>>>>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing
>>> (assertion
>>>>> at
>>>>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>>>>> watcher removal.
>>>>>>>> 
>>>>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
>>> fails
>>>>>>>> it seems that it's got something to do with watcher removal. When
>>> the
>>>>> test
>>>>>>>> passes, this error is not logged.
>>>>>>>> 
>>>>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>>>>> KeeperErrorCode
>>>>>>>> = No such watcher for /foo/bar/lock/leases
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>>>>> at
>>>>>>>> 
>>>>> 
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>> at
>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>>>>> 
>>>>>>>> Is it possible it's something to do with the way that the cluster is
>>>>>>>> restarted at line 282? The old cluster is not shutdown, a new one is
>>>>> just
>>>>>>>> created.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>> 
>>>>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>>>>> 
>>>>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Maybe we need to look at some way of providing a hook for tests to
>>>>> wait
>>>>>>>>>> reliably for asynch tasks to finish?
>>>>>>>>>> 
>>>>>>>>>> The latest round of tests ran OK. One test failed on an unrelated
>>>>> thing
>>>>>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
>>>>>>>>> worked
>>>>>>>>>> ok the next time around.
>>>>>>>>>> 
>>>>>>>>>> I will start getting a release together. Thanks for your help with
>>> the
>>>>>>>>>> updated tests.
>>>>>>>>>> cheers
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> The problem is in-flight watchers and async background calls.
>>>>> There’s
>>>>>>>>> no
>>>>>>>>>>> way to cancel these and they can take time to occur - even after
>>> a
>>>>>>>>> recipe
>>>>>>>>>>> instance is closed.
>>>>>>>>>>> 
>>>>>>>>>>> -Jordan
>>>>>>>>>>> 
>>>>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Ok, running it again now.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is the problem that the watcher clean up for the recipes is done
>>>>>>>>>>>> asynchronously after they are closed?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>>> checker.
>>>>> If
>>>>>>>>>>> there
>>>>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looks like these failures are intermittent. Running them
>>> directly
>>>>>>>>> in
>>>>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>>> again
>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>>> still
>>>>>>>>>>>>> registered:
>>>>>>>>>>>>>>> [/test]
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>>> more
>>>>>>>>> child
>>>>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>> [true]
>>>>>>>>> but
>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>>> against
>>>>>>>>> that,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>>>>>>>>> yet. I
>>>>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>>> same
>>>>>>>>> stuff
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
>>> child
>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
>>> child
>>>>>>>>>>>>> watchers
>>>>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>>> more
>>>>>>>>> child
>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>>> more
>>>>>>>>> child
>>>>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>>>>>> [true]
>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or
>>>>> more
>>>>>>>>>>> data
>>>>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>>>>> watchers are
>>>>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>>>>> spend
>>>>>>>>> some
>>>>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed
>>> to
>>>>>>>>> get
>>>>>>>>>>> set
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>>>>> handling
>>>>>>>>> it.
>>>>>>>>>>>>> But,
>>>>>>>>>>>>>>>>>>> while I was looking at the code I realized there are some
>>>>>>>>>>>>> significant
>>>>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>>> what
>>>>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>>> the
>>>>>>>>> whole
>>>>>>>>>>> ZK
>>>>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>>> mutator
>>>>>>>>> APIs.
>>>>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>>> consistently
>>>>>>>>> on the
>>>>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>>> bug
>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
>>>>> had a
>>>>>>>>>>> quick
>>>>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
>>>>> can
>>>>>>>>> you
>>>>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>>> onto
>>>>>>>>> Nexus.
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>>> both
>>>>>>>>>>> master
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>>>>>>>>> failing
>>>>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>>> few
>>>>>>>>> times
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>>>>> morning.
>>>>>>>>>>> Given
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just
>>>>> want
>>>>>>>>> to
>>>>>>>>>>>>> vote
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman
>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>>>>> validation
>>>>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>>>>> Because
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>>>>> force a
>>>>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
>>>>> the
>>>>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>>> operations?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>>> McKenzie
>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>>> there,
>>>>> so
>>>>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>>> know
>>>>> if
>>>>>>>>> I
>>>>>>>>>>> get
>>>>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>>>>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>>> it to
>>>>>>>>> the
>>>>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>>>>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>>> seems to
>>>>>>>>> try
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>>>>> exception
>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it
>>>>> just
>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>>>>> propagated up
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>>>>> don't
>>>>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
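
For reference, the "sleep a bit and try again" loop mentioned in the quoted
thread can be sketched roughly as below. This is a minimal, hypothetical
sketch using only the JDK. It is not the actual TestCleanState code; the
class name, method name, timeout and sleep values are all assumptions made
for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Curator: poll a condition until it holds
// or a deadline passes, sleeping between attempts.
class WaitUntilCleanSketch {
    // Returns true as soon as isClean reports true; returns false if timeoutMs
    // elapses first or the waiting thread is interrupted.
    static boolean waitUntil(BooleanSupplier isClean, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (isClean.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "remaining watchers": drains by one on each check.
        int[] remaining = {3};
        boolean clean = waitUntil(() -> --remaining[0] <= 0, 1000, 10);
        System.out.println("clean: " + clean);
    }
}
```

A test would pass its own "no watchers left" check as the condition and fail
only if the helper returns false, so a slow in-flight watcher removal does
not fail the test immediately.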


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
I have just pushed an interprocess_mutex_issue branch. The test case is in
TestInterprocessMutexNotReconnecting.

For me it's failing around 20% of the time.
cheers
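
For anyone following the SemaphoreClient point quoted further down, here is
a minimal sketch of why a guard that throws when compareAndSet() returns
true looks inverted. It is a hypothetical illustration using only the JDK,
with an AtomicBoolean as a stand-in. It is not the SemaphoreClient source,
and the class and method names are assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; this is not the SemaphoreClient source.
class CasGuardSketch {
    // Guard as described in the thread: throws when compareAndSet() returns
    // true, i.e. exactly when this caller successfully claimed the flag.
    static void acquireAsDescribed(AtomicBoolean held) {
        if (held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Guard with the check inverted: throws only when the flag was already
    // set, meaning someone else holds it.
    static void acquireInverted(AtomicBoolean held) {
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Reports whether the very first acquire on a fresh flag throws.
    static boolean throwsOnFirstAcquire(boolean inverted) {
        AtomicBoolean held = new AtomicBoolean(false);
        try {
            if (inverted) {
                acquireInverted(held);
            } else {
                acquireAsDescribed(held);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("as described throws on first acquire: " + throwsOnFirstAcquire(false));
        System.out.println("inverted throws on first acquire: " + throwsOnFirstAcquire(true));
    }
}
```

compareAndSet() returns true when the swap succeeds, so the first form
reports "Multiple acquirers" for a single, uncontended acquire, while the
second only complains when the flag is already held.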

On Thu, Jun 2, 2016 at 11:17 AM, Cameron McKenzie <mc...@gmail.com>
wrote:

> Yep, just let me confirm that it's actually getting the same problem. I'm
> sure it was before, but I've just run it a bunch of times and everything's
> been fine.
>
> On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> Can you push your unit test somewhere?
>>
>> > On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>> >
>> > Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
>> > I've written a simplified unit test that just has a bunch of clients
>> > attempting to grab a lease on the semaphore. When I shut down and restart
>> > ZK, about 25% of the time none of the clients can reacquire the semaphore.
>> >
>> > Still trying to work out what's going on, but I'm probably not going to
>> > have a lot of time today to look at it.
>> > cheers
>> >
>> > On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
>> > jordan@jordanzimmerman.com> wrote:
>> >
>> >> Odd - SemaphoreClient does seem wrong.
>> >>
>> >>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
>> >> wrote:
>> >>>
>> >>> It looks like under some circumstances (which I haven't worked out yet)
>> >>> the InterProcessMutex acquire() is not working correctly when
>> >>> reconnecting to ZK. Still digging into why this is.
>> >>>
>> >>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>> >>> something. At lines 126 and 140 it does compareAndSet() calls but throws
>> >>> an exception if they return true. As far as I can work out, this means
>> >>> that whenever the lock is acquired, an exception gets thrown indicating
>> >>> that there are multiple acquirers.
>> >>>
>> >>> This test is failing fairly consistently. It also seems to be the
>> >>> remaining test that keeps failing in the Jenkins build.
>> >>> cheers
>> >>>
>> >>>
>> >>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com
>> >>>
>> >>> wrote:
>> >>>
>> >>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>> >>>> success as well, and the problem is not in the cluster restart. Will
>> >>>> keep digging.
>> >>>>
>> >>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>> >> mckenzie.cam@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>> >>>>> line 294). Again, it seems like some sort of race condition with the
>> >>>>> watcher removal.
>> >>>>>
>> >>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails,
>> >>>>> it seems to have something to do with watcher removal. When the test
>> >>>>> passes, this error is not logged.
>> >>>>>
>> >>>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
>> >>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>> >>>>> at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>> >>>>> at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>> >>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>> >>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>> >>>>> at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>> >>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>> >>>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>> >>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>> >>>>>
>> >>>>> Is it possible it's something to do with the way that the cluster is
>> >>>>> restarted at line 282? The old cluster is not shut down, a new one is
>> >>>>> just created.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>> >>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>
>> >>>>>> I’ll try to address this as part of CURATOR-333
>> >>>>>>
>> >>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>> >> mckenzie.cam@gmail.com>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Maybe we need to look at some way of providing a hook for tests to
>> >> wait
>> >>>>>>> reliably for asynch tasks to finish?
>> >>>>>>>
>> >>>>>>> The latest round of tests ran OK. One test failed on an unrelated
>> >> thing
>> >>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
>> >>>>>> worked
>> >>>>>>> ok the next time around.
>> >>>>>>>
>> >>>>>>> I will start getting a release together. Thanks for your help with the
>> >>>>>>> updated tests.
>> >>>>>>> cheers
>> >>>>>>>
>> >>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>> >>>>>> jordan@jordanzimmerman.com
>> >>>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> The problem is in-flight watchers and async background calls.
>> >> There’s
>> >>>>>> no
>> >>>>>>>> way to cancel these and they can take time to occur - even after
>> a
>> >>>>>> recipe
>> >>>>>>>> instance is closed.
>> >>>>>>>>
>> >>>>>>>> -Jordan
>> >>>>>>>>
>> >>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>> >>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Ok, running it again now.
>> >>>>>>>>>
>> >>>>>>>>> Is the problem that the watcher clean up for the recipes is done
>> >>>>>>>>> asynchronously after they are closed?
>> >>>>>>>>>
>> >>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>> >>>>>>>> jordan@jordanzimmerman.com
>> >>>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
>> checker.
>> >> If
>> >>>>>>>> there
>> >>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>> >>>>>>>>>>
>> >>>>>>>>>> -Jordan
>> >>>>>>>>>>
>> >>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>> >>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Looks like these failures are intermittent. Running them
>> directly
>> >>>>>> in
>> >>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
>> again
>> >>>>>> in
>> >>>>>>>> the
>> >>>>>>>>>>> morning and see how it goes.
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>> >>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> There are still 2 tests failing for me:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> FAILURE! - in
>> >>>>>>>>>>>>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>> >>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are
>> still
>> >>>>>>>>>> registered:
>> >>>>>>>>>>>> [/test]
>> >>>>>>>>>>>> at
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>> >>>>>>>>>>>> at
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> FAILURE! - in
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>> >>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>> >>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>> >>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>> >>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>> >>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>> >>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>> >>>>>>>>>>>> at
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Failed tests:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or
>> more
>> >>>>>> child
>> >>>>>>>>>>>> watchers are still registered: [/test]
>> >>>>>>>>>>>> Run 2: PASS
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>> [true]
>> >>>>>> but
>> >>>>>>>>>>>> found [false]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>> >>>>>>>>>> mckenzie.cam@gmail.com
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
>> against
>> >>>>>> that,
>> >>>>>>>>>> and
>> >>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>> >>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>> >>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>> >>>>>> yet. I
>> >>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -jordan
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>> >>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
>> same
>> >>>>>> stuff
>> >>>>>>>>>>>>>> after
>> >>>>>>>>>>>>>>> merging your fix:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Failed tests:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
>> child
>> >>>>>>>>>> watchers
>> >>>>>>>>>>>>>>> are still registered: [/test]
>> >>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
>> child
>> >>>>>>>>>> watchers
>> >>>>>>>>>>>>>>> are still registered: [/test]
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>>>>>>>> Run 1:
>> >>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>> >>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>> >>>>>>>>>>>>>>> Run 2:
>> >>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>> >>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or
>> more
>> >>>>>> child
>> >>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>> >>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or
>> more
>> >>>>>> child
>> >>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>> >>>>>> [true]
>> >>>>>>>> but
>> >>>>>>>>>>>>>>> found [false]
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>> >>>>>>>>>>>>>>> Run 1: PASS
>> >>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or
>> >> more
>> >>>>>>>> data
>> >>>>>>>>>>>>>>> watchers are still registered: [/count]
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>> >>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>> >>>>>> watchers are
>> >>>>>>>>>>>>>> still
>> >>>>>>>>>>>>>>> registered: [/count]
>> >>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>> >>>>>> watchers are
>> >>>>>>>>>>>>>> still
>> >>>>>>>>>>>>>>> registered: [/count]
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>> >> spend
>> >>>>>> some
>> >>>>>>>>>>>>>> time on
>> >>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed
>> to
>> >>>>>> get
>> >>>>>>>> set
>> >>>>>>>>>>>>>> when
>> >>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>> >> handling
>> >>>>>> it.
>> >>>>>>>>>> But,
>> >>>>>>>>>>>>>>>> while I was looking at the code I realized there are some
>> >>>>>>>>>> significant
>> >>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
>> what
>> >>>>>>>>>>>>>> ZooKeeper does
>> >>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
>> the
>> >>>>>> whole
>> >>>>>>>> ZK
>> >>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
>> mutator
>> >>>>>> APIs.
>> >>>>>>>>>>>>>> But, of
>> >>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> -Jordan
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>> >>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks Scott,
>> >>>>>>>>>>>>>>>>> Those tests are now passing for me.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
>> consistently
>> >>>>>> on the
>> >>>>>>>>>> 3.0
>> >>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
>> bug
>> >>>>>> in the
>> >>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
>> >> had a
>> >>>>>>>> quick
>> >>>>>>>>>>>>>> look
>> >>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>> >>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
>> >> can
>> >>>>>> you
>> >>>>>>>>>>>>>> have a
>> >>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Thanks Scott.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
>> onto
>> >>>>>> Nexus.
>> >>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>> >>>>>>>>>> dragonsinth@gmail.com
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
>> both
>> >>>>>>>> master
>> >>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>> 3.0.
>> >>>>>>>>>>>>>>>>>>> Where should I push the fix?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks Scott,
>> >>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>> >>>>>> failing
>> >>>>>>>>>>>>>> there.
>> >>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>> >>>>>>>>>>>>>> dragonsinth@gmail.com>
>> >>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>> >>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> -Jordan
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
>> few
>> >>>>>> times
>> >>>>>>>>>> but
>> >>>>>>>>>>>>>> no
>> >>>>>>>>>>>>>>>>>>>> love:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>> >>>>>>>>>>>>>>>>>>>>>> actual 6
>> >>>>>>>>>>>>>>>>>>>>>>> expected -31:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>> >> morning.
>> >>>>>>>> Given
>> >>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>> these
>> >>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just
>> >> want
>> >>>>>> to
>> >>>>>>>>>> vote
>> >>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>>>>>> 2.11.0
>> >>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>> >>>>>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman
>> <
>> >>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> ====================
>> >>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
>> McKenzie <
>> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>> >>>>>> validation
>> >>>>>>>>>>>>>> stuff.
>> >>>>>>>>>>>>>>>>>>> It
>> >>>>>>>>>>>>>>>>>>>>> now
>> >>>>>>>>>>>>>>>>>>>>>>>> does
>> >>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>> >>>>>> Because
>> >>>>>>>> the
>> >>>>>>>>>>>>>> unit
>> >>>>>>>>>>>>>>>>>>>> test
>> >>>>>>>>>>>>>>>>>>>>>>>> uses a
>> >>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>> >>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>> >>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>> >>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>> >>>>>> acling.getAclList(adjustedPath);
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>
>> >> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>> >>>>>>>>>>>>>>>>>>>>>>>> data,
>> >>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>> >>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>> >>>>>>>>>>>>>>>>>>>>>>>>>> {
>> >>>>>>>>>>>>>>>>>>>>>>>>>>    pathInBackground(adjustedPath, data,
>> >>>>>> givenPath);
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>> >> force a
>> >>>>>>>>>> failure
>> >>>>>>>>>>>>>>>>>>> in a
>> >>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
>> >> the
>> >>>>>>>>>>>>>>>>>>> expectation is
>> >>>>>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
>> operations?
>> >>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
>> McKenzie
>> >> <
>> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes
>> there,
>> >> so
>> >>>>>>>> maybe
>> >>>>>>>>>>>>>>>>>>>> something
>> >>>>>>>>>>>>>>>>>>>>>> has
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you
>> know
>> >> if
>> >>>>>> I
>> >>>>>>>> get
>> >>>>>>>>>>>>>>>>>>> stuck.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>> >> Zimmerman <
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared
>> it to
>> >>>>>> the
>> >>>>>>>>>>>>>> master
>> >>>>>>>>>>>>>>>>>>>>> branch?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>> >> McKenzie <
>> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>> >>>>>>>>>> TestFrameworkBackground:testErrorListener
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It
>> seems to
>> >>>>>> try
>> >>>>>>>> and
>> >>>>>>>>>>>>>>>>>>> provoke
>> >>>>>>>>>>>>>>>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>> >>>>>>>>>> CreateBuilderImpl
>> >>>>>>>>>>>>>>>>>>> prior
>> >>>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>> >>>>>> exception
>> >>>>>>>> that
>> >>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>>>>>> throws
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it
>> >> just
>> >>>>>>>> throws
>> >>>>>>>>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>> >>>>>> propagated up
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> stack
>> >>>>>>>>>>>>>>>>>>>> at
>> >>>>>>>>>>>>>>>>>>>>>>>> which
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>> >> don't
>> >>>>>>>>>>>>>> understand
>> >>>>>>>>>>>>>>>>>>> how
>> >>>>>>>>>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>>>>>>>>>> ever
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>
>> >>
>>
>>
>
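
The testErrorListener exchange quoted above turns on which thread the exception is raised on: the ACL lookup runs before the backgrounding call, so its exception surfaces on the test's main thread instead of reaching the error listener. The following is a minimal sketch of that ordering only. It does not use Curator; the class, the createInBackground() helper and the Consumer-based listener are hypothetical stand-ins for CreateBuilderImpl and UnhandledErrorListener.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical illustration; none of these names are Curator APIs.
class BackgroundErrorSketch {
    // Runs 'validate' on the caller's thread, then hands 'work' to the executor.
    // Only exceptions thrown by 'work' are routed to 'errorListener'.
    static void createInBackground(Runnable validate, Runnable work,
                                   ExecutorService executor,
                                   Consumer<Throwable> errorListener) {
        validate.run(); // a failure here propagates straight to the caller
        executor.execute(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                errorListener.accept(t); // background failures end up here
            }
        });
    }

    // Reports who observed the failure: "caller", "listener" or "none".
    static String whereDoesItFail(boolean failInValidation) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicReference<String> seen = new AtomicReference<>("none");
        Runnable boom = () -> {
            throw new UnsupportedOperationException("bogus ACL provider");
        };
        Runnable ok = () -> { };
        try {
            createInBackground(failInValidation ? boom : ok,
                               failInValidation ? ok : boom,
                               executor,
                               t -> seen.set("listener"));
        } catch (UnsupportedOperationException e) {
            seen.set("caller");
        } finally {
            // shutdown() lets an already submitted task finish before we read 'seen'
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(whereDoesItFail(true));
        System.out.println(whereDoesItFail(false));
    }
}
```

Under these assumptions, a test that wants to exercise the listener has to make the failure happen inside the background work rather than in the up-front check, which matches the "force a failure in a different way" suggestion in the thread.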

Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Yep, just let me confirm that it's actually getting the same problem. I'm
sure it was before, but I've just run it a bunch of times and everything's
been fine.

On Thu, Jun 2, 2016 at 11:15 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> Can you push your unit test somewhere?
>
> > On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
> > I've written a simplified unit test that just has a bunch of clients
> > attempting to grab a lease on the semaphore. When I shut down and restart
> ZK
> > about 25% of the time, none of the clients can reacquire the semaphore.
> >
> > Still trying to work out what's going on, but I'm probably not going to
> > have a lot of time today to look at it.
> > cheers
> >
> > On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
> > jordan@jordanzimmerman.com> wrote:
> >
> >> Odd - SemaphoreClient does seem wrong.
> >>
> >>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> It looks like under some circumstances (which I haven't worked out yet)
> >>> that the InterProcessMutex acquire() is not working correctly when
> >>> reconnecting to ZK. Still digging into why this is.
> >>>
> >>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
> >>> something. At lines 126 and 140 it does compareAndSet() calls but
> throws
> >> an
> >>> exception if they return true. As far as I can work out, this means
> that
> >>> whenever the lock is acquired, an exception gets thrown indicating that
> >>> there are Multiple acquirers.
> >>>
> >>> This test is failing fairly consistently. It seems to be the remaining
> >> test
> >>> that keeps failing in the Jenkins build also
> >>> cheers
> >>>
> >>>
> >>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <
> mckenzie.cam@gmail.com
> >>>
> >>> wrote:
> >>>
> >>>> Looks like I was incorrect. The NoWatcherException is being thrown on
> >>>> success as well, and the problem is not in the cluster restart. Will
> >> keep
> >>>> digging.
> >>>>
> >>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion
> >> at
> >>>>> line 294). Again, it seems like some sort of race condition with the
> >>>>> watcher removal.
> >>>>>
> >>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it
> fails
> >>>>> it seems that it's got something to do with watcher removal. When the
> >> test
> >>>>> passes, this error is not logged.
> >>>>>
> >>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
> >> KeeperErrorCode
> >>>>> = No such watcher for /foo/bar/lock/leases
> >>>>> at
> >>>>>
> >>
> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
> >>>>> at
> >>>>>
> >>
> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
> >>>>> at
> >>>>>
> >>
> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
> >>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
> >>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
> >>>>> at
> >>>>>
> >>
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
> >>>>> at
> >>>>>
> >>
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
> >>>>> at
> >>>>>
> >>
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
> >>>>>
> >>>>> Is it possible it's something to do with the way that the cluster is
> >>>>> restarted at line 282? The old cluster is not shut down, a new one is
> >> just
> >>>>> created.
> >>>>>
> >>>>>
> >>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
> >>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>
> >>>>>> I’ll try to address this as part of CURATOR-333
> >>>>>>
> >>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
> >> mckenzie.cam@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Maybe we need to look at some way of providing a hook for tests to
> >> wait
> >>>>>>> reliably for asynch tasks to finish?
> >>>>>>>
> >>>>>>> The latest round of tests ran OK. One test failed on an unrelated
> >> thing
> >>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
> >>>>>> worked
> >>>>>>> ok the next time around.
> >>>>>>>
> >>>>>>> I will start getting a release together. Thanks for your help with
> the
> >>>>>>> updated tests.
> >>>>>>> cheers
> >>>>>>>
> >>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
> >>>>>> jordan@jordanzimmerman.com
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> The problem is in-flight watchers and async background calls.
> >> There’s
> >>>>>> no
> >>>>>>>> way to cancel these and they can take time to occur - even after a
> >>>>>> recipe
> >>>>>>>> instance is closed.
> >>>>>>>>
> >>>>>>>> -Jordan
> >>>>>>>>
> >>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
> >>>>>> mckenzie.cam@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Ok, running it again now.
> >>>>>>>>>
> >>>>>>>>> Is the problem that the watcher clean up for the recipes is done
> >>>>>>>>> asynchronously after they are closed?
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> >>>>>>>> jordan@jordanzimmerman.com
> >>>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> OK - please try now. I added a loop in the “no watchers”
> checker.
> >> If
> >>>>>>>> there
> >>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
> >>>>>>>>>>
> >>>>>>>>>> -Jordan
> >>>>>>>>>>
> >>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
> >>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Looks like these failures are intermittent. Running them
> directly
> >>>>>> in
> >>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing
> again
> >>>>>> in
> >>>>>>>> the
> >>>>>>>>>>> morning and see how it goes.
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> There are still 2 tests failing for me:
> >>>>>>>>>>>>
> >>>>>>>>>>>> FAILURE! - in
> >>>>>>>>>>>>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still
> >>>>>>>>>> registered:
> >>>>>>>>>>>> [/test]
> >>>>>>>>>>>> at
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>>>>>>>>>> at
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>>>>>>>>>
> >>>>>>>>>>>> FAILURE! - in
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
> >>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
> >>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>>>>>>>>>> at
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more
> >>>>>> child
> >>>>>>>>>>>> watchers are still registered: [/test]
> >>>>>>>>>>>> Run 2: PASS
> >>>>>>>>>>>>
> >>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
> [true]
> >>>>>> but
> >>>>>>>>>>>> found [false]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests
> against
> >>>>>> that,
> >>>>>>>>>> and
> >>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
> >>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
> >>>>>> yet. I
> >>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -jordan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the
> same
> >>>>>> stuff
> >>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>> merging your fix:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more
> child
> >>>>>>>>>> watchers
> >>>>>>>>>>>>>>> are still registered: [/test]
> >>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more
> child
> >>>>>>>>>> watchers
> >>>>>>>>>>>>>>> are still registered: [/test]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>> Run 1:
> >>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>> Run 2:
> >>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
> >>>>>> child
> >>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
> >>>>>> child
> >>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
> >>>>>> [true]
> >>>>>>>> but
> >>>>>>>>>>>>>>> found [false]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>> Run 1: PASS
> >>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or
> >> more
> >>>>>>>> data
> >>>>>>>>>>>>>>> watchers are still registered: [/count]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
> >>>>>> watchers are
> >>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>> registered: [/count]
> >>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
> >>>>>> watchers are
> >>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>> registered: [/count]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
> >> spend
> >>>>>> some
> >>>>>>>>>>>>>> time on
> >>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed
> to
> >>>>>> get
> >>>>>>>> set
> >>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
> >> handling
> >>>>>> it.
> >>>>>>>>>> But,
> >>>>>>>>>>>>>>>> while I was looking at the code I realized there are some
> >>>>>>>>>> significant
> >>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror
> what
> >>>>>>>>>>>>>> ZooKeeper does
> >>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight,
> the
> >>>>>> whole
> >>>>>>>> ZK
> >>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the
> mutator
> >>>>>> APIs.
> >>>>>>>>>>>>>> But, of
> >>>>>>>>>>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing
> consistently
> >>>>>> on the
> >>>>>>>>>> 3.0
> >>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a
> bug
> >>>>>> in the
> >>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
> >> had a
> >>>>>>>> quick
> >>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
> >>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
> >> can
> >>>>>> you
> >>>>>>>>>>>>>> have a
> >>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks Scott.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2
> onto
> >>>>>> Nexus.
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >>>>>>>>>> dragonsinth@gmail.com
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to
> both
> >>>>>>>> master
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> 3.0.
> >>>>>>>>>>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
> >>>>>> failing
> >>>>>>>>>>>>>> there.
> >>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>>>>>>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a
> few
> >>>>>> times
> >>>>>>>>>> but
> >>>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
> >> morning.
> >>>>>>>> Given
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just
> >> want
> >>>>>> to
> >>>>>>>>>> vote
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron
> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
> >>>>>> validation
> >>>>>>>>>>>>>> stuff.
> >>>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
> >>>>>> Because
> >>>>>>>> the
> >>>>>>>>>>>>>> unit
> >>>>>>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
> >>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
> >>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
> >>>>>> acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>
> >> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
> >>>>>>>>>>>>>>>>>>>>>>>> data,
> >>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
> >>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>>>>>    pathInBackground(adjustedPath, data,
> >>>>>> givenPath);
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
> >> force a
> >>>>>>>>>> failure
> >>>>>>>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
> >> the
> >>>>>>>>>>>>>>>>>>> expectation is
> >>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding
> operations?
> >>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron
> McKenzie
> >> <
> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there,
> >> so
> >>>>>>>> maybe
> >>>>>>>>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know
> >> if
> >>>>>> I
> >>>>>>>> get
> >>>>>>>>>>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
> >> Zimmerman <
> >>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it
> to
> >>>>>> the
> >>>>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
> >> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
> >>>>>>>>>> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems
> to
> >>>>>> try
> >>>>>>>> and
> >>>>>>>>>>>>>>>>>>> provoke
> >>>>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >>>>>>>>>> CreateBuilderImpl
> >>>>>>>>>>>>>>>>>>> prior
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
> >>>>>> exception
> >>>>>>>> that
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it
> >> just
> >>>>>>>> throws
> >>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
> >>>>>> propagated up
> >>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> stack
> >>>>>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
> >> don't
> >>>>>>>>>>>>>> understand
> >>>>>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Can you push your unit test somewhere?

> On Jun 1, 2016, at 7:37 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
> I've written a simplified unit test that just has a bunch of clients
> attempting to grab a lease on the semaphore. When I shut down and restart ZK,
> about 25% of the time none of the clients can reacquire the semaphore.
> 
> Still trying to work out what's going on, but I'm probably not going to
> have a lot of time today to look at it.
> cheers
> 
> On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
> 
>> Odd - SemaphoreClient does seem wrong.
>> 
>>> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> It looks like under some circumstances (which I haven't worked out yet)
>>> that the InterProcessMutex acquire() is not working correctly when
>>> reconnecting to ZK. Still digging into why this is.
>>> 
>>> There also seems to be a bug in the SemaphoreClient, unless I'm missing
>>> something. At lines 126 and 140 it does compareAndSet() calls but throws
>> an
>>> exception if they return true. As far as I can work out, this means that
>>> whenever the lock is acquired, an exception gets thrown indicating that
>>> there are Multiple acquirers.
>>> 
>>> This test is failing fairly consistently. It seems to be the remaining
>> test
>>> that keeps failing in the Jenkins build also
>>> cheers
>>> 
>>> 
>>> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>>> 
>>> wrote:
>>> 
>>>> Looks like I was incorrect. The NoWatcherException is being thrown on
>>>> success as well, and the problem is not in the cluster restart. Will
>> keep
>>>> digging.
>>>> 
>>>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>>> wrote:
>>>> 
>>>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion
>> at
>>>>> line 294). Again, it seems like some sort of race condition with the
>>>>> watcher removal.
>>>>> 
>>>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails
>>>>> it seems that it's got something to do with watcher removal. When the
>> test
>>>>> passes, this error is not logged.
>>>>> 
>>>>> org.apache.zookeeper.KeeperException$NoWatcherException:
>> KeeperErrorCode
>>>>> = No such watcher for /foo/bar/lock/leases
>>>>> at
>>>>> 
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>>>> at
>>>>> 
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>>>> at
>>>>> 
>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>>>> at
>>>>> 
>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>>>> at
>>>>> 
>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>>>> at
>>>>> 
>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>>>> 
>>>>> Is it possible it's something to do with the way that the cluster is
>>>>> restarted at line 282? The old cluster is not shutdown, a new one is
>> just
>>>>> created.
>>>>> 
>>>>> 
>>>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>>>> jordan@jordanzimmerman.com> wrote:
>>>>> 
>>>>>> I’ll try to address this as part of CURATOR-333
>>>>>> 
>>>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Maybe we need to look at some way of providing a hook for tests to
>> wait
>>>>>>> reliably for asynch tasks to finish?
>>>>>>> 
>>>>>>> The latest round of tests ran OK. One test failed on an unrelated
>> thing
>>>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
>>>>>> worked
>>>>>>> ok the next time around.
>>>>>>> 
>>>>>>> I will start getting a release together. Thanks for your help with the
>>>>>>> updated tests.
>>>>>>> cheers
>>>>>>> 
>>>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com
>>>>>>>> wrote:
>>>>>>> 
>>>>>>>> The problem is in-flight watchers and async background calls.
>> There’s
>>>>>> no
>>>>>>>> way to cancel these and they can take time to occur - even after a
>>>>>> recipe
>>>>>>>> instance is closed.
>>>>>>>> 
>>>>>>>> -Jordan
>>>>>>>> 
>>>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Ok, running it again now.
>>>>>>>>> 
>>>>>>>>> Is the problem that the watcher clean up for the recipes is done
>>>>>>>>> asynchronously after they are closed?
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>>>> jordan@jordanzimmerman.com
>>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> OK - please try now. I added a loop in the “no watchers” checker.
>> If
>>>>>>>> there
>>>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>>>> 
>>>>>>>>>> -Jordan
>>>>>>>>>> 
>>>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Looks like these failures are intermittent. Running them directly
>>>>>> in
>>>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing again
>>>>>> in
>>>>>>>> the
>>>>>>>>>>> morning and see how it goes.
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>>>> 
>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still
>>>>>>>>>> registered:
>>>>>>>>>>>> [/test]
>>>>>>>>>>>> at
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>>>> at
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>>>> 
>>>>>>>>>>>> FAILURE! - in
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>>>> at
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>>>> 
>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more
>>>>>> child
>>>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>>>> Run 2: PASS
>>>>>>>>>>>> 
>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>>>>>> but
>>>>>>>>>>>> found [false]
>>>>>>>>>>>> 
>>>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
>>>>>> that,
>>>>>>>>>> and
>>>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>>>> cheers
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>>>>>> yet. I
>>>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -jordan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same
>>>>>> stuff
>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>>>>>>>>>> watchers
>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>>>>>>>>>> watchers
>>>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Run 1:
>>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>> Run 2:
>>>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
>>>>>> child
>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
>>>>>> child
>>>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>>>> [true]
>>>>>>>> but
>>>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or
>> more
>>>>>>>> data
>>>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>>>> watchers are
>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>>>> watchers are
>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
>> spend
>>>>>> some
>>>>>>>>>>>>>> time on
>>>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to
>>>>>> get
>>>>>>>> set
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
>> handling
>>>>>> it.
>>>>>>>>>> But,
>>>>>>>>>>>>>>>> while I was looking at the code I realized there are some
>>>>>>>>>> significant
>>>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror what
>>>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight, the
>>>>>> whole
>>>>>>>> ZK
>>>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
>>>>>> APIs.
>>>>>>>>>>>>>> But, of
>>>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently
>>>>>> on the
>>>>>>>>>> 3.0
>>>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug
>>>>>> in the
>>>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
>> had a
>>>>>>>> quick
>>>>>>>>>>>>>> look
>>>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
>> can
>>>>>> you
>>>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
>>>>>> Nexus.
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
>>>>>>>> master
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>>>>>> failing
>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few
>>>>>> times
>>>>>>>>>> but
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
>> morning.
>>>>>>>> Given
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just
>> want
>>>>>> to
>>>>>>>>>> vote
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>>>> validation
>>>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>>>> Because
>>>>>>>> the
>>>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>>>>>> acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>> 
>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>>>    pathInBackground(adjustedPath, data,
>>>>>> givenPath);
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
>> force a
>>>>>>>>>> failure
>>>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
>> the
>>>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie
>> <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there,
>> so
>>>>>>>> maybe
>>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know
>> if
>>>>>> I
>>>>>>>> get
>>>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
>> Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to
>>>>>> the
>>>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
>> McKenzie <
>>>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to
>>>>>> try
>>>>>>>> and
>>>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>>>> exception
>>>>>>>> that
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it
>> just
>>>>>>>> throws
>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>>>> propagated up
>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
>> don't
>>>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Indeed. There seems to be a problem with InterProcessSemaphoreV2 though.
I've written a simplified unit test that just has a bunch of clients
attempting to grab a lease on the semaphore. When I shut down and restart ZK,
about 25% of the time none of the clients can reacquire the semaphore.

Still trying to work out what's going on, but I'm probably not going to
have a lot of time today to look at it.
cheers
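
The compareAndSet() concern quoted below can be illustrated with a minimal
sketch. It is not the SemaphoreClient source: SingleHolderGuard and its
methods are hypothetical names, and it assumes the guard exists to detect a
second concurrent holder. Under that assumption, a true result from
compareAndSet(null, x) means the slot was free and the acquire is legitimate,
so the exception belongs on the false branch:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical guard, not Curator's SemaphoreClient: it tracks the single
// client that is allowed to hold the lease at any moment.
final class SingleHolderGuard {
    private final AtomicReference<String> holder = new AtomicReference<>(null);

    // compareAndSet(null, id) returns true when the slot was empty and is
    // now ours. A false result is the error case: another client holds it.
    void enter(String clientId) {
        if (!holder.compareAndSet(null, clientId)) {
            throw new IllegalStateException(
                "Multiple acquirers: " + holder.get() + " and " + clientId);
        }
    }

    // Releasing succeeds only for the current holder. AtomicReference
    // compares by reference identity, so pass the same String instance
    // that was used in enter().
    void exit(String clientId) {
        if (!holder.compareAndSet(clientId, null)) {
            throw new IllegalStateException(
                clientId + " released a lease it did not hold");
        }
    }

    // Current holder, or null when the lease is free.
    String current() {
        return holder.get();
    }
}
```

If the check were inverted (throwing when compareAndSet() returns true), every
first acquire would fail, which matches the symptom described in the thread.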

On Thu, Jun 2, 2016 at 10:30 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> Odd - SemaphoreClient does seem wrong.
>
> > On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > It looks like under some circumstances (which I haven't worked out yet)
> > that the InterProcessMutex acquire() is not working correctly when
> > reconnecting to ZK. Still digging into why this is.
> >
> > There also seems to be a bug in the SemaphoreClient, unless I'm missing
> > something. At lines 126 and 140 it does compareAndSet() calls but throws
> an
> > exception if they return true. As far as I can work out, this means that
> > whenever the lock is acquired, an exception gets thrown indicating that
> > there are Multiple acquirers.
> >
> > This test is failing fairly consistently. It seems to be the remaining
> test
> > that keeps failing in the Jenkins build also
> > cheers
> >
> >
> > On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <mckenzie.cam@gmail.com
> >
> > wrote:
> >
> >> Looks like I was incorrect. The NoWatcherException is being thrown on
> >> success as well, and the problem is not in the cluster restart. Will
> keep
> >> digging.
> >>
> >> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> >> wrote:
> >>
> >>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion
> at
> >>> line 294). Again, it seems like some sort of race condition with the
> >>> watcher removal.
> >>>
> >>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails
> >>> it seems that it's got something to do with watcher removal. When the
> test
> >>> passes, this error is not logged.
> >>>
> >>> org.apache.zookeeper.KeeperException$NoWatcherException:
> KeeperErrorCode
> >>> = No such watcher for /foo/bar/lock/leases
> >>> at
> >>>
> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
> >>> at
> >>>
> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
> >>> at
> >>>
> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
> >>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
> >>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
> >>> at
> >>>
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
> >>> at
> >>>
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
> >>> at
> >>>
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
> >>>
> >>> Is it possible it's something to do with the way that the cluster is
> >>> restarted at line 282? The old cluster is not shutdown, a new one is
> just
> >>> created.
> >>>
> >>>
> >>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
> >>> jordan@jordanzimmerman.com> wrote:
> >>>
> >>>> I’ll try to address this as part of CURATOR-333
> >>>>
> >>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Maybe we need to look at some way of providing a hook for tests to
> wait
> >>>>> reliably for asynch tasks to finish?
> >>>>>
> >>>>> The latest round of tests ran OK. One test failed on an unrelated
> thing
> >>>>> (ConnectionLoss), but this appears to be a transient thing as it's
> >>>> worked
> >>>>> ok the next time around.
> >>>>>
> >>>>> I will start getting a release together. Thanks for your help with the
> >>>>> updated tests.
> >>>>> cheers
> >>>>>
> >>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
> >>>> jordan@jordanzimmerman.com
> >>>>>> wrote:
> >>>>>
> >>>>>> The problem is in-flight watchers and async background calls.
> There’s
> >>>> no
> >>>>>> way to cancel these and they can take time to occur - even after a
> >>>> recipe
> >>>>>> instance is closed.
> >>>>>>
> >>>>>> -Jordan
> >>>>>>
> >>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
> >>>> mckenzie.cam@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Ok, running it again now.
> >>>>>>>
> >>>>>>> Is the problem that the watcher clean up for the recipes is done
> >>>>>>> asynchronously after they are closed?
> >>>>>>>
> >>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> >>>>>> jordan@jordanzimmerman.com
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> OK - please try now. I added a loop in the “no watchers” checker.
> If
> >>>>>> there
> >>>>>>>> are remaining watchers, it will sleep a bit and try again.
> >>>>>>>>
> >>>>>>>> -Jordan
> >>>>>>>>
> >>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
> >>>> mckenzie.cam@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Looks like these failures are intermittent. Running them directly
> >>>> in
> >>>>>>>>> Eclipse they seem to be passing. I will run the whole thing again
> >>>> in
> >>>>>> the
> >>>>>>>>> morning and see how it goes.
> >>>>>>>>>
> >>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> There are still 2 tests failing for me:
> >>>>>>>>>>
> >>>>>>>>>> FAILURE! - in
> >>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>>>>>>>> java.lang.AssertionError: One or more child watchers are still
> >>>>>>>> registered:
> >>>>>>>>>> [/test]
> >>>>>>>>>> at
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>>>>>>>> at
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>>>>>>>
> >>>>>>>>>> FAILURE! - in
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
> >>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
> >>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>>>>>>>> at
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>>>>>>>
> >>>>>>>>>> Failed tests:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more
> >>>> child
> >>>>>>>>>> watchers are still registered: [/test]
> >>>>>>>>>> Run 2: PASS
> >>>>>>>>>>
> >>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
> >>>> but
> >>>>>>>>>> found [false]
> >>>>>>>>>>
> >>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
> >>>> that,
> >>>>>>>> and
> >>>>>>>>>>> if it's all good will merge into CURATOR-3.0
> >>>>>>>>>>> cheers
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
> >>>> yet. I
> >>>>>>>>>>>> made/pushed my changes in CURATOR-332
> >>>>>>>>>>>>
> >>>>>>>>>>>> -jordan
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same
> >>>> stuff
> >>>>>>>>>>>> after
> >>>>>>>>>>>>> merging your fix:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Failed tests:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
> >>>>>>>> watchers
> >>>>>>>>>>>>> are still registered: [/test]
> >>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
> >>>>>>>> watchers
> >>>>>>>>>>>>> are still registered: [/test]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>> Run 1:
> >>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>> Run 2:
> >>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
> >>>> child
> >>>>>>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
> >>>> child
> >>>>>>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
> >>>> [true]
> >>>>>> but
> >>>>>>>>>>>>> found [false]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>> Run 1: PASS
> >>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or
> more
> >>>>>> data
> >>>>>>>>>>>>> watchers are still registered: [/count]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
> >>>> watchers are
> >>>>>>>>>>>> still
> >>>>>>>>>>>>> registered: [/count]
> >>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
> >>>> watchers are
> >>>>>>>>>>>> still
> >>>>>>>>>>>>> registered: [/count]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll
> spend
> >>>> some
> >>>>>>>>>>>> time on
> >>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to
> >>>> get
> >>>>>> set
> >>>>>>>>>>>> when
> >>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t
> handling
> >>>> it.
> >>>>>>>> But,
> >>>>>>>>>>>>>> while I was looking at the code I realized there are some
> >>>>>>>> significant
> >>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror what
> >>>>>>>>>>>> ZooKeeper does
> >>>>>>>>>>>>>> internally which is insanely complicated. In hindsight, the
> >>>> whole
> >>>>>> ZK
> >>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
> >>>> APIs.
> >>>>>>>>>>>> But, of
> >>>>>>>>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently
> >>>> on the
> >>>>>>>> 3.0
> >>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug
> >>>> in the
> >>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've
> had a
> >>>>>> quick
> >>>>>>>>>>>> look
> >>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
> >>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time,
> can
> >>>> you
> >>>>>>>>>>>> have a
> >>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks Scott.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
> >>>> Nexus.
> >>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >>>>>>>> dragonsinth@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
> >>>>>> master
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> 3.0.
> >>>>>>>>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
> >>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
> >>>> failing
> >>>>>>>>>>>> there.
> >>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>>>>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few
> >>>> times
> >>>>>>>> but
> >>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the
> morning.
> >>>>>> Given
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just
> want
> >>>> to
> >>>>>>>> vote
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
> >>>> validation
> >>>>>>>>>>>> stuff.
> >>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
> >>>> Because
> >>>>>> the
> >>>>>>>>>>>> unit
> >>>>>>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
> >>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
> >>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
> >>>> acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>
> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
> >>>>>>>>>>>>>>>>>>>>>> data,
> >>>>>>>>>>>>>>>>>>>>>>>> aclList);
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
> >>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data,
> >>>> givenPath);
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to
> force a
> >>>>>>>> failure
> >>>>>>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener,
> the
> >>>>>>>>>>>>>>>>> expectation is
> >>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
> >>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie
> <
> >>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there,
> so
> >>>>>> maybe
> >>>>>>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know
> if
> >>>> I
> >>>>>> get
> >>>>>>>>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan
> Zimmerman <
> >>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to
> >>>> the
> >>>>>>>>>>>> master
> >>>>>>>>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron
> McKenzie <
> >>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
> >>>>>>>> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to
> >>>> try
> >>>>>> and
> >>>>>>>>>>>>>>>>> provoke
> >>>>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >>>>>>>> CreateBuilderImpl
> >>>>>>>>>>>>>>>>> prior
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
> >>>> exception
> >>>>>> that
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it
> just
> >>>>>> throws
> >>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
> >>>> propagated up
> >>>>>>>> the
> >>>>>>>>>>>>>>>>> stack
> >>>>>>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just
> don't
> >>>>>>>>>>>> understand
> >>>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>>>>>>>> cheers

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Odd - SemaphoreClient does seem wrong. 

> On Jun 1, 2016, at 1:43 AM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> It looks like, under some circumstances (which I haven't worked out yet),
> InterProcessMutex.acquire() is not working correctly when reconnecting to
> ZK. Still digging into why this is.
> 
> There also seems to be a bug in the SemaphoreClient, unless I'm missing
> something. At lines 126 and 140 it makes compareAndSet() calls but throws
> an exception if they return true. As far as I can work out, this means that
> whenever the lock is acquired, an exception gets thrown indicating that
> there are Multiple acquirers.
> 
> This test (TestInterProcessSemaphoreCluster.testCluster) is failing fairly
> consistently. It also seems to be the one remaining test that keeps failing
> in the Jenkins build.
> cheers
> 
> 
> On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> 
>> Looks like I was incorrect. The NoWatcherException is being thrown on
>> success as well, and the problem is not in the cluster restart. Will keep
>> digging.
>> 
>> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>> 
>>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>>> line 294). Again, it seems like some sort of race condition with the
>>> watcher removal.
>>> 
>>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails
>>> it seems that it's got something to do with watcher removal. When the test
>>> passes, this error is not logged.
>>> 
>>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode
>>> = No such watcher for /foo/bar/lock/leases
>>> at
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>>> at
>>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>>> at
>>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>>> at
>>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>>> at
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>>> at
>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>> 
>>> Is it possible it's something to do with the way that the cluster is
>>> restarted at line 282? The old cluster is not shut down; a new one is just
>>> created.
>>> 
>>> 
>>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com> wrote:
>>> 
>>>> I’ll try to address this as part of CURATOR-333
>>>> 
>>>>> On May 31, 2016, at 7:08 PM, Cameron McKenzie <mc...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Maybe we need to look at some way of providing a hook for tests to wait
>>>>> reliably for async tasks to finish?
>>>>> 
>>>>> The latest round of tests ran OK. One test failed on an unrelated thing
>>>>> (ConnectionLoss), but this appears to be a transient thing as it's
>>>> worked
>>>>> ok the next time around.
>>>>> 
>>>>> I will start getting a release together. Thanks for your help with the
>>>>> updated tests.
>>>>> cheers
>>>>> 
>>>>> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com
>>>>>> wrote:
>>>>> 
>>>>>> The problem is in-flight watchers and async background calls. There’s
>>>> no
>>>>>> way to cancel these and they can take time to occur - even after a
>>>> recipe
>>>>>> instance is closed.
>>>>>> 
>>>>>> -Jordan
>>>>>> 
>>>>>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Ok, running it again now.
>>>>>>> 
>>>>>>> Is the problem that the watcher clean up for the recipes is done
>>>>>>> asynchronously after they are closed?
>>>>>>> 
>>>>>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>>>>> jordan@jordanzimmerman.com
>>>>>>>> wrote:
>>>>>>> 
>>>>>>>> OK - please try now. I added a loop in the “no watchers” checker. If
>>>>>> there
>>>>>>>> are remaining watchers, it will sleep a bit and try again.
>>>>>>>> 
>>>>>>>> -Jordan
>>>>>>>> 
>>>>>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Looks like these failures are intermittent. Running them directly
>>>> in
>>>>>>>>> Eclipse they seem to be passing. I will run the whole thing again
>>>> in
>>>>>> the
>>>>>>>>> morning and see how it goes.
>>>>>>>>> 
>>>>>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> There are still 2 tests failing for me:
>>>>>>>>>> 
>>>>>>>>>> FAILURE! - in
>>>>>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>>>>>> java.lang.AssertionError: One or more child watchers are still
>>>>>>>> registered:
>>>>>>>>>> [/test]
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>>>>>> 
>>>>>>>>>> FAILURE! - in
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>>>>>> at
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>>>>>> 
>>>>>>>>>> Failed tests:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more
>>>> child
>>>>>>>>>> watchers are still registered: [/test]
>>>>>>>>>> Run 2: PASS
>>>>>>>>>> 
>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>>>> but
>>>>>>>>>> found [false]
>>>>>>>>>> 
>>>>>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
>>>> that,
>>>>>>>> and
>>>>>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>>>>>> cheers
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>>>> yet. I
>>>>>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>>>>>> 
>>>>>>>>>>>> -jordan
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same
>>>> stuff
>>>>>>>>>>>> after
>>>>>>>>>>>>> merging your fix:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Failed tests:
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>>>>>>>> watchers
>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>>>>>>>> watchers
>>>>>>>>>>>>> are still registered: [/test]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>> Run 1:
>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>> Run 2:
>>>>>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
>>>> child
>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
>>>> child
>>>>>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>>> [true]
>>>>>> but
>>>>>>>>>>>>> found [false]
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>> Run 1: PASS
>>>>>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
>>>>>> data
>>>>>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>>> watchers are
>>>>>>>>>>>> still
>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>>> watchers are
>>>>>>>>>>>> still
>>>>>>>>>>>>> registered: [/count]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend
>>>> some
>>>>>>>>>>>> time on
>>>>>>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to
>>>> get
>>>>>> set
>>>>>>>>>>>> when
>>>>>>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling
>>>> it.
>>>>>>>> But,
>>>>>>>>>>>>>> while I was looking at the code I realized there are some
>>>>>>>> significant
>>>>>>>>>>>>>> additional problems. Curator, here, is trying to mirror what
>>>>>>>>>>>> ZooKeeper does
>>>>>>>>>>>>>> internally which is insanely complicated. In hindsight, the
>>>> whole
>>>>>> ZK
>>>>>>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
>>>> APIs.
>>>>>>>>>>>> But, of
>>>>>>>>>>>>>> course, that’s easy for me to say now.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently
>>>> on the
>>>>>>>> 3.0
>>>>>>>>>>>>>>> branch. It appears that this is actually potentially a bug
>>>> in the
>>>>>>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
>>>>>> quick
>>>>>>>>>>>> look
>>>>>>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>>>>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can
>>>> you
>>>>>>>>>>>> have a
>>>>>>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
>>>> Nexus.
>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>>>>>>> dragonsinth@gmail.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
>>>>>> master
>>>>>>>>>>>> and
>>>>>>>>>>>>>> 3.0.
>>>>>>>>>>>>>>>>> Where should I push the fix?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>>>> failing
>>>>>>>>>>>> there.
>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>>>>>>>>>>> dragonsinth@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few
>>>> times
>>>>>>>> but
>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>> love:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>>>>> actual 6
>>>>>>>>>>>>>>>>>>>>> expected -31:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
>>>>>> Given
>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just want
>>>> to
>>>>>>>> vote
>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> 2.11.0
>>>>>>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>>> validation
>>>>>>>>>>>> stuff.
>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>>>>>> does
>>>>>>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>>> Because
>>>>>> the
>>>>>>>>>>>> unit
>>>>>>>>>>>>>>>>>> test
>>>>>>>>>>>>>>>>>>>>>> uses a
>>>>>>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> final String adjustedPath =
>>>>>>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>>>>>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>>>>> List<ACL> aclList =
>>>> acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>>>>>>>>>>>>>>>>>>>>> data,
>>>>>>>>>>>>>>>>>>>>>>>> aclList);
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> String returnPath = null;
>>>>>>>>>>>>>>>>>>>>>>>> if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>     pathInBackground(adjustedPath, data,
>>>> givenPath);
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
>>>>>>>> failure
>>>>>>>>>>>>>>>>> in a
>>>>>>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
>>>>>>>>>>>>>>>>> expectation is
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
>>>>>> maybe
>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if
>>>> I
>>>>>> get
>>>>>>>>>>>>>>>>> stuck.
>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to
>>>> the
>>>>>>>>>>>> master
>>>>>>>>>>>>>>>>>>> branch?
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>>>>>> There's a test
>>>>>>>> TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>> failing
>>>>>>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to
>>>> try
>>>>>> and
>>>>>>>>>>>>>>>>> provoke
>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>> error
>>>>>>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>>>>>>> CreateBuilderImpl
>>>>>>>>>>>>>>>>> prior
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>>> exception
>>>>>> that
>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> throws
>>>>>>>>>>>>>>>>>>>>>>>>>> happens
>>>>>>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
>>>>>> throws
>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>>> propagated up
>>>>>>>> the
>>>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
>>>>>>>>>>>> understand
>>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>>>>>>>>>>>>>> worked?
>>>>>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>> 


Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
It looks like, under some circumstances (which I haven't worked out yet),
InterProcessMutex.acquire() is not working correctly when reconnecting to
ZK. Still digging into why this is.

There also seems to be a bug in the SemaphoreClient, unless I'm missing
something. At lines 126 and 140 it does compareAndSet() calls but throws an
exception if they return true. As far as I can work out, this means that
whenever the lock is acquired, an exception gets thrown indicating that
there are Multiple acquirers.
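
For reference, the atomic classes in java.util.concurrent.atomic return true
from compareAndSet() when the swap succeeded, so a guard that wants to detect
a second acquirer would normally throw when the call returns false. The
sketch below only illustrates that convention; SingleHolderGuard and its
methods are hypothetical names, the use of AtomicBoolean is an assumption,
and this is not the actual SemaphoreClient code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not taken from the Curator test sources.
final class SingleHolderGuard {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // compareAndSet(false, true) returns true only for the caller that
    // actually flipped the flag, so "false" is the multiple-acquirer case.
    void markAcquired() {
        if (!held.compareAndSet(false, true)) {
            throw new IllegalStateException("Multiple acquirers");
        }
    }

    // Symmetric check on release: "false" means the flag was not set.
    void markReleased() {
        if (!held.compareAndSet(true, false)) {
            throw new IllegalStateException("Released without being held");
        }
    }

    boolean isHeld() {
        return held.get();
    }
}
```

With this shape, a single acquire followed by a release never throws; only a
second acquire while the flag is still set does.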

This test is failing fairly consistently. It seems to be the remaining test
that keeps failing in the Jenkins build as well.
cheers


On Wed, Jun 1, 2016 at 3:10 PM, Cameron McKenzie <mc...@gmail.com>
wrote:

> Looks like I was incorrect. The NoWatcherException is being thrown on
> success as well, and the problem is not in the cluster restart. Will keep
> digging.
>
> On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
>
>> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
>> line 294). Again, it seems like some sort of race condition with the
>> watcher removal.
>>
>> When I run it in Eclipse, it fails maybe 25% of the time. When it fails
>> it seems that it's got something to do with watcher removal. When the test
>> passes, this error is not logged.
>>
>> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode
>> = No such watcher for /foo/bar/lock/leases
>> at
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
>> at
>> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
>> at
>> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
>> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
>> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
>> at
>> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
>> at
>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
>> at
>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>>
>> Is it possible it's something to do with the way that the cluster is
>> restarted at line 282? The old cluster is not shut down, a new one is just
>> created.
>>
>>
>> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>>
>>> I’ll try to address this as part of CURATOR-333
>>>
>>> > On May 31, 2016, at 7:08 PM, Cameron McKenzie <mc...@gmail.com>
>>> wrote:
>>> >
>>> > Maybe we need to look at some way of providing a hook for tests to wait
>>> > reliably for asynch tasks to finish?
>>> >
>>> > The latest round of tests ran OK. One test failed on an unrelated thing
>>> > (ConnectionLoss), but this appears to be a transient thing as it's
>>> worked
>>> > ok the next time around.
>>> >
>>> > I will start getting a release together. Thanks for your help with the
>>> > updated tests.
>>> > cheers
>>> >
>>> > On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>>> jordan@jordanzimmerman.com
>>> >> wrote:
>>> >
>>> >> The problem is in-flight watchers and async background calls. There’s
>>> no
>>> >> way to cancel these and they can take time to occur - even after a
>>> recipe
>>> >> instance is closed.
>>> >>
>>> >> -Jordan
>>> >>
>>> >>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Ok, running it again now.
>>> >>>
>>> >>> Is the problem that the watcher clean up for the recipes is done
>>> >>> asynchronously after they are closed?
>>> >>>
>>> >>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>>> >> jordan@jordanzimmerman.com
>>> >>>> wrote:
>>> >>>
>>> >>>> OK - please try now. I added a loop in the “no watchers” checker. If
>>> >> there
>>> >>>> are remaining watchers, it will sleep a bit and try again.
>>> >>>>
>>> >>>> -Jordan
>>> >>>>
>>> >>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>>> mckenzie.cam@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Looks like these failures are intermittent. Running them directly
>>> in
>>> >>>>> Eclipse they seem to be passing. I will run the whole thing again
>>> in
>>> >> the
>>> >>>>> morning and see how it goes.
>>> >>>>>
>>> >>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>> >>>> mckenzie.cam@gmail.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> There are still 2 tests failing for me:
>>> >>>>>>
>>> >>>>>> FAILURE! - in
>>> >>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>> >>>>>>
>>> >>>>
>>> >>
>>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>> >>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>> >>>>>> java.lang.AssertionError: One or more child watchers are still
>>> >>>> registered:
>>> >>>>>> [/test]
>>> >>>>>> at
>>> >>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>> >>>>>> at
>>> >>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>> >>>>>>
>>> >>>>>> FAILURE! - in
>>> >>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>> >>>>>>
>>> >>>>
>>> >>
>>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>> >>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>> >>>>>> java.lang.AssertionError: expected [true] but found [false]
>>> >>>>>> at org.testng.Assert.fail(Assert.java:94)
>>> >>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>> >>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>> >>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>> >>>>>> at
>>> >>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>> >>>>>>
>>> >>>>>> Failed tests:
>>> >>>>>>
>>> >>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>> >>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more
>>> child
>>> >>>>>> watchers are still registered: [/test]
>>> >>>>>> Run 2: PASS
>>> >>>>>>
>>> >>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>>> but
>>> >>>>>> found [false]
>>> >>>>>>
>>> >>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>> >>>> mckenzie.cam@gmail.com
>>> >>>>>>> wrote:
>>> >>>>>>
>>> >>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
>>> that,
>>> >>>> and
>>> >>>>>>> if it's all good will merge into CURATOR-3.0
>>> >>>>>>> cheers
>>> >>>>>>>
>>> >>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>> >>>>>>> jordan@jordanzimmerman.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged
>>> yet. I
>>> >>>>>>>> made/pushed my changes in CURATOR-332
>>> >>>>>>>>
>>> >>>>>>>> -jordan
>>> >>>>>>>>
>>> >>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>> >>>> mckenzie.cam@gmail.com>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> I'm still seeing 6 failed tests that seem related to the same
>>> stuff
>>> >>>>>>>> after
>>> >>>>>>>>> merging your fix:
>>> >>>>>>>>>
>>> >>>>>>>>> Failed tests:
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>> >>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>>> >>>> watchers
>>> >>>>>>>>> are still registered: [/test]
>>> >>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>>> >>>> watchers
>>> >>>>>>>>> are still registered: [/test]
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>> >>>>>>>>> Run 1:
>>> >>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>> >>>>>>>>> One or more child watchers are still registered: [/test]
>>> >>>>>>>>> Run 2:
>>> >>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>> >>>>>>>>> One or more child watchers are still registered: [/test]
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>> >>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
>>> child
>>> >>>>>>>>> watchers are still registered: [/one/two/three]
>>> >>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
>>> child
>>> >>>>>>>>> watchers are still registered: [/one/two/three]
>>> >>>>>>>>>
>>> >>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected
>>> [true]
>>> >> but
>>> >>>>>>>>> found [false]
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>> >>>>>>>>> Run 1: PASS
>>> >>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
>>> >> data
>>> >>>>>>>>> watchers are still registered: [/count]
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>
>>> >>
>>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>> >>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data
>>> watchers are
>>> >>>>>>>> still
>>> >>>>>>>>> registered: [/count]
>>> >>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data
>>> watchers are
>>> >>>>>>>> still
>>> >>>>>>>>> registered: [/count]
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>> >>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend
>>> some
>>> >>>>>>>> time on
>>> >>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to
>>> get
>>> >> set
>>> >>>>>>>> when
>>> >>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling
>>> it.
>>> >>>> But,
>>> >>>>>>>>>> while I was looking at the code I realized there are some
>>> >>>> significant
>>> >>>>>>>>>> additional problems. Curator, here, is trying to mirror what
>>> >>>>>>>> ZooKeeper does
>>> >>>>>>>>>> internally which is insanely complicated. In hindsight, the
>>> whole
>>> >> ZK
>>> >>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
>>> APIs.
>>> >>>>>>>> But, of
>>> >>>>>>>>>> course, that’s easy for me to say now.
>>> >>>>>>>>>>
>>> >>>>>>>>>> -Jordan
>>> >>>>>>>>>>
>>> >>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>>> >>>>>>>> mckenzie.cam@gmail.com>
>>> >>>>>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks Scott,
>>> >>>>>>>>>>> Those tests are now passing for me.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently
>>> on the
>>> >>>> 3.0
>>> >>>>>>>>>>> branch. It appears that this is actually potentially a bug
>>> in the
>>> >>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
>>> >> quick
>>> >>>>>>>> look
>>> >>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>>> >>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can
>>> you
>>> >>>>>>>> have a
>>> >>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> cheers
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>>> >>>>>>>>>> mckenzie.cam@gmail.com>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Thanks Scott.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
>>> Nexus.
>>> >>>>>>>>>>>> cheers
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>>> >>>> dragonsinth@gmail.com
>>> >>>>>>>>>
>>> >>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
>>> >> master
>>> >>>>>>>> and
>>> >>>>>>>>>> 3.0.
>>> >>>>>>>>>>>>> Where should I push the fix?
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>>> >>>>>>>>>> mckenzie.cam@gmail.com
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Thanks Scott,
>>> >>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>>> failing
>>> >>>>>>>> there.
>>> >>>>>>>>>>>>>> cheers
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>>> >>>>>>>> dragonsinth@gmail.com>
>>> >>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Scott can you take a look?
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> -Jordan
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>>> >>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>> >>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few
>>> times
>>> >>>> but
>>> >>>>>>>> no
>>> >>>>>>>>>>>>>> love:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>
>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>> >>>>>>>>>>>>>>>> actual 6
>>> >>>>>>>>>>>>>>>>> expected -31:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
>>> >> Given
>>> >>>>>>>> that
>>> >>>>>>>>>>>>>>> these
>>> >>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just want
>>> to
>>> >>>> vote
>>> >>>>>>>> on
>>> >>>>>>>>>>>>>>> 2.11.0
>>> >>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>>> >>>>>>>>>>>>>>>>> cheers
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
>>> >>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Great news. Thanks.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> ====================
>>> >>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>>> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
>>> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>> >>>>>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>>> validation
>>> >>>>>>>> stuff.
>>> >>>>>>>>>>>>> It
>>> >>>>>>>>>>>>>>> now
>>> >>>>>>>>>>>>>>>>>> does
>>> >>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call.
>>> Because
>>> >> the
>>> >>>>>>>> unit
>>> >>>>>>>>>>>>>> test
>>> >>>>>>>>>>>>>>>>>> uses a
>>> >>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>  final String adjustedPath =
>>> >>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>>> >>>>>>>>>>>>>>>>>> createMode.isSequential()));
>>> >>>>>>>>>>>>>>>>>>>>  List<ACL> aclList =
>>> acling.getAclList(adjustedPath);
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>
>>> >> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>>> >>>>>>>>>>>>>>>>>> data,
>>> >>>>>>>>>>>>>>>>>>>> aclList);
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>  String returnPath = null;
>>> >>>>>>>>>>>>>>>>>>>>  if ( backgrounding.inBackground() )
>>> >>>>>>>>>>>>>>>>>>>>  {
>>> >>>>>>>>>>>>>>>>>>>>      pathInBackground(adjustedPath, data,
>>> givenPath);
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
>>> >>>> failure
>>> >>>>>>>>>>>>> in a
>>> >>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
>>> >>>>>>>>>>>>> expectation is
>>> >>>>>>>>>>>>>>>> that
>>> >>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>>> >>>>>>>>>>>>>>>>>>>> cheers
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
>>> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>> >>>>>>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
>>> >> maybe
>>> >>>>>>>>>>>>>> something
>>> >>>>>>>>>>>>>>>> has
>>> >>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if
>>> I
>>> >> get
>>> >>>>>>>>>>>>> stuck.
>>> >>>>>>>>>>>>>>>>>>>>> cheers
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
>>> >>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to
>>> the
>>> >>>>>>>> master
>>> >>>>>>>>>>>>>>> branch?
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> -JZ
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
>>> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>> >>>>>>>>>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Guys,
>>> >>>>>>>>>>>>>>>>>>>>>>> There's a test
>>> >>>> TestFrameworkBackground:testErrorListener
>>> >>>>>>>>>>>>> that
>>> >>>>>>>>>>>>>> is
>>> >>>>>>>>>>>>>>>>>>>>>> failing
>>> >>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to
>>> try
>>> >> and
>>> >>>>>>>>>>>>> provoke
>>> >>>>>>>>>>>>>>> an
>>> >>>>>>>>>>>>>>>>>>>>>> error
>>> >>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>>> >>>> CreateBuilderImpl
>>> >>>>>>>>>>>>> prior
>>> >>>>>>>>>>>>>> to
>>> >>>>>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the
>>> exception
>>> >> that
>>> >>>>>>>> it
>>> >>>>>>>>>>>>>> throws
>>> >>>>>>>>>>>>>>>>>>>>>> happens
>>> >>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
>>> >> throws
>>> >>>>>>>> an
>>> >>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is
>>> propagated up
>>> >>>> the
>>> >>>>>>>>>>>>> stack
>>> >>>>>>>>>>>>>> at
>>> >>>>>>>>>>>>>>>>>> which
>>> >>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
>>> >>>>>>>> understand
>>> >>>>>>>>>>>>> how
>>> >>>>>>>>>>>>>>> it
>>> >>>>>>>>>>>>>>>>>> ever
>>> >>>>>>>>>>>>>>>>>>>>>>> worked?
>>> >>>>>>>>>>>>>>>>>>>>>>> cheers
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>
>>> >>>>
>>> >>
>>> >>
>>>
>>>
>>
>

Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
Looks like I was incorrect. The NoWatcherException is being thrown on
success as well, and the problem is not in the cluster restart. Will keep
digging.
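
For context on the watcher-removal race discussed in this thread: the quoted
messages mention a loop in the "no watchers" checker that sleeps a bit and
tries again while watchers remain. A minimal sketch of that idea is below.
WatcherDrainCheck, awaitNoWatchers and the Supplier argument are hypothetical
names for illustration and are not the actual TestCleanState code.

```java
import java.util.Collection;
import java.util.function.Supplier;

// Hypothetical illustration only; not taken from the Curator test sources.
final class WatcherDrainCheck {
    // Polls remainingWatchers until it reports an empty collection or the
    // attempts run out. Returns true if the watchers drained in time.
    static boolean awaitNoWatchers(Supplier<Collection<String>> remainingWatchers,
                                   int maxAttempts, long sleepMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (remainingWatchers.get().isEmpty()) {
                return true;
            }
            try {
                // Give in-flight watcher removals time to complete.
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up early.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last look after the final sleep.
        return remainingWatchers.get().isEmpty();
    }
}
```

A bounded retry like this trades a slower failure for fewer false alarms; it
does not remove the underlying race, it only gives async cleanup time to land.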

On Wed, Jun 1, 2016 at 2:52 PM, Cameron McKenzie <mc...@gmail.com>
wrote:

> TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
> line 294). Again, it seems like some sort of race condition with the
> watcher removal.
>
> When I run it in Eclipse, it fails maybe 25% of the time. When it fails it
> seems that it's got something to do with watcher removal. When the test
> passes, this error is not logged.
>
> org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode =
> No such watcher for /foo/bar/lock/leases
> at
> org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
> at
> org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
> at
> org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
> at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)
>
> Is it possible it's something to do with the way that the cluster is
> restarted at line 282? The old cluster is not shut down, a new one is just
> created.
>
>
> On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> I’ll try to address this as part of CURATOR-333
>>
>> > On May 31, 2016, at 7:08 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>> >
>> > Maybe we need to look at some way of providing a hook for tests to wait
>> > reliably for asynch tasks to finish?
>> >
>> > The latest round of tests ran OK. One test failed on an unrelated thing
>> > (ConnectionLoss), but this appears to be a transient thing as it's
>> worked
>> > ok the next time around.
>> >
>> > I will start getting a release together. Thanks for your help with the
>> > updated tests.
>> > cheers
>> >
>> > On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com
>> >> wrote:
>> >
>> >> The problem is in-flight watchers and async background calls. There’s
>> no
>> >> way to cancel these and they can take time to occur - even after a
>> recipe
>> >> instance is closed.
>> >>
>> >> -Jordan
>> >>
>> >>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <mckenzie.cam@gmail.com
>> >
>> >> wrote:
>> >>>
>> >>> Ok, running it again now.
>> >>>
>> >>> Is the problem that the watcher clean up for the recipes is done
>> >>> asynchronously after they are closed?
>> >>>
>> >>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>> >> jordan@jordanzimmerman.com
>> >>>> wrote:
>> >>>
>> >>>> OK - please try now. I added a loop in the “no watchers” checker. If
>> >> there
>> >>>> are remaining watchers, it will sleep a bit and try again.
>> >>>>
>> >>>> -Jordan
>> >>>>
>> >>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
>> mckenzie.cam@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Looks like these failures are intermittent. Running them directly in
>> >>>>> Eclipse they seem to be passing. I will run the whole thing again in
>> >> the
>> >>>>> morning and see how it goes.
>> >>>>>
>> >>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>> >>>> mckenzie.cam@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> There are still 2 tests failing for me:
>> >>>>>>
>> >>>>>> FAILURE! - in
>> >>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>> >>>>>>
>> >>>>
>> >>
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>> >>>>>> java.lang.AssertionError: One or more child watchers are still
>> >>>> registered:
>> >>>>>> [/test]
>> >>>>>> at
>> >>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>> >>>>>> at
>> >>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>> >>>>>>
>> >>>>>> FAILURE! - in
>> >>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>> >>>>>>
>> >>>>
>> >>
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>> >>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>> >>>>>> java.lang.AssertionError: expected [true] but found [false]
>> >>>>>> at org.testng.Assert.fail(Assert.java:94)
>> >>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>> >>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>> >>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>> >>>>>> at
>> >>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>> >>>>>>
>> >>>>>> Failed tests:
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more
>> child
>> >>>>>> watchers are still registered: [/test]
>> >>>>>> Run 2: PASS
>> >>>>>>
>> >>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>> but
>> >>>>>> found [false]
>> >>>>>>
>> >>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>> >>>> mckenzie.cam@gmail.com
>> >>>>>>> wrote:
>> >>>>>>
>> >>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
>> that,
>> >>>> and
>> >>>>>>> if it's all good will merge into CURATOR-3.0
>> >>>>>>> cheers
>> >>>>>>>
>> >>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>> >>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>
>> >>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet.
>> I
>> >>>>>>>> made/pushed my changes in CURATOR-332
>> >>>>>>>>
>> >>>>>>>> -jordan
>> >>>>>>>>
>> >>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>> >>>> mckenzie.cam@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> I'm still seeing 6 failed tests that seem related to the same
>> stuff
>> >>>>>>>> after
>> >>>>>>>>> merging your fix:
>> >>>>>>>>>
>> >>>>>>>>> Failed tests:
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>> >>>> watchers
>> >>>>>>>>> are still registered: [/test]
>> >>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>> >>>> watchers
>> >>>>>>>>> are still registered: [/test]
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>> Run 1:
>> >>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>> >>>>>>>>> One or more child watchers are still registered: [/test]
>> >>>>>>>>> Run 2:
>> >>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>> >>>>>>>>> One or more child watchers are still registered: [/test]
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>> >>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more
>> child
>> >>>>>>>>> watchers are still registered: [/one/two/three]
>> >>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more
>> child
>> >>>>>>>>> watchers are still registered: [/one/two/three]
>> >>>>>>>>>
>> >>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>> >> but
>> >>>>>>>>> found [false]
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>> >>>>>>>>> Run 1: PASS
>> >>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
>> >> data
>> >>>>>>>>> watchers are still registered: [/count]
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>
>> >>
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>> >>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers
>> are
>> >>>>>>>> still
>> >>>>>>>>> registered: [/count]
>> >>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers
>> are
>> >>>>>>>> still
>> >>>>>>>>> registered: [/count]
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>> >>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend
>> some
>> >>>>>>>> time on
>> >>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to get
>> >> set
>> >>>>>>>> when
>> >>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling
>> it.
>> >>>> But,
>> >>>>>>>>>> while I was looking at the code I realized there are some
>> >>>> significant
>> >>>>>>>>>> additional problems. Curator, here, is trying to mirror what
>> >>>>>>>> ZooKeeper does
>> >>>>>>>>>> internally which is insanely complicated. In hindsight, the
>> whole
>> >> ZK
>> >>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
>> APIs.
>> >>>>>>>> But, of
>> >>>>>>>>>> course, that’s easy for me to say now.
>> >>>>>>>>>>
>> >>>>>>>>>> -Jordan
>> >>>>>>>>>>
>> >>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
>> >>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks Scott,
>> >>>>>>>>>>> Those tests are now passing for me.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on
>> the
>> >>>> 3.0
>> >>>>>>>>>>> branch. It appears that this is actually potentially a bug in
>> the
>> >>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
>> >> quick
>> >>>>>>>> look
>> >>>>>>>>>>> through, but I haven't dived in in any detail. It's the
>> >>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can
>> you
>> >>>>>>>> have a
>> >>>>>>>>>>> look? If not, let me know and I'll do some more digging.
>> >>>>>>>>>>>
>> >>>>>>>>>>> cheers
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
>> >>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Thanks Scott.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
>> Nexus.
>> >>>>>>>>>>>> cheers
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
>> >>>> dragonsinth@gmail.com
>> >>>>>>>>>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
>> >> master
>> >>>>>>>> and
>> >>>>>>>>>> 3.0.
>> >>>>>>>>>>>>> Where should I push the fix?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
>> >>>>>>>>>> mckenzie.cam@gmail.com
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks Scott,
>> >>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
>> failing
>> >>>>>>>> there.
>> >>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
>> >>>>>>>> dragonsinth@gmail.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Scott can you take a look?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> -Jordan
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
>> >>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few
>> times
>> >>>> but
>> >>>>>>>> no
>> >>>>>>>>>>>>>> love:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>
>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>> >>>>>>>>>>>>>>>> actual 6
>> >>>>>>>>>>>>>>>>> expected -31:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
>> >> Given
>> >>>>>>>> that
>> >>>>>>>>>>>>>>> these
>> >>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just want
>> to
>> >>>> vote
>> >>>>>>>> on
>> >>>>>>>>>>>>>>> 2.11.0
>> >>>>>>>>>>>>>>>>> separately, as that is all ready to go?
>> >>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
>> >>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Great news. Thanks.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> ====================
>> >>>>>>>>>>>>>>>>>> Jordan Zimmerman
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
>> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
>> validation
>> >>>>>>>> stuff.
>> >>>>>>>>>>>>> It
>> >>>>>>>>>>>>>>> now
>> >>>>>>>>>>>>>>>>>> does
>> >>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because
>> >> the
>> >>>>>>>> unit
>> >>>>>>>>>>>>>> test
>> >>>>>>>>>>>>>>>>>> uses a
>> >>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>  final String adjustedPath =
>> >>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
>> >>>>>>>>>>>>>>>>>> createMode.isSequential()));
>> >>>>>>>>>>>>>>>>>>>>  List<ACL> aclList = acling.getAclList(adjustedPath);
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>
>> >> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
>> >>>>>>>>>>>>>>>>>> data,
>> >>>>>>>>>>>>>>>>>>>> aclList);
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>  String returnPath = null;
>> >>>>>>>>>>>>>>>>>>>>  if ( backgrounding.inBackground() )
>> >>>>>>>>>>>>>>>>>>>>  {
>> >>>>>>>>>>>>>>>>>>>>      pathInBackground(adjustedPath, data, givenPath);
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
>> >>>> failure
>> >>>>>>>>>>>>> in a
>> >>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
>> >>>>>>>>>>>>> expectation is
>> >>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
>> >>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
>> >> maybe
>> >>>>>>>>>>>>>> something
>> >>>>>>>>>>>>>>>> has
>> >>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I
>> >> get
>> >>>>>>>>>>>>> stuck.
>> >>>>>>>>>>>>>>>>>>>>> cheers
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
>> >>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to
>> the
>> >>>>>>>> master
>> >>>>>>>>>>>>>>> branch?
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> -JZ
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
>> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Guys,
>> >>>>>>>>>>>>>>>>>>>>>>> There's a test
>> >>>> TestFrameworkBackground:testErrorListener
>> >>>>>>>>>>>>> that
>> >>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>>>>> failing
>> >>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to
>> try
>> >> and
>> >>>>>>>>>>>>> provoke
>> >>>>>>>>>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>> error
>> >>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
>> >>>> CreateBuilderImpl
>> >>>>>>>>>>>>> prior
>> >>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception
>> >> that
>> >>>>>>>> it
>> >>>>>>>>>>>>>> throws
>> >>>>>>>>>>>>>>>>>>>>>> happens
>> >>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
>> >> throws
>> >>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated
>> up
>> >>>> the
>> >>>>>>>>>>>>> stack
>> >>>>>>>>>>>>>> at
>> >>>>>>>>>>>>>>>>>> which
>> >>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
>> >>>>>>>> understand
>> >>>>>>>>>>>>> how
>> >>>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>>>> ever
>> >>>>>>>>>>>>>>>>>>>>>>> worked?
>> >>>>>>>>>>>>>>>>>>>>>>> cheers

Re: CURATOR-3.0 tests

Posted by Cameron McKenzie <mc...@gmail.com>.
TestInterProcessSemaphoreCluster.testCluster() is failing (assertion at
line 294). Again, it seems like some sort of race condition with the
watcher removal.

When I run it in Eclipse, it fails maybe 25% of the time. When it fails, the
error below is logged, so it looks related to watcher removal. When the test
passes, this error is not logged.

org.apache.zookeeper.KeeperException$NoWatcherException: KeeperErrorCode = No such watcher for /foo/bar/lock/leases
    at org.apache.zookeeper.ZooKeeper$ZKWatchManager.containsWatcher(ZooKeeper.java:377)
    at org.apache.zookeeper.ZooKeeper$ZKWatchManager.removeWatcher(ZooKeeper.java:252)
    at org.apache.zookeeper.WatchDeregistration.unregister(WatchDeregistration.java:58)
    at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:712)
    at org.apache.zookeeper.ClientCnxn.access$1500(ClientCnxn.java:97)
    at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:948)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:99)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1236)

Is it possible it's something to do with the way that the cluster is
restarted at line 282? The old cluster is not shut down; a new one is just
created.
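
For reference, the "sleep a bit and try again" loop mentioned in the quoted
thread below can be sketched generically as follows. This is only a minimal
sketch, not the actual change to TestCleanState: the Eventually class, its
await method and the parameter names are all made up for illustration, and
it deliberately uses no Curator or ZooKeeper API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper (not part of Curator): polls a condition until it
// holds or a deadline passes, sleeping between attempts.
final class Eventually {
    private Eventually() {
    }

    // Returns true as soon as the condition holds, false if the timeout
    // expires first or the calling thread is interrupted while sleeping.
    static boolean await(BooleanSupplier condition, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // e.g. "no watchers remain"
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller should fail the test
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A test might call Eventually.await(() -> remainingWatchers().isEmpty(), 10000, 100),
where remainingWatchers() is a placeholder for whatever check the test
already performs. Polling only narrows the race window; it does not remove
the underlying asynchrony.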


On Wed, Jun 1, 2016 at 10:44 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> I’ll try to address this as part of CURATOR-333
>
> > On May 31, 2016, at 7:08 PM, Cameron McKenzie <mc...@gmail.com>
> wrote:
> >
> > Maybe we need to look at some way of providing a hook for tests to wait
> > reliably for asynch tasks to finish?
> >
> > The latest round of tests ran OK. One test failed on an unrelated thing
> > (ConnectionLoss), but this appears to be a transient thing as it's worked
> > ok the next time around.
> >
> > I will start getting a release together. Thanks for your help with the
> > updated tests.
> > cheers
> >
> > On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com
> >> wrote:
> >
> >> The problem is in-flight watchers and async background calls. There’s no
> >> way to cancel these and they can take time to occur - even after a
> recipe
> >> instance is closed.
> >>
> >> -Jordan
> >>
> >>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <mc...@gmail.com>
> >> wrote:
> >>>
> >>> Ok, running it again now.
> >>>
> >>> Is the problem that the watcher clean up for the recipes is done
> >>> asynchronously after they are closed?
> >>>
> >>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
> >> jordan@jordanzimmerman.com
> >>>> wrote:
> >>>
> >>>> OK - please try now. I added a loop in the “no watchers” checker. If
> >> there
> >>>> are remaining watchers, it will sleep a bit and try again.
> >>>>
> >>>> -Jordan
> >>>>
> >>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <
> mckenzie.cam@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Looks like these failures are intermittent. Running them directly in
> >>>>> Eclipse they seem to be passing. I will run the whole thing again in
> >> the
> >>>>> morning and see how it goes.
> >>>>>
> >>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
> >>>> mckenzie.cam@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> There are still 2 tests failing for me:
> >>>>>>
> >>>>>> FAILURE! - in
> >>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
> >>>>>>
> >>>>
> >>
> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
> >>>>>> java.lang.AssertionError: One or more child watchers are still
> >>>> registered:
> >>>>>> [/test]
> >>>>>> at
> >>>>>>
> >>>>
> >>
> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
> >>>>>> at
> >>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
> >>>>>>
> >>>>>> FAILURE! - in
> >>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
> >>>>>>
> >>>>
> >>
> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
> >>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
> >>>>>> java.lang.AssertionError: expected [true] but found [false]
> >>>>>> at org.testng.Assert.fail(Assert.java:94)
> >>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
> >>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
> >>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
> >>>>>> at
> >>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
> >>>>>>
> >>>>>> Failed tests:
> >>>>>>
> >>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
> >>>>>> watchers are still registered: [/test]
> >>>>>> Run 2: PASS
> >>>>>>
> >>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
> >>>>>> found [false]
> >>>>>>
> >>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
> >>>> mckenzie.cam@gmail.com
> >>>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against
> that,
> >>>> and
> >>>>>>> if it's all good will merge into CURATOR-3.0
> >>>>>>> cheers
> >>>>>>>
> >>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
> >>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>
> >>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
> >>>>>>>> made/pushed my changes in CURATOR-332
> >>>>>>>>
> >>>>>>>> -jordan
> >>>>>>>>
> >>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
> >>>> mckenzie.cam@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm still seeing 6 failed tests that seem related to the same
> stuff
> >>>>>>>> after
> >>>>>>>>> merging your fix:
> >>>>>>>>>
> >>>>>>>>> Failed tests:
> >>>>>>>>>
> >>>>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
> >>>> watchers
> >>>>>>>>> are still registered: [/test]
> >>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
> >>>> watchers
> >>>>>>>>> are still registered: [/test]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>> Run 1:
> >>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>>> Run 2:
> >>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
> >>>>>>>>> One or more child watchers are still registered: [/test]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
> >>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
> >>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
> >>>>>>>>> watchers are still registered: [/one/two/three]
> >>>>>>>>>
> >>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
> >> but
> >>>>>>>>> found [false]
> >>>>>>>>>
> >>>>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>> Run 1: PASS
> >>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
> >> data
> >>>>>>>>> watchers are still registered: [/count]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>
> >>
> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
> >>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers
> are
> >>>>>>>> still
> >>>>>>>>> registered: [/count]
> >>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers
> are
> >>>>>>>> still
> >>>>>>>>> registered: [/count]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
> >>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend
> some
> >>>>>>>> time on
> >>>>>>>>>> it. The TL;DR is that exists watchers are still supposed to get
> >> set
> >>>>>>>> when
> >>>>>>>>>> there is a KeeperException.NoNode and the code isn’t handling
> it.
> >>>> But,
> >>>>>>>>>> while I was looking at the code I realized there are some
> >>>> significant
> >>>>>>>>>> additional problems. Curator, here, is trying to mirror what
> >>>>>>>> ZooKeeper does
> >>>>>>>>>> internally which is insanely complicated. In hindsight, the
> whole
> >> ZK
> >>>>>>>>>> watcher mechanism should’ve been decoupled from the mutator
> APIs.
> >>>>>>>> But, of
> >>>>>>>>>> course, that’s easy for me to say now.
> >>>>>>>>>>
> >>>>>>>>>> -Jordan
> >>>>>>>>>>
> >>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <
> >>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>> Those tests are now passing for me.
> >>>>>>>>>>>
> >>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on
> the
> >>>> 3.0
> >>>>>>>>>>> branch. It appears that this is actually potentially a bug in
> the
> >>>>>>>>>>> NodeCache. It ends up leaking a Watcher reference. I've had a
> >> quick
> >>>>>>>> look
> >>>>>>>>>>> through, but I haven't dived in in any detail. It's the
> >>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can
> you
> >>>>>>>> have a
> >>>>>>>>>>> look? If not, let me know and I'll do some more digging.
> >>>>>>>>>>>
> >>>>>>>>>>> cheers
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <
> >>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Scott.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Push the fix to master and merge it into 3.0.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto
> Nexus.
> >>>>>>>>>>>> cheers
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <
> >>>> dragonsinth@gmail.com
> >>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both
> >> master
> >>>>>>>> and
> >>>>>>>>>> 3.0.
> >>>>>>>>>>>>> Where should I push the fix?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <
> >>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Scott,
> >>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are
> failing
> >>>>>>>> there.
> >>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <
> >>>>>>>> dragonsinth@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
> >>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Scott can you take a look?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Jordan
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <
> >>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few
> times
> >>>> but
> >>>>>>>> no
> >>>>>>>>>>>>>> love:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
> >>>>>>>>>>>>>>>> actual 6
> >>>>>>>>>>>>>>>>> expected -31:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning.
> >> Given
> >>>>>>>> that
> >>>>>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>> may take some messing about to fix up, do we just want to
> >>>> vote
> >>>>>>>> on
> >>>>>>>>>>>>>>> 2.11.0
> >>>>>>>>>>>>>>>>> separately, as that is all ready to go?
> >>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Great news. Thanks.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> ====================
> >>>>>>>>>>>>>>>>>> Jordan Zimmerman
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <
> >>>>>>>>>>>>>>> mckenzie.cam@gmail.com
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema
> validation
> >>>>>>>> stuff.
> >>>>>>>>>>>>> It
> >>>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>>>>> does
> >>>>>>>>>>>>>>>>>>>> an ACL check prior to the backgrounding call. Because
> >> the
> >>>>>>>> unit
> >>>>>>>>>>>>>> test
> >>>>>>>>>>>>>>>>>> uses a
> >>>>>>>>>>>>>>>>>>>> bogus ACL provider it just throws an exception
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>  final String adjustedPath =
> >>>>>>>>>>>>>>>>>>>> adjustPath(client.fixForNamespace(givenPath,
> >>>>>>>>>>>>>>>>>> createMode.isSequential()));
> >>>>>>>>>>>>>>>>>>>>  List<ACL> aclList = acling.getAclList(adjustedPath);
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>
> >> client.getSchemaSet().getSchema(givenPath).validateCreate(createMode,
> >>>>>>>>>>>>>>>>>> data,
> >>>>>>>>>>>>>>>>>>>> aclList);
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>  String returnPath = null;
> >>>>>>>>>>>>>>>>>>>>  if ( backgrounding.inBackground() )
> >>>>>>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>>>>>      pathInBackground(adjustedPath, data, givenPath);
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
> >>>> failure
> >>>>>>>>>>>>> in a
> >>>>>>>>>>>>>>>>>>>> different way. With the UnhandledErrorListener, the
> >>>>>>>>>>>>> expectation is
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> this only gets called on backgrounding operations?
> >>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so
> >> maybe
> >>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>>>>> legitimately broken the test. Will let you know if I
> >> get
> >>>>>>>>>>>>> stuck.
> >>>>>>>>>>>>>>>>>>>>> cheers
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
> >>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the
> >>>>>>>> master
> >>>>>>>>>>>>>>> branch?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -JZ
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
> >>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Guys,
> >>>>>>>>>>>>>>>>>>>>>>> There's a test
> >>>> TestFrameworkBackground:testErrorListener
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>> failing
> >>>>>>>>>>>>>>>>>>>>>>> consistently on the CURATOR-3.0 branch.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try
> >> and
> >>>>>>>>>>>>> provoke
> >>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>> error
> >>>>>>>>>>>>>>>>>>>>>>> via a bad ACL provider.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the
> >>>> CreateBuilderImpl
> >>>>>>>>>>>>> prior
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> backgrounding call, which means that the exception
> >> that
> >>>>>>>> it
> >>>>>>>>>>>>>> throws
> >>>>>>>>>>>>>>>>>>>>>> happens
> >>>>>>>>>>>>>>>>>>>>>>> in the main Thread of the unit test. So, it just
> >> throws
> >>>>>>>> an
> >>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated
> up
> >>>> the
> >>>>>>>>>>>>> stack
> >>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>>>>>> point the unit test fails.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
> >>>>>>>> understand
> >>>>>>>>>>>>> how
> >>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>>>>>>>>>> worked?
> >>>>>>>>>>>>>>>>>>>>>>> cheers

Re: CURATOR-3.0 tests

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I’ll try to address this as part of CURATOR-333
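
One possible shape for the hook suggested in the quoted message (a way for
tests to wait for async tasks to finish) is an in-flight counter that tests
can block on. This is purely a sketch under my own assumptions: the
InFlightTracker class and its started/finished/awaitQuiescence methods are
hypothetical names, nothing like this is claimed to exist in Curator, and
it says nothing about how CURATOR-333 will actually be done.

```java
// Hypothetical sketch (not Curator API): counts in-flight async operations
// and lets a test block until they have all completed.
final class InFlightTracker {
    private int inFlight = 0;

    // Call when a background operation or watcher callback is scheduled.
    synchronized void started() {
        inFlight++;
    }

    // Call when that operation completes, whether it succeeded or failed.
    synchronized void finished() {
        if (inFlight > 0) {
            inFlight--;
        }
        if (inFlight == 0) {
            notifyAll(); // wake any test waiting for quiescence
        }
    }

    // Blocks until nothing is in flight. Returns false on timeout or if the
    // calling thread is interrupted while waiting.
    synchronized boolean awaitQuiescence(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (inFlight > 0) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A test would close the recipe, call awaitQuiescence with a generous
timeout, and only then assert that no watchers remain. Compared with a
sleep-and-retry loop, this waits for a definite signal instead of guessing
how long the in-flight calls take.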

> On May 31, 2016, at 7:08 PM, Cameron McKenzie <mc...@gmail.com> wrote:
> 
> Maybe we need to look at some way of providing a hook for tests to wait
> reliably for asynch tasks to finish?
> 
> The latest round of tests ran OK. One test failed on an unrelated thing
> (ConnectionLoss), but this appears to be a transient thing as it's worked
> ok the next time around.
> 
> I will start getting a release together. Thanks for your help with the
> updated tests.
> cheers
> 
> On Wed, Jun 1, 2016 at 9:12 AM, Jordan Zimmerman <jordan@jordanzimmerman.com
>> wrote:
> 
>> The problem is in-flight watchers and async background calls. There’s no
>> way to cancel these and they can take time to occur - even after a recipe
>> instance is closed.
>> 
>> -Jordan
>> 
>>> On May 31, 2016, at 5:11 PM, Cameron McKenzie <mc...@gmail.com>
>> wrote:
>>> 
>>> Ok, running it again now.
>>> 
>>> Is the problem that the watcher clean up for the recipes is done
>>> asynchronously after they are closed?
>>> 
>>> On Wed, Jun 1, 2016 at 1:35 AM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com
>>>> wrote:
>>> 
>>>> OK - please try now. I added a loop in the “no watchers” checker. If
>> there
>>>> are remaining watchers, it will sleep a bit and try again.
>>>> 
>>>> -Jordan
>>>> 
>>>>> On May 31, 2016, at 1:33 AM, Cameron McKenzie <mc...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Looks like these failures are intermittent. Running them directly in
>>>>> Eclipse they seem to be passing. I will run the whole thing again in
>> the
>>>>> morning and see how it goes.
>>>>> 
>>>>> On Tue, May 31, 2016 at 2:29 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> There are still 2 tests failing for me:
>>>>>> 
>>>>>> FAILURE! - in
>>>>>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache
>>>>>> 
>>>> 
>> testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>> Time elapsed: 17.488 sec  <<< FAILURE!
>>>>>> java.lang.AssertionError: One or more child watchers are still
>>>> registered:
>>>>>> [/test]
>>>>>> at
>>>>>> 
>>>> 
>> org.apache.curator.framework.imps.TestCleanState.closeAndTestClean(TestCleanState.java:53)
>>>>>> at
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(TestPathChildrenCache.java:707)
>>>>>> 
>>>>>> FAILURE! - in
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster
>>>>>> 
>>>> 
>> testCluster(org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster)
>>>>>> Time elapsed: 96.641 sec  <<< FAILURE!
>>>>>> java.lang.AssertionError: expected [true] but found [false]
>>>>>> at org.testng.Assert.fail(Assert.java:94)
>>>>>> at org.testng.Assert.failNotEquals(Assert.java:494)
>>>>>> at org.testng.Assert.assertTrue(Assert.java:42)
>>>>>> at org.testng.Assert.assertTrue(Assert.java:52)
>>>>>> at
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.locks.TestInterProcessSemaphoreCluster.testCluster(TestInterProcessSemaphoreCluster.java:294)
>>>>>> 
>>>>>> Failed tests:
>>>>>> 
>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testKilledSession(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>> Run 1: TestPathChildrenCache.testKilledSession:707 One or more child
>>>>>> watchers are still registered: [/test]
>>>>>> Run 2: PASS
>>>>>> 
>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true] but
>>>>>> found [false]
>>>>>> 
>>>>>> Tests run: 495, Failures: 2, Errors: 0, Skipped: 0
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, May 31, 2016 at 12:52 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com
>>>>>>> wrote:
>>>>>> 
>>>>>>> Thanks, CURATOR-332 wasn't pushed. I will run the tests against that,
>>>> and
>>>>>>> if it's all good will merge into CURATOR-3.0
>>>>>>> cheers
>>>>>>> 
>>>>>>> On Tue, May 31, 2016 at 12:32 PM, Jordan Zimmerman <
>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>> 
>>>>>>>> Actually - I don’t remember if branch CURATOR-332 is merged yet. I
>>>>>>>> made/pushed my changes in CURATOR-332
>>>>>>>> 
>>>>>>>> -jordan
>>>>>>>> 
>>>>>>>>> On May 26, 2016, at 10:04 PM, Cameron McKenzie <
>>>> mckenzie.cam@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I'm still seeing 6 failed tests that seem related to the same stuff
>>>>>>>> after
>>>>>>>>> merging your fix:
>>>>>>>>> 
>>>>>>>>> Failed tests:
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasics(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1: TestPathChildrenCache.testBasics:863 One or more child
>>>> watchers
>>>>>>>>> are still registered: [/test]
>>>>>>>>> Run 2: TestPathChildrenCache.testBasics:863 One or more child
>>>> watchers
>>>>>>>>> are still registered: [/test]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1:
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>> Run 2:
>>>> TestPathChildrenCache.testBasicsOnTwoCachesWithSameExecutor:934
>>>>>>>>> One or more child watchers are still registered: [/test]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.cache.TestPathChildrenCache.testEnsurePath(org.apache.curator.framework.recipes.cache.TestPathChildrenCache)
>>>>>>>>> Run 1: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>> Run 2: TestPathChildrenCache.testEnsurePath:363 One or more child
>>>>>>>>> watchers are still registered: [/one/two/three]
>>>>>>>>> 
>>>>>>>>> TestInterProcessSemaphoreCluster.testCluster:294 expected [true]
>> but
>>>>>>>>> found [false]
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testMultiClientVersioned(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>> Run 1: PASS
>>>>>>>>> Run 2: TestSharedCount.testMultiClientVersioned:256 One or more
>> data
>>>>>>>>> watchers are still registered: [/count]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>> 
>> org.apache.curator.framework.recipes.shared.TestSharedCount.testSimple(org.apache.curator.framework.recipes.shared.TestSharedCount)
>>>>>>>>> Run 1: TestSharedCount.testSimple:174 One or more data watchers are
>>>>>>>> still
>>>>>>>>> registered: [/count]
>>>>>>>>> Run 2: TestSharedCount.testSimple:174 One or more data watchers are
>>>>>>>> still
>>>>>>>>> registered: [/count]
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Tests run: 491, Failures: 6, Errors: 0, Skipped: 0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, May 27, 2016 at 3:30 AM, Jordan Zimmerman <
>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I see the problem. The fix is not simple though so I’ll spend some
>>>>>>>>>> time on it. The TL;DR is that exists watchers are still supposed to
>>>>>>>>>> get set when there is a KeeperException.NoNode and the code isn’t
>>>>>>>>>> handling it. But, while I was looking at the code I realized there
>>>>>>>>>> are some significant additional problems. Curator, here, is trying
>>>>>>>>>> to mirror what ZooKeeper does internally which is insanely
>>>>>>>>>> complicated. In hindsight, the whole ZK watcher mechanism should’ve
>>>>>>>>>> been decoupled from the mutator APIs. But, of course, that’s easy
>>>>>>>>>> for me to say now.
>>>>>>>>>> 
>>>>>>>>>> -Jordan
>>>>>>>>>> 
>>>>>>>>>>> On May 26, 2016, at 1:10 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>> Those tests are now passing for me.
>>>>>>>>>>> 
>>>>>>>>>>> Jordan, testNodeCache:testBasics() is failing consistently on the
>>>>>>>>>>> 3.0 branch. It appears that this is actually potentially a bug in
>>>>>>>>>>> the NodeCache. It ends up leaking a Watcher reference. I've had a
>>>>>>>>>>> quick look through, but I haven't dived in in any detail. It's the
>>>>>>>>>>> WatcherRemovalManager stuff I think. If you've got time, can you
>>>>>>>>>>> have a look? If not, let me know and I'll do some more digging.
>>>>>>>>>>> 
>>>>>>>>>>> cheers
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 26, 2016 at 11:47 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Scott.
>>>>>>>>>>>> 
>>>>>>>>>>>> Push the fix to master and merge it into 3.0.
>>>>>>>>>>>> 
>>>>>>>>>>>> Then I guess, I'll push new versions of 2.11 and 3.2 onto Nexus.
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, May 26, 2016 at 11:44 AM, Scott Blum <dragonsinth@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Alright, I have a fix, but it wants to be applied to both master
>>>>>>>>>>>>> and 3.0. Where should I push the fix?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 25, 2016 at 6:10 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Scott,
>>>>>>>>>>>>>> If you just checkout the CURATOR-3.0 branch, they are failing
>>>>>>>>>>>>>> there.
>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, May 26, 2016 at 2:06 AM, Scott Blum <dragonsinth@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Sure, what SHA are they failing at Cam?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 9:36 AM, Jordan Zimmerman <
>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Scott can you take a look?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Jordan
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 25, 2016, at 4:35 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Tree cache tests are still failing. I've tried a few times
>>>>>>>>>>>>>>>>> but no love:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> TestTreeCacheEventOrdering>TestEventOrdering.testEventOrdering:151
>>>>>>>>>>>>>>>>> actual 6 expected -31:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I will have a look into what's going on in the morning. Given
>>>>>>>>>>>>>>>>> that these may take some messing about to fix up, do we just
>>>>>>>>>>>>>>>>> want to vote on 2.11.0 separately, as that is all ready to go?
>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 5:34 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Great news. Thanks.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ====================
>>>>>>>>>>>>>>>>>> Jordan Zimmerman
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 25, 2016, at 12:37 AM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have fixed up the test case, all good now.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:45 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Looks like it was introduced with the schema validation
>>>>>>>>>>>>>>>>>>>> stuff. It now does an ACL check prior to the backgrounding
>>>>>>>>>>>>>>>>>>>> call. Because the unit test uses a bogus ACL provider it
>>>>>>>>>>>>>>>>>>>> just throws an exception
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>  final String adjustedPath = adjustPath(client.fixForNamespace(givenPath, createMode.isSequential()));
>>>>>>>>>>>>>>>>>>>>  List<ACL> aclList = acling.getAclList(adjustedPath);
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>  client.getSchemaSet().getSchema(givenPath).validateCreate(createMode, data, aclList);
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>  String returnPath = null;
>>>>>>>>>>>>>>>>>>>>  if ( backgrounding.inBackground() )
>>>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>>>>      pathInBackground(adjustedPath, data, givenPath);
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> So, I guess the answer is to get the test to force a
>>>>>>>>>>>>>>>>>>>> failure in a different way. With the UnhandledErrorListener,
>>>>>>>>>>>>>>>>>>>> the expectation is that this only gets called on
>>>>>>>>>>>>>>>>>>>> backgrounding operations?
>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:39 PM, Cameron McKenzie <mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Same on the master branch, but it passes there, so maybe
>>>>>>>>>>>>>>>>>>>>> something has legitimately broken the test. Will let you
>>>>>>>>>>>>>>>>>>>>> know if I get stuck.
>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, May 25, 2016 at 1:36 PM, Jordan Zimmerman <
>>>>>>>>>>>>>>>>>>>>> jordan@jordanzimmerman.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Let me know if you need help.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> It might be a bad merge. Have you compared it to the
>>>>>>>>>>>>>>>>>>>>>> master branch?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -JZ
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On May 24, 2016, at 10:31 PM, Cameron McKenzie <
>>>>>>>>>>>>>>>>>> mckenzie.cam@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>>>>>>>>>> There's a test TestFrameworkBackground:testErrorListener
>>>>>>>>>>>>>>>>>>>>>>> that is failing consistently on the CURATOR-3.0 branch.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I can't see how it has ever worked. It seems to try and
>>>>>>>>>>>>>>>>>>>>>>> provoke an error via a bad ACL provider.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> But this ACL provider is called by the CreateBuilderImpl
>>>>>>>>>>>>>>>>>>>>>>> prior to the backgrounding call, which means that the
>>>>>>>>>>>>>>>>>>>>>>> exception that it throws happens in the main Thread of
>>>>>>>>>>>>>>>>>>>>>>> the unit test. So, it just throws an
>>>>>>>>>>>>>>>>>>>>>>> UnsupportedOperationException which is propagated up the
>>>>>>>>>>>>>>>>>>>>>>> stack at which point the unit test fails.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> So, I will look at fixing this, but I just don't
>>>>>>>>>>>>>>>>>>>>>>> understand how it ever worked?
>>>>>>>>>>>>>>>>>>>>>>> cheers
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>