Posted to dev@geode.apache.org by Kirk Lund <kl...@apache.org> on 2019/11/25 21:40:47 UTC

Cache.close is not synchronous?

I found a test that closes and then recreates the cache multiple times,
with a 2-second sleep between each iteration. When I removed the
Thread.sleep, recreating the cache threw
DistributedSystemDisconnectedException (see below).

This seems like a usability nightmare. Does anyone have any idea WHY it
works this way?

Personally, I want Cache.close() to block until both Cache and
DistributedSystem are closed and the API is ready to create a new Cache.

org.apache.geode.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected.
        at org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:945)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1665)
        at org.apache.geode.internal.cache.GemFireCacheImpl.<init>(GemFireCacheImpl.java:791)
        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:187)
        at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:158)
        at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
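Until close() blocks the way Kirk proposes, a test that recreates the cache could at least replace the fixed 2-second Thread.sleep with a bounded poll that returns as soon as a readiness condition holds. The sketch below is a hypothetical, stdlib-only helper; `AwaitHelper.await` is not a Geode API, and whether any particular condition is sufficient to avoid the exception above is an assumption. Libraries such as Awaitility provide the same pattern.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (not part of Geode): poll a condition instead of sleeping blindly. */
final class AwaitHelper {
  private AwaitHelper() {}

  /**
   * Polls {@code condition} every {@code pollInterval} until it returns true or
   * {@code timeout} elapses. Returns true if the condition was met in time, and false
   * on timeout or if the waiting thread is interrupted.
   */
  static boolean await(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false; // timed out
      }
      // Sleep for one poll interval, but never past the deadline (and at least 1 ms).
      long sleepMillis =
          Math.max(1L, Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L));
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

After each cache.close() in the recreate loop, a test could then call something like `AwaitHelper.await(() -> !system.isConnected(), Duration.ofSeconds(30), Duration.ofMillis(50))`, where `system` is the `DistributedSystem` captured via `cache.getDistributedSystem()` before closing, and fail with a clear message if it returns false.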

Re: Cache.close is not synchronous?

Posted by Kirk Lund <kl...@apache.org>.
I added stack-trace logging to the close of both GemFireCacheImpl and
InternalDistributedSystem and found a difference.

The test passes when the test thread itself does the close:

java.lang.Throwable: KIRK GemFireCacheImpl closed 1046056441
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1912)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1902)
        at org.apache.geode.cache.CacheFactoryRecreateRegressionTest.recreateDoesNotThrowDistributedSystemDisconnectedException(CacheFactoryRecreateRegressionTest.java:56)
java.lang.Throwable: KIRK InternalDistributedSystem closed 1311844206
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1912)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1902)
        at org.apache.geode.cache.CacheFactoryRecreateRegressionTest.recreateDoesNotThrowDistributedSystemDisconnectedException(CacheFactoryRecreateRegressionTest.java:56)

When the test fails and reproduces the problem, the close is apparently
completed by a different background thread:

java.lang.Throwable: KIRK GemFireCacheImpl closed 277876155
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2365)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
        at org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
        at java.lang.Thread.run(Thread.java:748)
java.lang.Throwable: KIRK InternalDistributedSystem closed 306674056
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1637)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1225)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2351)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1917)
        at org.apache.geode.internal.cache.DiskStoreImpl.lambda$handleDiskAccessException$2(DiskStoreImpl.java:3380)
        at java.lang.Thread.run(Thread.java:748)
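The failing traces suggest the cache is being closed from a background thread started in DiskStoreImpl.handleDiskAccessException, so the test thread's own close() presumably returns before the distributed system has been disconnected. One way to make close() synchronous regardless of which thread begins it is to have every caller wait on a shared completion handle. The toy model below illustrates that idea with a CompletableFuture; `ToyCache`, `doClose`, and `isSystemConnected` are hypothetical names, and this is not necessarily how Geode or the eventual fix implements it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Toy model, not Geode code: whichever thread calls close() first performs the
 * shutdown, and every later caller blocks until that shutdown has finished, so
 * close() is synchronous no matter which thread started it.
 */
class ToyCache {
  private final AtomicReference<CompletableFuture<Void>> closeFuture = new AtomicReference<>();
  private volatile boolean systemConnected = true;

  boolean isSystemConnected() {
    return systemConnected;
  }

  void close() {
    CompletableFuture<Void> mine = new CompletableFuture<>();
    if (closeFuture.compareAndSet(null, mine)) {
      // This thread won the race, so it performs the actual shutdown.
      try {
        doClose();
        mine.complete(null);
      } catch (RuntimeException | Error e) {
        mine.completeExceptionally(e); // waiters see the failure too
        throw e;
      }
    } else {
      // Another thread (e.g. a disk-error handler) is already closing:
      // wait for it to finish instead of returning early.
      closeFuture.get().join();
    }
  }

  /** Stands in for closing regions and disk stores and disconnecting the distributed system. */
  protected void doClose() {
    systemConnected = false;
  }
}
```

The design choice here is that "closing" and "closed" are distinguished only by the future's completion, so a caller can never observe a half-closed cache as a finished close.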

On Tue, Nov 26, 2019 at 9:20 AM Kirk Lund <kl...@apache.org> wrote:

> Seems like this must be a bug, so I filed
> https://issues.apache.org/jira/browse/GEODE-7503. I'll look into it...

Re: Cache.close is not synchronous?

Posted by Kirk Lund <kl...@apache.org>.
Seems like this must be a bug, so I filed
https://issues.apache.org/jira/browse/GEODE-7503. I'll look into it...

On Mon, Nov 25, 2019 at 3:24 PM Anilkumar Gingade <ag...@pivotal.io>
wrote:

> Looking at the code, both cache.close() and InternalCacheBuilder.create()
> are synchronized on GemFireCacheImpl.class; it's
> InternalCacheBuilder.create() that seems to be using a reference to the old
> distributed system.
> GemFireCacheImpl.getInstance() and getExisting() both perform an
> "isClosing" check and return early. The InternalCacheBuilder is new; I'm
> not sure if it's missing those early checks.
>
> -Anil.

Re: Cache.close is not synchronous?

Posted by Ivan Godwin <ig...@pivotal.io>.
+1 for fixing.

On Tue, Nov 26, 2019 at 6:53 AM Alberto Bustamante Reyes
<al...@est.tech> wrote:

> +1 for fixing it.

RE: Cache.close is not synchronous?

Posted by Alberto Bustamante Reyes <al...@est.tech>.
+1 for fixing it.
________________________________
From: Anilkumar Gingade <ag...@pivotal.io>
Sent: Tuesday, November 26, 2019 0:24
To: geode <de...@geode.apache.org>
Subject: Re: Cache.close is not synchronous?

Looking at the code, both cache.close() and InternalCacheBuilder.create()
are synchronized on GemFireCacheImpl.class; it's
InternalCacheBuilder.create() that seems to be using a reference to the old
distributed system.
GemFireCacheImpl.getInstance() and getExisting() both perform an
"isClosing" check and return early. The InternalCacheBuilder is new; I'm
not sure if it's missing those early checks.

-Anil.

Re: Cache.close is not synchronous?

Posted by Anilkumar Gingade <ag...@pivotal.io>.
Looking at the code, both cache.close() and InternalCacheBuilder.create()
are synchronized on GemFireCacheImpl.class; it's
InternalCacheBuilder.create() that seems to be using a reference to the old
distributed system.
GemFireCacheImpl.getInstance() and getExisting() both perform an
"isClosing" check and return early. The InternalCacheBuilder is new; I'm
not sure if it's missing those early checks.

-Anil.
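The diagnosis above is that InternalCacheBuilder.create() can pick up a reference to a distributed system that is in the middle of closing, whereas getInstance() and getExisting() check isClosing first. The sketch below models that kind of early check under a shared lock; `ToySystem`, `ToyCacheBuilder`, `beginClose`, and `finishClose` are hypothetical names, and whether the real fix takes this shape is not established in this thread.

```java
/** Hypothetical stand-in for a distributed-system connection (not Geode code). */
class ToySystem {
  private volatile boolean closing;
  private volatile boolean connected = true;

  boolean isClosing() {
    return closing;
  }

  boolean isConnected() {
    return connected;
  }

  void beginClose() {
    closing = true; // a close has started, possibly on another thread
  }

  void finishClose() {
    connected = false; // the disconnect has completed
  }

  /** Plays the role of checkConnected(): fails once the system is disconnected. */
  void checkConnected() {
    if (!connected) {
      throw new IllegalStateException(
          "This connection to a distributed system has been disconnected.");
    }
  }
}

/** Hypothetical builder showing an early "isClosing" check before reusing a system. */
class ToyCacheBuilder {
  // Plays the role of the GemFireCacheImpl.class monitor shared by close() and create().
  private static final Object LOCK = new Object();
  private static ToySystem existing;

  static ToySystem connect() {
    synchronized (LOCK) {
      // Early check: never hand back a system that is closing or already disconnected.
      if (existing == null || existing.isClosing() || !existing.isConnected()) {
        existing = new ToySystem();
      }
      return existing;
    }
  }
}
```

Without the isClosing() test in connect(), a second call made during a close would return the old system, and using it after finishClose() would fail the same way CacheFactory.create() does in Kirk's stack trace.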

On Mon, Nov 25, 2019 at 2:47 PM Mark Hanson <mh...@pivotal.io> wrote:

> +1 to fix.

Re: Cache.close is not synchronous?

Posted by Mark Hanson <mh...@pivotal.io>.
+1 to fix.

> On Nov 25, 2019, at 2:02 PM, John Blum <jb...@pivotal.io> wrote:
> 
> +1 ^ 64!
> 
> I found this out the hard way some time ago, and it is why STDG (Spring Test
> for Apache Geode) exists in the first place (i.e., usability issues,
> particularly with testing).


Re: Cache.close is not synchronous?

Posted by John Blum <jb...@pivotal.io>.
+1 ^ 64!

I found this out the hard way some time ago, and it is why STDG (Spring Test
for Apache Geode) exists in the first place (i.e., usability issues,
particularly with testing).

On Mon, Nov 25, 2019 at 1:41 PM Kirk Lund <kl...@apache.org> wrote:

> I found a test that closes and then recreates the cache multiple times,
> with a 2-second sleep between each iteration. When I removed the
> Thread.sleep, recreating the cache threw
> DistributedSystemDisconnectedException (see below).
>
> This seems like a usability nightmare. Does anyone have any idea WHY it
> works this way?
>
> Personally, I want Cache.close() to block until both Cache and
> DistributedSystem are closed and the API is ready to create a new Cache.


-- 
-John
john.blum10101 (skype)