Posted to dev@geode.apache.org by Mario Kevo <ma...@est.tech> on 2021/07/06 07:06:04 UTC

NullPointerException while create region during server restart

Hi Geode devs,

I opened a new ticket https://issues.apache.org/jira/browse/GEODE-9409 regarding a NullPointerException when creating a region while one of the servers is restarting.
If we run the "create region" command through gfsh while a server is starting, it passes, but if the server is being restarted, it fails. The difference is that for a restart we kill the server and start it again; since it already has a server directory, it takes longer for the server to come up, as expected.
In that case, the "create region" command can run before the cache is fully created and try to operate on it. That can lead to the NullPointerException: creating a region fetches the pdxRegistry from the cache in findDiskStore, but sometimes the registry is not yet initialized in the cache, so every method called on it throws a NullPointerException.
This is the part of the code where the exception is thrown:

DiskStoreImpl findDiskStore(RegionAttributes regionAttributes,
    InternalRegionArguments internalRegionArgs) {
  // validate that persistent type registry is persistent
  if (getAttributes().getDataPolicy().withPersistence()) {
    // NPE here if the cache's PDX registry has not been initialized yet
    getCache().getPdxRegistry().creatingPersistentRegion();
  }
  // ... (rest of method omitted)
}

As I already mentioned, getPdxRegistry() (called from LocalRegion.java) returns null if the registry has not yet been initialized in create() (CacheCreation.java):

DiskStoreAttributesCreation pdxRegDSC = initializePdxDiskStore(cache);

cache.initializePdxRegistry();

createDiskStores(cache, pdxRegDSC);
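
The two excerpts above can be reduced to a small, self-contained model of the race. `ModelCache`, `ModelPdxRegistry` and `ModelRegion` are hypothetical stand-ins, not Geode classes; they only mirror the shape of the excerpts: the registry reference stays null until `initializePdxRegistry()` runs, and a persistent region created in that window dereferences it.

```java
// Hypothetical stand-in for the PDX type registry (not Geode's real class).
class ModelPdxRegistry {
  int persistentRegions; // counts calls, just so the model is observable

  void creatingPersistentRegion() {
    persistentRegions++;
  }
}

// Hypothetical stand-in for the cache: the registry is null until initialized.
class ModelCache {
  private volatile ModelPdxRegistry pdxRegistry; // null during early startup

  void initializePdxRegistry() {
    pdxRegistry = new ModelPdxRegistry();
  }

  ModelPdxRegistry getPdxRegistry() {
    return pdxRegistry;
  }
}

// Hypothetical stand-in for the region-creation path that calls findDiskStore.
class ModelRegion {
  static void findDiskStore(ModelCache cache, boolean withPersistence) {
    if (withPersistence) {
      // Same shape as the excerpt: throws NullPointerException if the
      // registry has not been initialized yet.
      cache.getPdxRegistry().creatingPersistentRegion();
    }
  }
}
```

In this model, calling `ModelRegion.findDiskStore(cache, true)` before `cache.initializePdxRegistry()` throws a NullPointerException, and the same call succeeds afterwards. A non-persistent region never touches the registry, mirroring the `withPersistence()` check in the excerpt.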

I tried some fixes, but without success. 🙁
It passes if we add some retries and sleeps, but that is not an acceptable fix.

So does anyone have an idea how to wait until pdxRegistry is initialized, or anything else that would help us avoid this problem?
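
One general way to "wait until pdxRegistry is initialized" without a retry-and-sleep loop is to publish the value through a one-shot latch: readers block on the latch with a timeout, and the initializer releases them once. This is a hypothetical sketch; `AwaitableRef` is not a Geode class, and where it would be wired into cache startup is left open.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical holder that lets callers block until a value has been published.
final class AwaitableRef<T> {
  private final CountDownLatch published = new CountDownLatch(1);
  private volatile T value;

  // Intended to be called once by the initializer (e.g. where the registry is created).
  void publish(T newValue) {
    if (newValue == null) {
      throw new IllegalArgumentException("value must not be null");
    }
    value = newValue;      // write the value first...
    published.countDown(); // ...then release any waiting callers
  }

  // Blocks until publish() has run, or fails after the timeout.
  T await(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
    if (!published.await(timeout, unit)) {
      throw new TimeoutException("value was not published within " + timeout + " " + unit);
    }
    return value;
  }
}
```

A region-creation path could then call something like `registryRef.await(30, TimeUnit.SECONDS)` (hypothetical field name, arbitrary timeout) instead of dereferencing a possibly null field. The timeout keeps a startup that never completes from hanging the command forever.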

BR,
Mario

Re: Re: NullPointerException while create region during server restart

Posted by Anthony Baker <ba...@vmware.com>.
One thing you might check is why the create region request from gfsh was allowed to proceed before initialization was complete.  That is, cluster config and all associated configuration like the pdx registry should be created before any *new configuration* requests are processed.

I'm not sure what the code path looks like, but that might be a place to start investigating.
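
Anthony's suggestion amounts to refusing new configuration requests until startup has finished, rather than letting them race with it. A minimal sketch of such a gate follows; `StartupGate` and its methods are hypothetical names, and where Geode would actually mark startup as complete is exactly the open question in this thread.

```java
import java.util.function.Supplier;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate: configuration requests are refused until startup completes.
final class StartupGate {
  private final AtomicBoolean startupComplete = new AtomicBoolean(false);

  // Called once, after cluster configuration (including the PDX registry) is applied.
  void markStartupComplete() {
    startupComplete.set(true);
  }

  // Runs the request only if startup has finished; otherwise fails fast with a
  // clear message instead of reaching half-initialized state (e.g. a null registry).
  <T> T runConfigurationRequest(String name, Supplier<T> request) {
    if (!startupComplete.get()) {
      throw new IllegalStateException(
          "Cannot process '" + name + "': server is still starting up");
    }
    return request.get();
  }
}
```

With this design a rejected request surfaces as a clear error rather than an NPE deep in region creation; queueing requests until the gate opens would be an alternative.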

Anthony


> On Jul 8, 2021, at 4:27 AM, Mario Kevo <ma...@est.tech> wrote:
> 
> Hi Anthony,
> 
> It happens while the server is starting and creating the cache (while it fills in the contents of the cache based on the creation object's state). The NPE occurs when the "create region" command is executed before pdxRegistry is initialized. This is the part of the code where pdxRegistry is initialized: https://github.com/Nordix/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheCreation.java#L529
> 
> Before that code runs, pdxRegistry is null, and findDiskStore throws the NPE.
> 
> 
> BR,
> Mario


Re: NullPointerException while create region during server restart

Posted by Mario Kevo <ma...@est.tech>.
Hi Anthony,

It happens while the server is starting and creating the cache (while it fills in the contents of the cache based on the creation object's state). The NPE occurs when the "create region" command is executed before pdxRegistry is initialized. This is the part of the code where pdxRegistry is initialized: https://github.com/Nordix/geode/blob/develop/geode-core/src/main/java/org/apache/geode/internal/cache/xmlcache/CacheCreation.java#L529

Before that code runs, pdxRegistry is null, and findDiskStore throws the NPE.


BR,
Mario
________________________________
From: Anthony Baker <ba...@vmware.com>
Sent: 7 July 2021 17:58
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: NullPointerException while create region during server restart

When the NPE occurs, has the server completed its bootstrapping from cluster configuration yet?

Anthony




Re: NullPointerException while create region during server restart

Posted by Anthony Baker <ba...@vmware.com>.
When the NPE occurs, has the server completed its bootstrapping from cluster configuration yet?

Anthony




Re: NullPointerException while create region during server restart

Posted by Kirk Lund <kl...@apache.org>.
Hi Mario,

I would guess that *getPdxRegistry()* is returning a null until after the
registry has finished initializing. Just a guess though.

Here's a spreadsheet
<https://docs.google.com/spreadsheets/d/1FXTWwP8mBKc03STo-GcWBgKlWnBWk0I3rwp4-pHT00o/edit?usp=sharing>
a couple of us created and used as a reference for some work about a year and
a half ago. The source code line numbers probably aren't correct anymore,
but the order of steps and general details should still be accurate. As
you'll see, the PDX region (aka the PDX registry) is created at step 19 of
the spreadsheet.

Step 26 is the creation of the CacheServerMXBean.
Step 29 marks 'Online' status change.

You need to wait for all servers to reach 'Online' on Step 29 of the
spreadsheet before making any changes like creating regions.

To understand how to identify 'Online', take a look at these two acceptance
tests:
1.
geode-assembly/src/acceptanceTest/java/org/apache/geode/launcher/ServerStartupOnlineTest.java
2.
geode-assembly/src/acceptanceTest/java/org/apache/geode/launcher/ServerStartupNotificationTest.java
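
Waiting for 'Online' on the caller side can be sketched as polling a status check against a deadline. This is a hypothetical helper: `WaitForOnline` and the `isOnline` condition are assumptions, not Geode APIs, and it does not reproduce how the two acceptance tests above detect 'Online'. Unlike the server-side retry-and-sleep workaround Mario rejected, this wait happens in the client before it issues `create region`.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition until it is true or a deadline passes.
final class WaitForOnline {
  private WaitForOnline() {}

  static boolean await(BooleanSupplier isOnline, Duration timeout, Duration pollInterval)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (isOnline.getAsBoolean()) {
        return true; // server reports 'Online'; configuration changes can proceed
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false; // gave up; the caller decides whether to fail or retry
      }
      // Sleep for the poll interval, but never past the deadline.
      long sleepMillis =
          Math.min(pollInterval.toMillis(), Duration.ofNanos(remainingNanos).toMillis());
      Thread.sleep(Math.max(1L, sleepMillis));
    }
  }
}
```

A caller might run `WaitForOnline.await(isOnline, Duration.ofMinutes(2), Duration.ofMillis(500))` for each server and only issue `create region` once every call has returned true; the timeout and interval values are arbitrary.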

-Kirk
