Posted to dev@geode.apache.org by Mario Kevo <ma...@est.tech> on 2021/10/08 12:31:44 UTC

Region is not created on one of the servers

Hi geode-dev,

We are using a system with a large number of servers.
While all the servers are starting, we create a region through gfsh in parallel.
The problem is that the region is not created on one of the servers.

Here is an example of the problem:

We start the locator and then start the servers, one by one.
In the meantime, we run the "create region" command through gfsh.
All servers that started before the "create region" command receive the information to create the region, but the problem is the server that starts after the "create region" command has started but before it has finished.
After the "create region" command finishes, all servers started after that find the region in the cluster configuration and create it.

What happened to this one server without the region?
It started after the "create region" command started, so it does not receive the information to create the region from the locator. Also, the cluster configuration does not contain the region yet, so the server cannot read it from the cluster configuration it receives.
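
The race can be reproduced with a small model. This is a minimal Python sketch under simplifying assumptions, not Geode code: `Locator`, `Server`, `join`, and `create_region_current_order` are hypothetical names, the membership view is a plain list, and the cluster configuration is a set of region names. It follows the order described later in this thread (distribute to the current members first, update the cluster configuration afterwards).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the race (not Geode code).


@dataclass
class Server:
    name: str
    regions: set = field(default_factory=set)


@dataclass
class Locator:
    view: list = field(default_factory=list)          # members in the view
    cluster_config: set = field(default_factory=set)  # region names


def join(locator, server):
    # A starting server joins the view and applies the config it receives.
    locator.view.append(server)
    server.regions |= locator.cluster_config


def create_region_current_order(locator, region, starts_meanwhile):
    targets = list(locator.view)          # 1. snapshot the current members
    join(locator, starts_meanwhile)       # a server starts mid-command
    for server in targets:                # 2. distribute only to the snapshot
        server.regions.add(region)
    locator.cluster_config.add(region)    # 3. only now update cluster config


loc = Locator()
join(loc, Server("server1"))
join(loc, Server("server2"))
create_region_current_order(loc, "regionA", Server("server3"))
join(loc, Server("server4"))
missing = [s.name for s in loc.view if "regionA" not in s.regions]
print(missing)  # ['server3']
```

Only the server that joins between the snapshot and the config write ends up without the region; server4, which starts after the command finishes, gets it from the cluster configuration.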

So the question is: is it allowed to run commands in parallel with server startup?
If yes, we should add some checks to the code to avoid this issue.
If not, we need to state that somewhere in the documentation.

BR,
Mario


Re: Region is not created on one of the servers

Posted by Dan Smith <da...@vmware.com>.
I think a better option might be to use a lock for the cluster configuration. We could make the request to get the cluster config wait until any update to the cluster config has been completely applied. Maybe we already have a lock that forces cluster configuration updates to happen one at a time?
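
A minimal Python sketch of the lock idea, assuming a hypothetical in-memory model rather than Geode's real classes (`Locator`, `add_member`, `fetch_config`, `create_region`, and `start_server` are illustrative names; whether Geode already has such a lock is the open question above). One lock is held across the view snapshot, the distribution, and the cluster-config write, and a starting server takes the same lock to read the config, so a server that misses the snapshot blocks until the region is in the config.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    regions: set = field(default_factory=set)


class Locator:
    """Hypothetical model of the lock proposal; not Geode's actual classes."""

    def __init__(self):
        self.config_lock = threading.Lock()  # serializes config updates and reads
        self.view_lock = threading.Lock()    # protects the membership list
        self.view = []
        self.cluster_config = set()

    def add_member(self, server):
        with self.view_lock:
            self.view.append(server)

    def fetch_config(self, server):
        # Waits until any in-progress cluster-config update is fully applied.
        with self.config_lock:
            server.regions |= self.cluster_config

    def create_region(self, region, during_command=lambda: None):
        # Hold the lock across the snapshot, the distribution and the config write.
        with self.config_lock:
            with self.view_lock:
                targets = list(self.view)
            during_command()  # demo hook: a server starts mid-command
            for s in targets:
                s.regions.add(region)
            self.cluster_config.add(region)


def start_server(locator, server, in_view=None):
    # Join the view first, then fetch the config; this order matters.
    locator.add_member(server)
    if in_view is not None:
        in_view.set()
    locator.fetch_config(server)


loc = Locator()
start_server(loc, Server("server1"))

late = Server("server2")
in_view = threading.Event()
t = threading.Thread(target=start_server, args=(loc, late, in_view))


def server_starts_during_command():
    t.start()
    in_view.wait()  # server2 is in the view but blocked in fetch_config


loc.create_region("regionA", server_starts_during_command)
t.join()
print(sorted(s.name for s in loc.view if "regionA" in s.regions))
# ['server1', 'server2']
```

The sketch relies on a starting server joining the view before it fetches the config; in the opposite order, a server could read the old config and still miss the snapshot. The trade-off is that every starting server waits for as long as the command holds the lock.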

-Dan
________________________________
From: Mario Kevo <ma...@est.tech>
Sent: Tuesday, October 12, 2021 1:35 AM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: Region is not created on one of the servers

I have opened a new ticket:
https://issues.apache.org/jira/browse/GEODE-9718

There are two proposals on the ticket, so we need to decide which way to go.

BR,
Mario
________________________________
From: Udo Kohlmeyer <ud...@vmware.com>
Sent: 12 October 2021 0:59
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: Region is not created on one of the servers

Hi Mario,

I think your assessment of the problem is correct. Thinking about it, there is no simple (correct) way to solve this, given that there are too many variables in play: users making configuration changes while servers are coming up.

Now, that said, I think we should address this problem. I also agree with your assessment that cluster configuration was not written to handle this scenario. Some thought has to go into which algorithm we would like to follow and how we would like to resolve it.

Can you please raise a ticket for this issue?

--Udo

From: Mario Kevo <ma...@est.tech>
Date: Monday, October 11, 2021 at 11:27 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: Region is not created on one of the servers
I think there could be a problem if we change the order to first add the region to the cluster config and then distribute it to the existing servers.

Currently, when the "create region" command is executed, it gets all servers from the membership view and sends each of them a request to create the region with the parameters specified in the command.
Region creation starts on all servers, and after it finishes, the region is added to the cluster configuration. If there is a problem creating the region (a wrong parameter or something else), the region is not created on the existing servers and nothing is written to the cluster configuration.

If we change the order, the region will be written to the cluster config before we know the command is successful, so we would need some backup to roll back the cluster configuration.

Also, this can happen with every command that edits the cluster configuration.

It looks like this was not designed for executing commands in parallel with starting servers.

BR,
Mario
________________________________
From: Dan Smith <da...@vmware.com>
Sent: 8 October 2021 20:37
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: Region is not created on one of the servers

This seems like something that ought to work, so I would call it a bug if the region didn't get created on one server. At first glance, it looks like the problem is that we distribute the region to all the servers before adding it to cluster config? It seems like we need to do the distribution after adding the region to cluster config, to make sure that all servers get the region.

-Dan

Re: Region is not created on one of the servers

Posted by Mario Kevo <ma...@est.tech>.
I have opened a new ticket:
https://issues.apache.org/jira/browse/GEODE-9718

There are two proposals on the ticket, so we need to decide which way to go.

BR,
Mario

Re: Region is not created on one of the servers

Posted by Udo Kohlmeyer <ud...@vmware.com>.
Hi Mario,

I think your assessment of the problem is correct. Thinking about it, there is no simple (correct) way to solve this, given that there are too many variables in play: users making configuration changes while servers are coming up.

Now, that said, I think we should address this problem. I also agree with your assessment that cluster configuration was not written to handle this scenario. Some thought has to go into which algorithm we would like to follow and how we would like to resolve it.

Can you please raise a ticket for this issue?

--Udo


Re: Region is not created on one of the servers

Posted by Mario Kevo <ma...@est.tech>.
I think there could be a problem if we change the order to first add the region to the cluster config and then distribute it to the existing servers.

Currently, when the "create region" command is executed, it gets all servers from the membership view and sends each of them a request to create the region with the parameters specified in the command.
Region creation starts on all servers, and after it finishes, the region is added to the cluster configuration. If there is a problem creating the region (a wrong parameter or something else), the region is not created on the existing servers and nothing is written to the cluster configuration.

If we change the order, the region will be written to the cluster config before we know the command is successful, so we would need some backup to roll back the cluster configuration.
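
A minimal Python sketch of the reversed order with a rollback, using hypothetical names (`Server`, `RegionCreateError`, `create_region_config_first`) and an in-memory set standing in for the cluster configuration; it is not Geode's command implementation. The region is recorded in the config first, so servers that start during the command would pick it up, and the entry is removed again if any existing server fails to create the region.

```python
from dataclasses import dataclass, field


class RegionCreateError(Exception):
    """Hypothetical failure reported by a server (e.g. a wrong parameter)."""


@dataclass
class Server:
    name: str
    regions: set = field(default_factory=set)
    reject: bool = False  # simulate a server that fails to create the region

    def create_region(self, region):
        if self.reject:
            raise RegionCreateError(f"{self.name} rejected {region}")
        self.regions.add(region)


def create_region_config_first(cluster_config, view, region):
    """Record in config, distribute, and roll back both on failure."""
    if region in cluster_config:
        # Refuse up front so a rollback never deletes a pre-existing entry.
        raise ValueError(f"{region} already exists")
    cluster_config.add(region)          # servers starting from now on see it
    created = []
    try:
        for server in list(view):
            server.create_region(region)
            created.append(server)
    except RegionCreateError:
        cluster_config.discard(region)  # roll back the config entry...
        for server in created:          # ...and undo the partial creation
            server.regions.discard(region)
        raise


# Failure case: server2 rejects the request, so nothing is left behind.
config = set()
view = [Server("server1"), Server("server2", reject=True)]
try:
    create_region_config_first(config, view, "badRegion")
except RegionCreateError as err:
    print("rolled back:", err)

# Success case: the region ends up in the config and on every server.
ok_config = set()
ok_view = [Server("server1"), Server("server2")]
create_region_config_first(ok_config, ok_view, "regionA")
```

One limitation this sketch does not handle: a server that starts between the config write and the rollback would already have created the region from the config, and this simple rollback does not undo that.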

Also, this can happen with every command that edits the cluster configuration.

It looks like this was not designed for executing commands in parallel with starting servers.

BR,
Mario


Re: Region is not created on one of the servers

Posted by Dan Smith <da...@vmware.com>.
This seems like something that ought to work, so I would call it a bug if the region didn't get created on one server. At first glance, it looks like the problem is that we distribute the region to all the servers before adding it to cluster config? It seems like we need to do the distribution after adding the region to cluster config, to make sure that all servers get the region.

-Dan