Posted to dev@geode.apache.org by Mario Ivanac <ma...@est.tech> on 2020/06/05 08:19:42 UTC

Re: About Geode rolling downgrade

Hi all,

Just a reminder that Alberto is still waiting for feedback
on his question.

BR,
Mario
________________________________
From: Alberto Gomez <al...@est.tech>
Sent: Thursday, May 14, 2020 2:45 PM
To: geode <de...@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi,

A friendly reminder to the community about this request for feedback.

Thanks,

-Alberto G.
________________________________
From: Alberto Gomez <al...@est.tech>
Sent: Thursday, May 7, 2020 10:44 AM
To: geode <de...@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi again,

Since Geode does not currently support online rollback, and since we need to be able to roll back even a standalone system, we have been thinking about a procedure to downgrade a Geode cluster that tolerates downtime but does not require us to:

  *   spin another cluster to sync from,
  *   do a restore or
  *   import data snapshot.

The procedure we came up with is:

  1.  First step - downgrade locators:

     a.  While still on the newer version, export the cluster configuration.
     b.  Shut down all locators. Existing clients will continue using their server connections. New clients/connections are not possible.
     c.  Start new locators using the old SW version and import the cluster configuration. They will form a new cluster. Existing client connections should still work, but new client connections are not yet possible (no servers are connected to the locators).

  2.  Second step - downgrade servers:

     a.  First shut down all servers in parallel. This marks the beginning of total downtime.
     b.  Now start all servers in parallel, still on the new software version. The servers connect to the cluster formed by the downgraded locators. When the servers are up, downtime ends and new client connections are possible. The rest of the rollback should be fully online.
     c.  Now, for each server in turn:

         i.   Shut it down, revoke its disk stores and delete its file system.
         ii.  Start the server using the old SW version. When up, the server will take over the cluster configuration and pick up replicated data and partitioned region buckets to satisfy region redundancy (essentially, it will hold exactly the same data the previous server had).
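
The ordering of the steps above can be sketched in a few lines of Python. This is only an illustration of the sequence and of where the downtime window falls, not a tested runbook: the `Step` type, the `downgrade_plan` function and the member names are hypothetical, and the action strings are plain descriptions rather than exact gfsh syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    phase: str      # "locators", "servers-restart" or "servers-rolling"
    action: str     # plain-language description, not exact gfsh syntax
    target: str     # member name, or "cluster" for cluster-wide actions
    downtime: bool  # True while no server is available to clients


def downgrade_plan(locators, servers):
    """Return the ordered steps of the offline downgrade procedure."""
    plan = [Step("locators", "export cluster configuration (new version)", "cluster", False)]
    plan += [Step("locators", "stop locator", name, False) for name in locators]
    plan += [Step("locators", "start locator on old version", name, False) for name in locators]
    plan.append(Step("locators", "import cluster configuration (old version)", "cluster", False))

    # Total downtime: every server is stopped, then restarted on the NEW version
    # so that it joins the cluster formed by the downgraded locators.
    plan += [Step("servers-restart", "stop server", name, True) for name in servers]
    plan += [Step("servers-restart", "start server on new version", name, True) for name in servers]

    # Online part: replace one server at a time and rely on redundancy for recovery.
    for name in servers:
        plan.append(Step("servers-rolling", "stop server", name, False))
        plan.append(Step("servers-rolling", "revoke disk stores", name, False))
        plan.append(Step("servers-rolling", "delete file system", name, False))
        plan.append(Step("servers-rolling", "start server on old version", name, False))
    return plan


if __name__ == "__main__":
    for step in downgrade_plan(["locator1"], ["server1", "server2"]):
        marker = "DOWN" if step.downtime else "    "
        print(f"{marker} [{step.phase}] {step.action}: {step.target}")
```

Printing the plan makes it easy to see that only the parallel stop/start of the servers is marked as downtime; everything before and after is expected to happen with existing client connections still served.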

The above has some important prerequisites:

  1.  Partitioned regions have redundancy, and the region configuration allows recovery as described above.
  2.  The client version allows connection to both the new and the old cluster, i.e. clients must not be using the newer version when the procedure starts.
  3.  Geode guarantees that a cluster configuration exported from a newer system can be imported into an older system. In case of incompatibility, I expect we could even edit the configuration manually to adapt it to the older system, but it is an open question how the new servers will react when they connect (in step 2b).
  4.  Geode guarantees that communication between peers with different SW versions works and that recovery of region data works.
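
Prerequisites 1 and 2 could be checked before starting, for example with a small script like the one below. It is a minimal sketch under assumptions: the region descriptions are a hand-written dictionary (not something read from Geode), the field names `type` and `redundant_copies` are illustrative, and versions are assumed to be plain dotted numbers. Prerequisites 3 and 4 cannot be checked this way.

```python
def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_prerequisites(regions, client_version, old_version):
    """Return a list of problems; an empty list means prerequisites 1 and 2 look satisfied.

    regions maps a region name to a dict such as
    {"type": "PARTITION", "redundant_copies": 1} (hypothetical field names).
    """
    problems = []
    for name, cfg in sorted(regions.items()):
        # Without a redundant copy, removing one server loses that server's buckets.
        if cfg["type"] == "PARTITION" and cfg.get("redundant_copies", 0) < 1:
            problems.append(f"region {name}: partitioned without redundancy")
    # Clients must not already be on a version newer than the downgrade target.
    if parse_version(client_version) > parse_version(old_version):
        problems.append(f"client {client_version} is newer than target {old_version}")
    return problems


if __name__ == "__main__":
    example_regions = {
        "orders": {"type": "PARTITION", "redundant_copies": 1},
        "sessions": {"type": "PARTITION", "redundant_copies": 0},
        "reference": {"type": "REPLICATE"},
    }
    for problem in check_prerequisites(example_regions, "1.11.0", "1.11.0"):
        print(problem)
```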

Could we have opinions on this offline procedure? It seems to work well but probably has caveats we do not see at the moment.

What about prerequisites 3 and 4? They hold in the upgrade case, but we are not sure whether they also hold in this rollback case.

Best regards,

-Alberto G.

________________________________
From: Anilkumar Gingade <ag...@pivotal.io>
Sent: Thursday, April 23, 2020 12:59 AM
To: geode <de...@geode.apache.org>
Subject: Re: About Geode rolling downgrade

That's right; in most (if not all) cases a no-downtime requirement is met by
having replicated cluster setups (disaster-recovery/backup site). The data is
either pushed to both systems through the data ingesters or by using a WAN
setup.
The clusters are upgraded one at a time. If there is a failure during the
upgrade, or the upgrade needs to be rolled back, one system will always be up
and running.

-Anil.

On Wed, Apr 22, 2020 at 1:51 PM Anthony Baker <ab...@pivotal.io> wrote:

> Anil, let me see if I understand your perspective by stating it this way:
>
> In cases where 100% uptime is a requirement, users are almost always
> running a disaster recovery site.  It could be active/active or
> active/standby but there are already at least 2 clusters with current
> copies of the data.  If an upgrade goes badly, the clusters can be
> downgraded one at a time without loss of availability.  This is because we
> ensure compatibility across the wan protocol.
>
> Is that correct?
>
>
> Anthony
>
>
>
> > On Apr 22, 2020, at 10:43 AM, Anilkumar Gingade <ag...@pivotal.io> wrote:
> >
> >>> Rolling downgrade is a pretty important requirement for our customers
> >>> I'd love to hear what others think about whether this feature is worth
> >>> the overhead of making sure downgrades can always work.
> >
> > I/We haven't seen users/customers requesting rolling downgrade as a
> > critical requirement for them; most of the time they had both an old and
> > new setup to upgrade or switch back to an older setup.
> > Considering the amount of work involved and the code complexity it brings
> > in, and given that there are other ways to downgrade, it is hard to justify
> > supporting this feature.
> >
> > -Anil.
>
>

Re: About Geode rolling downgrade

Posted by Alberto Gomez <al...@est.tech>.

Hi Naba!

Did you manage to discuss this topic with some engineers?

Cheers,

/Alberto G.
________________________________
From: Nabarun Nag <nn...@vmware.com>
Sent: Friday, June 5, 2020 11:00 AM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: About Geode rolling downgrade

Hi Mario and Alberto,

I will sync up with a couple of engineers and get you feedback within a couple of days.

@Barry, Jason and I were once discussing whether your idea of WAN GII could achieve the downgrade: create a DS with the old version, let it do a GII from the newer-version cluster, and then shut down the new-version DS. Now we have a DS with the lower version.

Regards
Naba

Re: About Geode rolling downgrade

Posted by Nabarun Nag <nn...@vmware.com>.

Hi Mario and Alberto,

I will sync up with a couple of engineers and get you feedback within a couple of days.

@Barry, Jason and I were once discussing whether your idea of WAN GII could achieve the downgrade: create a DS with the old version, let it do a GII from the newer-version cluster, and then shut down the new-version DS. Now we have a DS with the lower version.

Regards
Naba
