Posted to dev@nifi.apache.org by "Richard St. John" <rs...@gmail.com> on 2017/05/26 18:15:38 UTC

Zero-Downtime Deployments with Component Versioning

Dear dev,

I recently upgraded to NiFi version 1.2.0. In doing so, I lost the ability
to perform rolling upgrades on my 6-node cluster due to component
versioning, i.e. MissingBundleExceptions. My previous deployment steps are
listed below. My question for the devs is this: what is the best process
to perform a rolling upgrade, with zero downtime, when custom NARs, or
NiFi releases for that matter, need to be deployed?

*Rolling deployment procedure with NiFi 1.1.x:*
1. Package the custom NARs.
2. Run an Ansible script to copy the NARs to each node and repoint a
symlink, custom-nars-current, at the new folder containing the newly built
custom NARs. Note: the previous versions of the NARs are still on disk, but
nifi.properties points to the symlink.
3. Restart the NiFi service one node at a time.
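For what it's worth, the symlink swap in step 2 can be done atomically, so a
node restarting mid-deploy never sees a missing or half-updated link. A
minimal Python sketch of just that step, assuming versioned folders next to
a custom-nars-current symlink (the paths and layout here are hypothetical):

```python
import os

def repoint_symlink(link_path: str, target_dir: str) -> None:
    """Atomically repoint link_path at target_dir.

    A new symlink is created under a temporary name and then renamed
    over the old one; os.replace() is an atomic rename on POSIX, so a
    reader of the link always sees either the old or the new target.
    """
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clear any leftover from an aborted deploy
    os.symlink(target_dir, tmp)
    os.replace(tmp, link_path)

# Example: repoint_symlink("/opt/nifi/custom-nars-current",
#                          "/opt/nifi/custom-nars-2.0")
```

Step 3 then proceeds as before: restart one node at a time, waiting for each
node to reconnect to the cluster before touching the next.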
-- 

----------------------------
Richard St. John, Ph.D.
Senior Software Engineer, Applied Mathematician
Asymmetrik, Ltd.

Re: Zero-Downtime Deployments with Component Versioning

Posted by "Richard St. John" <rs...@gmail.com>.
Bryan,

Our current NiFi cluster has a complex flow with dozens of unique custom
NARs. It would take hours of painstaking work to identify every
location where a custom component needs upgrading. It would be helpful to
list the version in the summary table. At least that way, I could identify
and navigate to the components that need to be manually updated.

Prior to 1.2.0, we were able to version our NARs and perform a rolling
redeploy without issue, as long as the updates were backward compatible.
In fact, most of the processors are unchanged during a deployment, but
since they are packaged together, the versions for all of the processors
would have to be changed manually. More critically, any downtime of our
flow would be extremely harmful to our downstream customers and has to be
scheduled days in advance.

I fully understand why supporting multiple versions of the same component is
a great thing. Personally, I feel the benefits outweigh the cons. However,
by not making it straightforward to upgrade in place, I am forced to either:

   1. Stop using versioning for our custom NARs, or leave the version
   unchanged, thus eliminating the benefits of versioning and making it
   difficult to manage the releases of our custom NARs.
   2. Create a script that invokes the REST API to walk every processor and,
   if it is running, stop it, change the version via a REST call, and start
   it back up via another REST call.
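For the record, option 2 might look roughly like the sketch below. This is
only a sketch: the endpoint paths and entity shapes are my recollection of
the 1.2.0 REST API and should be verified against /nifi-api on a real
cluster, and the base URL and process group id are placeholders.

```python
import json
import urllib.request

BASE = "http://localhost:8080/nifi-api"  # placeholder base URL

def bump_bundle_version(entity: dict, new_version: str) -> dict:
    """Build a PUT payload that changes only the component's bundle
    version, preserving the revision for optimistic locking."""
    return {
        "revision": entity["revision"],
        "component": {
            "id": entity["component"]["id"],
            "bundle": {**entity["component"]["bundle"],
                       "version": new_version},
        },
    }

def _call(method: str, path: str, body: dict = None) -> dict:
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(body).encode() if body is not None else None,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def upgrade_group(group_id: str, new_version: str) -> None:
    """Walk the processors in a group: stop the running ones, change
    the bundle version, and start them back up."""
    listing = _call("GET", f"/process-groups/{group_id}/processors")
    for proc in listing["processors"]:
        pid = proc["component"]["id"]
        was_running = proc["component"]["state"] == "RUNNING"
        if was_running:
            _call("PUT", f"/processors/{pid}", {
                "revision": proc["revision"],
                "component": {"id": pid, "state": "STOPPED"}})
            proc = _call("GET", f"/processors/{pid}")  # revision changed
        proc = _call("PUT", f"/processors/{pid}",
                     bump_bundle_version(proc, new_version))
        if was_running:
            _call("PUT", f"/processors/{pid}", {
                "revision": proc["revision"],
                "component": {"id": pid, "state": "RUNNING"}})
```

A real version would also need to recurse into child process groups and wait
for each processor to report STOPPED before changing its version.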

Is there a configuration setting in the nifi-nar-plugin that sets the NAR
to a specific version?

Rick.


Re: Zero-Downtime Deployments with Component Versioning

Posted by Bryan Bende <bb...@gmail.com>.
Hi Richard,

I believe this is currently expected behavior, meaning that rolling
deployments are not supported when changes to NARs are being made.

There are two main issues...

The first is that if nodes are running different versions of the
framework NAR, they could potentially be running incompatible versions
of the REST API, which means if someone uses the UI while in this
state, request replication could fail. This would be an issue even
before 1.2.0, although I'm guessing you aren't using the UI during
these deployments, or just not running into a case where the API was
actually incompatible.

The second is that we currently can't tell the difference between one
of the nodes coming up with a new version of a custom NAR with the
intent that you are about to upgrade the other nodes vs. one of the
nodes coming up with a different version of a custom NAR because it
was incorrectly deployed. As an example, imagine you have a 3 node
cluster running v1 of your custom NAR, you stop everything intending
to deploy v2 to all nodes, but you forget node3, and you start
everything up. If we let all these nodes join the cluster, the user
may have no idea that node3 is running the old code and possibly doing
the wrong thing.

All that being said, I do think we should figure out what can be done
to support these types of rolling deployments, but I believe it will
require a bit of design and work to make it happen.

One idea that might be an option is to always deploy the new version
of your custom NARs alongside the existing version. This way the
rolling restarts would work and your flow would remain running with
the previous version; then the components could be upgraded in place
through the UI. Admittedly this would currently be a bit tedious, but
we could possibly look at ways of performing bulk upgrades.
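One way to sanity-check a side-by-side deployment like that is to list which
bundle versions are actually on disk on each node. A NAR is a zip whose
META-INF/MANIFEST.MF carries Nar-Id, Nar-Group, and Nar-Version entries, so
a short sketch like this (directory layout assumed, not prescribed) can
group the deployed NARs by id:

```python
import os
import zipfile
from collections import defaultdict

def nar_versions(nar_dir: str) -> dict:
    """Map each Nar-Id found under nar_dir to the set of Nar-Versions
    deployed, by reading META-INF/MANIFEST.MF out of every .nar file."""
    found = defaultdict(set)
    for name in os.listdir(nar_dir):
        if not name.endswith(".nar"):
            continue
        with zipfile.ZipFile(os.path.join(nar_dir, name)) as nar:
            manifest = nar.read("META-INF/MANIFEST.MF").decode("utf-8")
        attrs = {}
        for line in manifest.splitlines():
            if ": " in line:
                key, _, value = line.partition(": ")
                attrs[key] = value.strip()
        if "Nar-Id" in attrs:
            found[attrs["Nar-Id"]].add(attrs.get("Nar-Version", "?"))
    return dict(found)
```

Seeing something like {'my-custom-nar': {'1.0', '2.0'}} on every node would
confirm both versions are loadable before the in-place upgrades start. (Note
this sketch ignores manifest line continuations, which only matter for very
long attribute values.)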

-Bryan
