Posted to users@nifi.apache.org by "Greene (US), Geoffrey N" <ge...@boeing.com> on 2021/01/21 19:57:48 UTC

CM/CI best practices

I'm trying to figure out some CI/CM best practices. I want to be able to design a flow, test it on some test data, then distribute that exact same configuration (definitely flows, probably services, and so on) into production. I may have multiple engineers working in this environment, and I want to store my files in a repository and use standard git branches, merges, etc. Of course, a branch shouldn't merge to master if it hasn't passed its tests. I have already scripted some simple Python tests that can start NiFi, start a flow, and verify output, so I know that CI CAN work.
I may choose to go to a clustered solution, too, so I'd want to be able to spin up additional cluster nodes if needed.
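
For illustration, a test of the kind described above (start a flow, then verify its output) might be sketched in Python roughly as follows. This is only a sketch under assumptions: the base URL, the process group ID, and the helper names `schedule_request` and `output_matches` are hypothetical, the instance is assumed to be unsecured, and the request is aimed at the NiFi REST API's `PUT /flow/process-groups/{id}` scheduling endpoint.

```python
import json
import urllib.request

# Assumed address of an unsecured test instance; adjust for your environment.
NIFI_API = "http://localhost:8080/nifi-api"


def schedule_request(base_url, group_id, state):
    """Build the request that starts or stops all components in a process group."""
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError(f"unsupported state: {state}")
    body = json.dumps({"id": group_id, "state": state}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/flow/process-groups/{group_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def output_matches(actual_lines, expected_lines):
    """Compare flow output with expected output, ignoring order and blank lines."""
    def normalize(lines):
        return sorted(line.strip() for line in lines if line.strip())
    return normalize(actual_lines) == normalize(expected_lines)
```

A CI job could send the request with `urllib.request.urlopen(schedule_request(NIFI_API, group_id, "RUNNING"))`, wait for the flow to drain, and fail the build when `output_matches` returns False.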

So what is the recommended way to do this? Here are some of the options I've come up with:

1) Have a dedicated nifi instance, and only CM the flows (using nifi repository). If I understand this correctly, this means that the configuration of nifi itself would not be CM'd. I'm not clear on how services would be handled if a new flow requires an internal service. I don't like this much, since it doesn't seem terribly repeatable, but maybe it's how the overall system is designed.

2) Configuration control EVERY file. This means that as the database changes while authoring a flow, new commits from a developer would be required. This seems troublesome, though, as merges would be difficult, and it would be difficult to actually tell what changed. Hopefully no flowfiles would go into the repo.

3) Configuration control SOME of the files (though not all of them) in the nifi directory structure. I'm not clear on which ones though. Maybe whole directories? A guide would be helpful.

4) Have one git repository housing the nifi repository (the flows). Have another repository that houses the nifi software. The repo containing the flows would be updated frequently; the one containing the software would NOT be updated as frequently.

5) Don't do CM at all. It can't be done. Rely on backups only.

I'm also still struggling with how to maintain some of the custom Groovy scripts I've written, which are kept on disk.

In any event, how do others do this? Are there any wikis/articles on this?

Thanks for your thoughts
-geoff

Re: CM/CI best practices

Posted by Mike Thomsen <mi...@gmail.com>.

Geoff,

Here's a blog post of mine that shows how to do unit testing against
Groovy scripts that you run in your flows:

https://mikethomsen.github.io/posts/2020/11/08/testing-executescript-modules-with-the-nifi-test-framework/

As far as repositories go, the NiFi Registry is the best route for
doing CM work while also being able to easily transition between
environments.
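
As a rough illustration of moving a flow between environments, the sketch
below builds the NiFi Registry REST URL for the latest version of a
versioned flow and summarizes the processors in a snapshot, so that the
flow in two environments can be compared. The Registry address, the bucket
and flow IDs, and all helper names are hypothetical, and the snapshot
layout (`flowContents` holding `processors` and nested `processGroups`) is
an assumption that should be checked against your Registry version.

```python
import json
import urllib.request


def latest_version_url(registry_url, bucket_id, flow_id):
    """URL of the newest snapshot of a versioned flow in NiFi Registry."""
    return (f"{registry_url.rstrip('/')}/nifi-registry-api"
            f"/buckets/{bucket_id}/flows/{flow_id}/versions/latest")


def fetch_snapshot(url):
    """Download a snapshot as a dict (assumes an unsecured Registry)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def processor_summary(group):
    """Collect (name, type) for every processor in a group and its child groups."""
    found = [(p["name"], p["type"]) for p in group.get("processors", [])]
    for child in group.get("processGroups", []):
        found.extend(processor_summary(child))
    return sorted(found)


def same_flow(snapshot_a, snapshot_b):
    """True when both snapshots contain the same set of processors."""
    return (processor_summary(snapshot_a["flowContents"])
            == processor_summary(snapshot_b["flowContents"]))
```

Comparing only names and types is a deliberately coarse check; a stricter
comparison could also include each processor's properties.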

My team just uses Chef to do repeatable deployments of NiFi and the
Registry to move between environments. We don't do automated testing.

Mike
