Posted to users@nifi.apache.org by Jeff - Data Bean Australia <da...@gmail.com> on 2016/02/17 21:08:44 UTC

Version Control on NiFi flow.xml

Hi,

As my NiFi data flow becomes more and more serious, I need to put it under
version control. Since flow.xml.gz is generated automatically and is saved
as a compressed file, I am wondering what the best practice for version
control would be.

Thanks,
Jeff

-- 
Data Bean - A Big Data Solution Provider in Australia.

Re: Version Control on NiFi flow.xml

Posted by Joe Witt <jo...@gmail.com>.
Vincent,

Yeah, you're hitting the nail on the head, and it matches what we're
hearing more and more.  We have a couple of really nice roadmap items to
make this work more like what you're doing now.

Thanks
Joe

On Wed, Feb 17, 2016 at 5:27 PM, Vincent Russell
<vi...@gmail.com> wrote:
> My team has experimented with version control for NiFi in the
> following way (though we have yet to use this for deployments):
>
> - We version control the flow.xml file and all of the config files that
>   need to be changed
> - We build a distribution of NiFi, gzipping the flow.xml and
>   string-replacing properties in the config files with Maven
> - We can then install this "version" of our NiFi app.
>
> We want to be able to use this to test our flows and processes on our test
> system before making it live in production.  But as I said, we have yet to
> actually use this for production deployments.

Re: Version Control on NiFi flow.xml

Posted by Vincent Russell <vi...@gmail.com>.
My team has experimented with version control for NiFi in the
following way (though we have yet to use this for deployments):


   - We version control the flow.xml file and all of the config files that
   need to be changed
   - We build a distribution of NiFi, gzipping the flow.xml and
   string-replacing properties in the config files with Maven
   - We then can install this "version" of our nifi app.
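
The gzip and string-replacement steps in the list above can be sketched in
Python. This is only an illustration under stated assumptions, not the Maven
build described here: the function names and the @key@ placeholder
convention are hypothetical, and the values passed in would come from
whatever per-environment configuration a team keeps.

```python
import gzip
import re


def gzip_flow(flow_xml: bytes) -> bytes:
    """Compress a version-controlled flow.xml into flow.xml.gz form."""
    # mtime=0 keeps the build timestamp out of the gzip header.
    return gzip.compress(flow_xml, mtime=0)


def fill_properties(template: str, values: dict) -> str:
    """Replace @key@ placeholders (a hypothetical convention) with values."""

    def lookup(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly rather than ship a config with an unfilled placeholder.
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return values[key]

    return re.sub(r"@([A-Za-z0-9_.]+)@", lookup, template)
```

For example, fill_properties("nifi.web.http.port=@web.port@\n",
{"web.port": "8081"}) fills in the port for one environment, and the bytes
returned by gzip_flow would be written to conf/flow.xml.gz in the
distribution.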

We want to be able to use this to test our flows and processes on our test
system before making it live in production.  But as I said, we have yet to
actually use this for production deployments.

On Wed, Feb 17, 2016 at 7:21 PM, Jeff - Data Bean Australia <
databean.au@gmail.com> wrote:

> Thanks Matt for describing the feature in such an intuitive way, and
> pointing out the location for the archive.
>
> This looks good. Just wondering whether we also want to archive the
> templates along with flow.xml.gz.
>
> Thanks,
> Jeff

Re: Version Control on NiFi flow.xml

Posted by Joe Witt <jo...@gmail.com>.
Jeff,

"do we have some tool to compare two flow.xml.gz for some subtle changes?"

Unfortunately no.  That is what Oleg was referring to.  We're finding
an increasing number of people who are interested in this sort of
Git/diff capability, so we definitely need to get some momentum on it.

Making ordering deterministic for the flow and templates should be
pretty doable.  We already have a feature proposal/JIRA to go after
this.

Thanks
Joe

On Wed, Feb 17, 2016 at 5:21 PM, Jeff - Data Bean Australia
<da...@gmail.com> wrote:
> Thanks Matt for describing the feature in such an intuitive way, and
> pointing out the location for the archive.
>
> This looks good. Just wondering whether we also want to archive the
> templates along with flow.xml.gz.
>
> Thanks,
> Jeff

Re: Version Control on NiFi flow.xml

Posted by Jeff - Data Bean Australia <da...@gmail.com>.
Thanks Matt for describing the feature in such an intuitive way, and
pointing out the location for the archive.

This looks good. Just wondering whether we also want to archive the
templates along with flow.xml.gz.

Thanks,
Jeff

On Thu, Feb 18, 2016 at 11:08 AM, Matthew Clarke <ma...@gmail.com>
wrote:

> Jeff,
>
> NiFi gives users the ability to create snapshot backups of their
> flow.xml through the "back-up flow" link found under the "controller
> settings" (the icon looks like a wrench and screwdriver in the upper
> right corner). The default nifi.properties configuration writes these
> backups to a directory called archive inside the
> <nifi-root-install>/conf directory, but you can of course change where
> they are written.
>
> Matt


-- 
Data Bean - A Big Data Solution Provider in Australia.

Re: Version Control on NiFi flow.xml

Posted by Matthew Clarke <ma...@gmail.com>.
Jeff,

NiFi gives users the ability to create snapshot backups of their
flow.xml through the "back-up flow" link found under the "controller
settings" (the icon looks like a wrench and screwdriver in the upper
right corner). The default nifi.properties configuration writes these
backups to a directory called archive inside the
<nifi-root-install>/conf directory, but you can of course change where
they are written.

Matt

On Wed, Feb 17, 2016 at 4:52 PM, Jeff - Data Bean Australia <
databean.au@gmail.com> wrote:

> Thanks Oleg for sharing this. They are definitely useful.
>
> But my question was focused more on versioning the data flow definition
> files, so that Data Flow Developers (or NiFi Cluster Managers, in NiFi's
> terms) can keep track of our work.
>
> Currently I am using the following command line to generate a formatted
> XML to put it into our Git repository:
>
> cat conf/flow.xml.gz | gzip -dc | xmllint --format -

Re: Version Control on NiFi flow.xml

Posted by Jeff - Data Bean Australia <da...@gmail.com>.
Thanks Joe for pointing out the ordering issue. Given that, I need to
reconsider my approach, because the original idea was to use existing
version control tools, such as Git, to compare different versions on the
fly. With the ordering issue, this approach makes no more sense than
simply storing the gz file.

In this case, do we have some tool to compare two flow.xml.gz for some
subtle changes? I am sure the UI based auditing is helpful though.

On Thu, Feb 18, 2016 at 11:07 AM, Joe Witt <jo...@gmail.com> wrote:

> Jeff
>
> I think what you're doing is just fine for now.  To Oleg's point we
> should make it better.
>
> We also have a database to which each flow change is written for
> auditing, so we can show in the UI who last made which changes.
> That is less about true CM and more about providing a
> meaningful user experience.
>
> The biggest knock for CM of our current flow.xml.gz and for the
> templates is that the order in which their components are serialized
> is not presently guaranteed so it means diff won't be meaningful.  But
> as far as capturing at specific intervals and storing the flow you
> should be in good shape with your approach.
>
> Thanks
> Joe



-- 
Data Bean - A Big Data Solution Provider in Australia.

Re: Version Control on NiFi flow.xml

Posted by Joe Witt <jo...@gmail.com>.
Jeff

I think what you're doing is just fine for now.  To Oleg's point we
should make it better.

We also have a database to which each flow change is written for
auditing, so we can show in the UI who last made which changes.
That is less about true CM and more about providing a
meaningful user experience.

The biggest knock for CM of our current flow.xml.gz and for the
templates is that the order in which their components are serialized
is not presently guaranteed so it means diff won't be meaningful.  But
as far as capturing at specific intervals and storing the flow you
should be in good shape with your approach.
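
One possible workaround for the ordering problem, sketched in Python under
stated assumptions: sort sibling elements into a stable order before
diffing. This is not a NiFi feature. The function names are hypothetical,
the sketch assumes components carry an <id> child element, and it assumes
that the order of siblings carries no meaning, which may not hold for every
element in a flow.

```python
import xml.etree.ElementTree as ET


def sort_key(elem: ET.Element) -> tuple:
    # Order siblings by tag, then by the text of an <id> child if present
    # (assumed layout), then by the element's own text.
    return (elem.tag, elem.findtext("id") or "", (elem.text or "").strip())


def canonicalize(elem: ET.Element) -> None:
    """Recursively sort child elements in place so serialization is stable."""
    for child in elem:
        canonicalize(child)
        # Drop whitespace-only text between elements so indentation
        # differences do not show up as changes.
        if child.tail is not None and not child.tail.strip():
            child.tail = None
    if len(elem) and elem.text is not None and not elem.text.strip():
        elem.text = None
    elem[:] = sorted(elem, key=sort_key)


def canonical_xml(xml_bytes: bytes) -> str:
    """Parse XML and return it with siblings in a deterministic order."""
    root = ET.fromstring(xml_bytes)
    canonicalize(root)
    return ET.tostring(root, encoding="unicode")
```

Two flows that differ only in the order their components were serialized
then produce the same string, so a diff of the canonical forms shows only
real changes.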

Thanks
Joe

On Wed, Feb 17, 2016 at 4:52 PM, Jeff - Data Bean Australia
<da...@gmail.com> wrote:
> Thanks Oleg for sharing this. They are definitely useful.
>
> But my question was focused more on versioning the data flow definition
> files, so that Data Flow Developers (or NiFi Cluster Managers, in NiFi's
> terms) can keep track of our work.
>
> Currently I am using the following command line to generate a formatted XML
> to put it into our Git repository:
>
> cat conf/flow.xml.gz | gzip -dc | xmllint --format -

Re: Version Control on NiFi flow.xml

Posted by Jeff - Data Bean Australia <da...@gmail.com>.
Thanks Oleg for sharing this. They are definitely useful.

But my question was focused more on versioning the data flow definition
files, so that Data Flow Developers (or NiFi Cluster Managers, in NiFi's
terms) can keep track of our work.

Currently I am using the following command line to generate a formatted XML
to put it into our Git repository:

cat conf/flow.xml.gz | gzip -dc | xmllint --format -
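
For environments without xmllint, the same decompress-and-format step can
be sketched with Python's standard library. This is an illustration rather
than an exact equivalent of the pipeline above: the function names and the
default destination path are hypothetical, and the indentation produced by
minidom differs in detail from xmllint's.

```python
import gzip
import xml.dom.minidom


def format_flow(gz_bytes: bytes) -> str:
    """Decompress gzipped XML and return an indented, diff-friendly string."""
    doc = xml.dom.minidom.parseString(gzip.decompress(gz_bytes))
    pretty = doc.toprettyxml(indent="  ")
    # Drop whitespace-only lines, which toprettyxml emits when the input
    # was already indented.
    return "\n".join(line for line in pretty.splitlines() if line.strip()) + "\n"


def snapshot(src: str = "conf/flow.xml.gz", dest: str = "flow.xml") -> None:
    """Write the formatted flow to dest (a hypothetical path in a Git tree)."""
    with open(src, "rb") as fh:
        text = format_flow(fh.read())
    with open(dest, "w", encoding="utf-8") as out:
        out.write(text)
```

Calling snapshot() from the NiFi install directory would write flow.xml,
which can then be added and committed with the usual Git commands.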




On Thu, Feb 18, 2016 at 10:01 AM, Oleg Zhurakousky <
ozhurakousky@hortonworks.com> wrote:

> Jeff, what you are describing is in the works and being actively discussed:
> https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry
> and
>
> https://cwiki.apache.org/confluence/display/NIFI/Component+documentation+improvements
>
> The latter may not speak directly to the “ExtensionRegistry”, but if
> you look through the comments there is a whole lot about it, since it is
> dependent on it.
> Feel free to participate, but I can say for now that it is slated for the
> 1.0 release.
>
> Cheers
> Oleg


-- 
Data Bean - A Big Data Solution Provider in Australia.

Re: Version Control on NiFi flow.xml

Posted by Oleg Zhurakousky <oz...@hortonworks.com>.
Jeff, what you are describing is in the works and being actively discussed:
https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry
and
https://cwiki.apache.org/confluence/display/NIFI/Component+documentation+improvements

The latter may not speak directly to the “ExtensionRegistry”, but if you look through the comments there is a whole lot about it, since it is dependent on it.
Feel free to participate, but I can say for now that it is slated for the 1.0 release.

Cheers
Oleg

On Feb 17, 2016, at 3:08 PM, Jeff - Data Bean Australia <da...@gmail.com> wrote:

Hi,

As my NiFi data flow becomes more and more serious, I need to put it under version control. Since flow.xml.gz is generated automatically and is saved as a compressed file, I am wondering what the best practice for version control would be.

Thanks,
Jeff

--
Data Bean - A Big Data Solution Provider in Australia.