Posted to dev@nifi.apache.org by Russell Bateman <ru...@perfectsearchcorp.com> on 2016/10/10 14:13:03 UTC
Which are the underlying files for NiFi?
In the NiFi release filesystem, where are things kept? Better yet, where in
documentation have I missed learning about this?
Specifically, if I create flows for integration testing of multiple
components (say, processors), where can I find the underlying files I can
pick up, store and deploy to a fresh server with a fresh NiFi, then launch
them on data (that I also deploy)?
Thanks for any help from those who know.
Russ
Re: Which are the underlying files for NiFi?
Posted by Jeff <jt...@gmail.com>.
Hello Russ! I'm glad you brought this up on the list. I've been thinking
about reworking/enhancing the NiFi testing framework to enable testing of
flows and flow templates from a programmatic perspective, without needing
to stand up an entire NiFi instance. A lot of this is in the idea stage
right now, and could require a fair chunk of refactoring. I've discussed
this a bit with other members of the community, and it has been pointed out
that we don't want to require deep knowledge of mocking frameworks, or the
NiFi framework itself, to allow extension developers to test the various
use cases that their extensions would be required to handle gracefully.
Improvements could be made to NiFi's testing framework to put extensions
through a number of predetermined use cases, such as a processor's
onTrigger invocation when the flowfile is null, or verifying that
processing stops within a reasonable amount of time when the processor
is stopped or disabled, in addition to the developer's own explicit test cases.
My goal would be to enable the testing of any use case, whether unit or
integration, and to allow the developer to mock endpoints external to
NiFi itself (web services, databases, etc.) at the processor or
flow/template level. Developers should also be able to verify a
flow/template's behavior throughout the flow/template. I have some ideas
on how we can achieve this, and I'll be working on a different way to
write extensions that allows easier mocking/injection of
collaborators/dependencies.
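As a sketch of the kind of endpoint mocking I have in mind, a test can stand up a throwaway HTTP server using only the JDK's built-in com.sun.net.httpserver package and point a processor's URL property at it. The /status path and JSON body here are invented purely for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class MockEndpointExample {
    public static void main(String[] args) throws IOException {
        // Stand up a throwaway HTTP server on an ephemeral port to play
        // the role of the external web service a processor would call.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/status", exchange -> {
            byte[] body = "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            // A test would configure the processor's URL property to point
            // here instead of at the real service.
            URL url = new URL("http://localhost:"
                    + server.getAddress().getPort() + "/status");
            try (InputStream in = url.openStream()) {
                System.out.println(new String(in.readAllBytes(),
                        StandardCharsets.UTF_8));
            }
        } finally {
            server.stop(0);
        }
    }
}
```

In a real test the processor under test would make the HTTP call; this sketch just demonstrates that the mock endpoint needs nothing beyond the JDK.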
If you would like to collaborate on achieving these testing goals, I'd be
happy to discuss any ideas with you and the rest of the community.
Thanks,
Jeff
On Tue, Oct 11, 2016 at 12:43 PM Russell Bateman <russell.bateman@perfectsearchcorp.com> wrote:
Re: Which are the underlying files for NiFi?
Posted by Russell Bateman <ru...@perfectsearchcorp.com>.
Andy,
First, many thanks for your very complete answer. It's exactly what I
expected, and I'm sure I had seen /flow.xml.gz/ in some thread before,
but I could not remember it, nor did I pay attention to that file while
looking through the NiFi subdirectories (/*_repository/, /run/, /state/
and /work/), as I didn't think to look in /conf/.
However, as I peruse the testing documentation [4] below, I find nothing
that readily explains (or even hints at) how to replace a JUnit test
like the following (shown only in its "essence") with all or part of a
/flow.xml[.gz]/, or how to inject one:
TestRunner runner = TestRunners.newTestRunner( new MyProcessor() );
InputStream data = new FileInputStream( "sample.data" );

runner.enqueue( data );
runner.run( 1 );
runner.assertQueueEmpty();

List< MockFlowFile > results = runner.getFlowFilesForRelationship( MyProcessor.SUCCESS );
MockFlowFile result = results.get( 0 );
String actual = new String( runner.getContentAsByteArray( result ) );
I have long written fairly exhaustive unit tests for the features of my
custom processors and am very pleased with the results. Only rarely has
something worked in a unit test but failed in production (and I think
that was something I was overlooking anyway).
Being able to run a whole or partial flow from the unit-testing
framework would be immeasurably cheaper and easier than standing up a
server, configuring a NiFi instance, and kicking it off. I need to be
able to set up test code that will run two or more processors in this
framework with the appropriate interconnections. (You were suggesting
that this is possible, right?) And I'm very happy to consume something
like /flow.xml/ in doing it.
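For the record, the closest I can get today is to hand-carry flowfiles between two TestRunner instances myself. A sketch, assuming two hypothetical processors (FirstProcessor and SecondProcessor are stand-ins for real classes) each exposing a REL_SUCCESS relationship:

```java
import java.util.List;

import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;

// FirstProcessor and SecondProcessor are hypothetical stand-ins for
// your own processor classes and their relationships.
TestRunner first  = TestRunners.newTestRunner( new FirstProcessor() );
TestRunner second = TestRunners.newTestRunner( new SecondProcessor() );

first.enqueue( "sample input".getBytes() );
first.run( 1 );

// Simulate the connection by hand: carry each output flowfile's content
// and attributes from the first runner into the second runner's queue.
List< MockFlowFile > out = first.getFlowFilesForRelationship( FirstProcessor.REL_SUCCESS );
for( MockFlowFile ff : out )
    second.enqueue( ff.toByteArray(), ff.getAttributes() );

second.run( 1 );
second.assertAllFlowFilesTransferred( SecondProcessor.REL_SUCCESS );
```

This obviously consumes no /flow.xml/, but it does exercise two processors end to end inside the mock framework.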
My one (mild) reproach of the developer guide is that, because of its
gross lack of real code samples, one must already be fairly familiar
with writing custom processors in order to map its advice to actual
code. Early on, I found the NiFi Rules! samples and the actual NiFi
standard-processor code much more useful than this guide.
Russ
On 10/10/2016 08:14 PM, Andy LoPresto wrote:
Re: Which are the underlying files for NiFi?
Posted by Andy LoPresto <al...@apache.org>.
Hi Russell,
The “flow” (the processors on the canvas and the relationships between them) is stored on disk as <nifi_home>/conf/flow.xml.gz [1]. You can also export selections from the flow as “templates” [2], along with a name and description, which can then be saved as XML files and loaded into a new instance of NiFi with (non-sensitive) configuration values and properties maintained. You may also be interested in the ongoing work to support the “Software Development Lifecycle (SDLC)” or “Development to Production Pipeline (D2P)” efforts to allow users to “develop” flows in testbed environments and “promote” them through code control/repositories to QE/QA and then Production environments [3].
If you are attempting to perform automated testing of flows or flow segments, you can do this with the test runner [4]. I would encourage you to store the flow in a template or full flow file (flow.xml.gz, not a “flowfile”) and load that during the test rather than programmatically instantiating and configuring each component and relationship in the test code, but you are able to do both. Also, be aware that the TestRunner is just that, and approximates/mocks some elements of the NiFi environment, so you should understand what decisions are being made behind the scenes to ensure that what is covered by the test is exactly what you expect.
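Since flow.xml.gz is nothing more than gzip-compressed XML, you can inspect it with plain JDK classes. A minimal sketch (the XML here is an invented, drastically simplified stand-in for the real file's much richer schema):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FlowXmlGzExample {
    public static void main(String[] args) throws IOException {
        // A made-up, drastically simplified stand-in for conf/flow.xml.gz.
        String xml = "<flowController><rootGroup><processor>"
                + "<name>MyProcessor</name>"
                + "</processor></rootGroup></flowController>";

        // flow.xml.gz is just gzip-compressed XML, so gzip it here...
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }

        // ...and read it back the way you would read the real file.
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.println(new String(in.readAllBytes(),
                    StandardCharsets.UTF_8));
        }
    }
}
```

Against a live instance you would open <nifi_home>/conf/flow.xml.gz with a FileInputStream instead of the in-memory buffer.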
Hope this helps clarify some things. Good luck.
[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#terminology
[2] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates
[3] https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
[4] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#testing
Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> On Oct 10, 2016, at 3:28 PM, Russell Bateman <ru...@perfectsearchcorp.com> wrote:
Re: Which are the underlying files for NiFi?
Posted by Russell Bateman <ru...@perfectsearchcorp.com>.
Thanks, Joe. I've digested the document you linked.
I think my real question is simpler. I don't think I'm interested in
hacking NiFi's internal details. I assumed that once I "paint" my canvas
with processors and relationships, it's harvestable in some form, in
some /.xml/ file somewhere, and easily transported to, then dropped
into, a new NiFi installation on some server or host for running it. I
assumed my testing would be thus configured, then kicked off somehow.
Now, I've long used the mocking from the JUnit
/org.apache.nifi.util.TestRunner/, but it never dawned on me (still
hasn't, in fact) that I can erect tests using it to run data through a
whole flow. Were you suggesting that this is the case?
Thanks,
Russ
On 10/10/2016 08:20 AM, Joe Witt wrote:
Re: Which are the underlying files for NiFi?
Posted by Joe Witt <jo...@gmail.com>.
Russ,
For reliable integration testing I'd recommend having a mocked
implementation of the process session. Not sure how easy we make that
right now but that would be better than trying to rely on internal
details of any given implementation.
For a general understanding of the under-the-covers bits of a running
NiFi, this document is really helpful:
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
Thanks
Joe
On Mon, Oct 10, 2016 at 10:13 AM, Russell Bateman
<ru...@perfectsearchcorp.com> wrote: