Posted to dev@nifi.apache.org by Russell Bateman <ru...@perfectsearchcorp.com> on 2016/10/10 14:13:03 UTC

Which are the underlying files for NiFi?

In the NiFi release filesystem, where are things kept? Better yet, where in
the documentation have I missed learning about this?

Specifically, if I create flows for integration testing of multiple
components (say, processors), where can I find the underlying files I can
pick up, store and deploy to a fresh server with a fresh NiFi, then launch
them on data (that I also deploy)?

Thanks for any help from those who know.

Russ

Re: Which are the underlying files for NiFi?

Posted by Jeff <jt...@gmail.com>.
Hello Russ!  I'm glad you brought this up on the list.  I've been thinking
about reworking/enhancing the NiFi testing framework to enable testing of
flows and flow templates from a programmatic perspective, without needing
to stand up an entire NiFi instance.  A lot of this is in the idea stage
right now, and could require a fair chunk of refactoring.  I've discussed
this a bit with other members of the community, and it has been pointed out
that we don't want to require deep knowledge of mocking frameworks, or the
NiFi framework itself, to allow extension developers to test the various
use cases that their extensions would be required to handle gracefully.
Improvements could be made to NiFi's testing framework to put extensions
through a number of predetermined use cases, in addition to the developer's
own explicit test cases: for example, verifying that a processor's onTrigger
handles a null flowfile gracefully, or that processing stops within a
reasonable amount of time when the processor is told to stop or is disabled.
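
A predetermined case like the null-flowfile check might be sketched with
today's nifi-mock vocabulary roughly as follows (MyProcessor and its
SUCCESS/FAILURE relationships are hypothetical placeholders for an
extension under test):

```java
// Sketch only: nothing is enqueued, so session.get() returns null inside
// onTrigger, and a well-behaved processor should return quietly without
// transferring anything or throwing.
TestRunner runner = TestRunners.newTestRunner(new MyProcessor());
runner.run(1);  // trigger once against an empty queue
runner.assertTransferCount(MyProcessor.SUCCESS, 0);
runner.assertTransferCount(MyProcessor.FAILURE, 0);
```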

My goal would be to enable the testing of any use case, whether it be a
unit or integration test, and be able to allow the developer to mock
endpoints that would be external to NiFi itself (web services, databases,
etc) at the processor or flow/template level.  Developers should be able to
verify a flow/template's behavior throughout the flow/template as well.  I
have some ideas on how we can achieve this, and I'll be working on a
different way to write extensions to allow for easier mocking/injection of
collaborators/dependencies.

If you would like to collaborate on achieving these testing goals, I'd be
happy to discuss any ideas with you and the rest of the community.

Thanks,
Jeff

On Tue, Oct 11, 2016 at 12:43 PM Russell Bateman <
russell.bateman@perfectsearchcorp.com> wrote:

> [...]

Re: Which are the underlying files for NiFi?

Posted by Russell Bateman <ru...@perfectsearchcorp.com>.
Andy,

First, many thanks for your answer which was very complete. It's exactly 
what I expected and I'm sure I had seen /flow.xml.gz/ in some thread 
before, but I could not remember it nor did I pay attention to that file 
while looking through the NiFi subdirectories (/*_repository/, /run/, 
/state/ and /work/) as I didn't think to look in /conf/.

However, as I peruse testing [4] below, I don't find anything that 
readily explains (or even hints at) how to replace a JUnit test like the 
following, shown here only in essence, with all or part of a 
/flow.xml[.gz]/, or how to inject one:

    TestRunner  runner = TestRunners.newTestRunner( new MyProcessor() );
    InputStream data   = new FileInputStream( "sample.data" );

    runner.enqueue( data );
    runner.run( 1 );
    runner.assertQueueEmpty();

    List<MockFlowFile> results = runner.getFlowFilesForRelationship( MyProcessor.SUCCESS );

    MockFlowFile result = results.get( 0 );
    String       actual = new String( runner.getContentAsByteArray( result ) );


I have long written fairly exhaustive unit tests for the features of my 
custom processors and am very pleased with the result. Only rarely have 
I done something that worked in a unit test but not in production (and I 
think it was something I was overlooking anyway).

Being able to run a whole or partial flow from the unit-testing 
framework would be immeasurably cheaper and easier to set up than 
configuring a server with a NiFi instance and kicking it off. I need to 
be able to set up test code that will run two or more processors in this 
framework with the appropriate interconnections. (You were suggesting 
that this is possible, right?) And I'm very happy to consume something 
like /flow.xml/ in doing it.
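
The closest thing I can imagine today is chaining TestRunners by hand, 
feeding what one processor routes to SUCCESS into the next. This is only 
a sketch; FirstProcessor and SecondProcessor are hypothetical stand-ins 
for real custom processors:

```java
// Run the first processor against sample input.
TestRunner first = TestRunners.newTestRunner(new FirstProcessor());
first.enqueue(new FileInputStream("sample.data"));
first.run(1);

// Re-enqueue its SUCCESS output, content and attributes intact,
// into the second processor.
TestRunner second = TestRunners.newTestRunner(new SecondProcessor());
for (MockFlowFile ff : first.getFlowFilesForRelationship(FirstProcessor.SUCCESS)) {
    second.enqueue(ff.toByteArray(), ff.getAttributes());
}
second.run(1);
second.assertAllFlowFilesTransferred(SecondProcessor.SUCCESS);
```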

My one (light) reproach of the developer guide is that, because of its 
gross lack of real code samples, one needs to be pretty familiar with 
writing custom processors before reading it in order to map its advice 
to actual code. Early on, I found NiFi Rules! samples and actual NiFi 
standard-processor code to be much more useful to me than this guide.

Russ


On 10/10/2016 08:14 PM, Andy LoPresto wrote:
> [...]


Re: Which are the underlying files for NiFi?

Posted by Andy LoPresto <al...@apache.org>.
Hi Russell,

The “flow” (the processors on the canvas and the relationships between them) is stored on disk as <nifi_home>/conf/flow.xml.gz [1]. You can also export selections from the flow as “templates” [2], along with a name and description, which can then be saved as XML files and loaded into a new instance of NiFi with (non-sensitive) configuration values and properties maintained. You may also be interested in the ongoing work to support the “Software Development Lifecycle (SDLC)” or “Development to Production Pipeline (D2P)” efforts to allow users to “develop” flows in testbed environments and “promote” them through code control/repositories to QE/QA and then Production environments [3].
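
Since flow.xml.gz is nothing more than a gzipped XML document, you can also peek at it with the JDK alone. A minimal sketch (the class name and path are illustrative, not part of NiFi):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class FlowDump {
    // Decompress a flow.xml.gz and return the XML text it contains.
    static String readFlow(String path) throws IOException {
        StringBuilder xml = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(path)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                xml.append(line).append('\n');
            }
        }
        return xml.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the conventional location relative to <nifi_home>.
        System.out.print(readFlow(args.length > 0 ? args[0] : "conf/flow.xml.gz"));
    }
}
```

Run as `java FlowDump /path/to/conf/flow.xml.gz` to dump the flow definition to stdout.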

If you are attempting to perform automated testing of flows or flow segments, you can do this with the test runner [4]. I would encourage you to store the flow in a template or full flow file (flow.xml.gz, not a “flowfile”) and load that during the test rather than programmatically instantiating and configuring each component and relationship in the test code, but you are able to do both. Also, be aware that the TestRunner is just that, and approximates/mocks some elements of the NiFi environment, so you should understand what decisions are being made behind the scenes to ensure that what is covered by the test is exactly what you expect.

Hope this helps clarify some things. Good luck.

[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#terminology
[2] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates
[3] https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
[4] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#testing

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 10, 2016, at 3:28 PM, Russell Bateman <ru...@perfectsearchcorp.com> wrote:
> [...]


Re: Which are the underlying files for NiFi?

Posted by Russell Bateman <ru...@perfectsearchcorp.com>.
Thanks, Joe. I've digested the document you linked.

I think my real question is simpler. I don't think I'm interested in 
hacking NiFi's internal details. I assumed that once I "paint" my canvas 
with processors and relationships, it's harvestable in some form, in 
some /.xml/ file somewhere, and easily transported to, then dropped 
into, a new NiFi installation on some server or host for running it. I 
assumed my testing would be thus configured, then kicked off somehow.

Now, I've long used the mocking from the JUnit 
/org.apache.nifi.util.TestRunner/, but it never dawned on me (still 
hasn't, in fact) that I can erect tests using it to run data through a 
whole flow. Were you suggesting that this is the case?

Thanks,
Russ


On 10/10/2016 08:20 AM, Joe Witt wrote:
> [...]


Re: Which are the underlying files for NiFi?

Posted by Joe Witt <jo...@gmail.com>.
Russ,

For reliable integration testing I'd recommend having a mocked
implementation of the process session.  Not sure how easy we make that
right now but that would be better than trying to rely on internal
details of any given implementation.

For a general understanding of the under-the-covers bits of a running
NiFi, this document is really helpful:
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html

Thanks
Joe

On Mon, Oct 10, 2016 at 10:13 AM, Russell Bateman
<ru...@perfectsearchcorp.com> wrote:
> [...]