Posted to users@nifi.apache.org by Stéphane Maarek <st...@gmail.com> on 2016/04/29 01:06:48 UTC

Doing development on nifi

Hi,

I'm very new to NiFi and love the concept. I'm learning as I go, and my
biggest frustration is that I can't see the data flowing through the
system while I develop.

Maybe I missed an article or a link, but is it possible to view the data
while it is in the flow? For example, say I create a GetHTTP: I'd like it
to fire once and get some data so I can see what it looks like. Then, if I
add a SplitJson, I'd like to see whether its output is what I expected or
whether I messed something up, and so on.

I hope my question is clear.

Thanks in advance,
Stéphane

Re: Doing development on nifi

Posted by Stéphane Maarek <st...@gmail.com>.

Thanks a lot everyone!

On Fri, Apr 29, 2016, 10:02 AM Joe Percivall <jo...@yahoo.com> wrote:

> Hello Stéphane,
>
> Just adding on to Matt's and Andy's answers, Andy mentioned Provenance[1]
> for replaying events but I also find it very useful for debugging
> processors/flows as well. Data Provenance is a core feature of NiFi and it
> allows you to see exactly what the FlowFile looked like (attributes and
> content) before and after a processor acted on it as well as the ability to
> see a map of the journey that FlowFile underwent through your flow. The
> easiest way to see the provenance of a processor is to right click on it
> and then click "Data provenance".
>
> The documentation below should be a great introduction and if you have any
> questions feel free to ask!
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
>
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com

Re: Doing development on nifi

Posted by Joe Percivall <jo...@yahoo.com>.

Hello Stéphane,

Just adding on to Matt's and Andy's answers: Andy mentioned Provenance [1] for replaying events, but I also find it very useful for debugging processors and flows. Data Provenance is a core feature of NiFi. It lets you see exactly what a FlowFile looked like (attributes and content) before and after a processor acted on it, and it shows a map of the journey that FlowFile took through your flow. The easiest way to see the provenance of a processor is to right-click on it and then click "Data provenance".

The documentation below should be a great introduction and if you have any questions feel free to ask!

[1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance

Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com

On Thursday, April 28, 2016 7:30 PM, Matt Burgess <ma...@gmail.com> wrote:

Stéphane,

Welcome to NiFi, glad to have you aboard!  May I ask what version you
are using? I believe as of at least 0.6.0, you can view the items in a
queued connection. So for your example, you can have a GetHttp into a
SplitJson, but don't start the SplitJson, just the GetHttp. You will
see any flowfiles generated by GetHttp queued up in the success (or
response?) connection (whichever you have wired to SplitJson). Then
you can right-click on the connection (the line between the
processors) and choose List Queue. In that dialog you can choose an
element by clicking on the Info icon ('i' in a circle) and see the
information about it, including a View button for the content.

The best part is that you don't have to do a "preview" run, then a
"real" run. The data is in the connection's queue, so you can make
alterations to your SplitJson, then start it to see if it works. If it
doesn't, stop it and start the GetHttp again (if stopped) to put more
data in the queue.  For fine-grained debugging, you can temporarily
set the Run schedule for the SplitJson to something like 10 seconds,
then when you start it, it will likely only bring in one flow file, so
you can react to how it works, then stop it before it empties the
queue.

I hope that makes sense, I apologize in advance if I made things more
confusing. The good news is there is a solution to your problem, even
if I am not the right person to describe it :)

Cheers,
Matt


Re: Doing development on nifi

Posted by Matt Burgess <ma...@gmail.com>.

Stéphane,

Welcome to NiFi, glad to have you aboard! May I ask what version you
are using? I believe that, as of at least 0.6.0, you can view the items
queued in a connection. So for your example, you can have a GetHTTP
feeding a SplitJson, but start only the GetHTTP, not the SplitJson. You
will see any flowfiles generated by GetHTTP queued up in the success (or
response?) connection (whichever you have wired to SplitJson). Then
you can right-click on the connection (the line between the
processors) and choose List Queue. In that dialog you can pick an
element by clicking on its Info icon ('i' in a circle) and see the
information about it, including a View button for the content.

The best part is that you don't have to do a "preview" run, then a
"real" run. The data is in the connection's queue, so you can make
alterations to your SplitJson, then start it to see if it works. If it
doesn't, stop it and start the GetHTTP again (if stopped) to put more
data in the queue. For fine-grained debugging, you can temporarily
set the Run schedule for the SplitJson to something like 10 seconds.
When you start it, it will then likely bring in only one flow file, so
you can see how it behaves and stop it before it empties the
queue.
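
As a rough aid for the "is the SplitJson output what I expected" check,
here is a plain-Python sketch (not NiFi code) of what splitting a JSON
array into one fragment per element should give you. The helper name
split_json_array, the "items" key, and the sample payload are all made up
for illustration. The real processor is driven by its JsonPath Expression
property, which this sketch only approximates with a single top-level key.

```python
import json


def split_json_array(payload, key):
    """Return one JSON string per element of the array stored under `key`.

    Hypothetical helper: it loosely mimics what you would expect to see
    as separate flowfiles after splitting on a top-level array.
    """
    document = json.loads(payload)
    elements = document[key]
    if not isinstance(elements, list):
        raise ValueError(f"{key!r} does not hold a JSON array")
    # Serialize each element on its own, one string per expected split.
    return [json.dumps(element, sort_keys=True) for element in elements]


# Made-up sample standing in for an HTTP response body.
sample = '{"items": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}'

# One entry per array element; compare these with what View shows you.
fragments = split_json_array(sample, "items")
```

Comparing those fragments with the content shown by the View button in
List Queue is one way to tell whether the processor configuration or the
input data is the part that is off.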

I hope that makes sense; I apologize in advance if I made things more
confusing. The good news is there is a solution to your problem, even
if I am not the right person to describe it :)

Cheers,
Matt


Re: Doing development on nifi

Posted by Andy LoPresto <al...@apache.org>.

Hi Stéphane,

NiFi is intentionally designed to allow you to make changes on a very tight cycle while interacting with live data. This separates it from a number of tools that require lengthy deployment processes to push changes to sandbox/production environments and test with real data. As one of the engineers says frequently, “NiFi is like digging irrigation ditches as the water flows, rather than building out a sprinkler system in advance.”

Each processor component and queue will show statistics about the data they are processing and you can see further information and history by going into the Stats dialog of the component (available through the right-click context menu on each element).

To monitor actual data (such as the response of an HTTP request or the result of a JSON split), I’d recommend using the LogAttribute processor [1]. You can use this processor to print the value of specific attributes, all attributes, the flowfile payload, expression language queries, etc. to a log file and monitor the content in real time. I’m not sure if we have a specific tutorial for this process, but many of the tutorials and videos [2] for other flows include this functionality to demonstrate exactly what you are looking for. If you decide that the flow needs modification, you can also replay flowfiles through the new flow very easily from the provenance view [3].
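
If the log gets noisy, a small script can pull out just the LogAttribute output. The following is a minimal sketch, not part of NiFi: the function names are hypothetical, and it assumes that every log entry starts with a date such as "2016-04-29 ..." and that any other line (for example the attribute dump) continues the previous entry. Adjust the pattern if your logging configuration differs.

```python
import re
from pathlib import Path

# Assumption: each log entry starts with a date such as "2016-04-29 ...";
# any other line is treated as a continuation of the previous entry.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def log_entries(lines):
    """Group raw log lines into (possibly multi-line) entries."""
    entry = []
    for line in lines:
        if ENTRY_START.match(line) and entry:
            yield "".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "".join(entry)


def logattribute_entries(log_path, marker="LogAttribute"):
    """Return the entries in the file at `log_path` that mention `marker`."""
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        return [entry for entry in log_entries(handle) if marker in entry]
```

Calling logattribute_entries("logs/nifi-app.log") from the NiFi install directory would be the typical use, assuming the default log location; pass a different path or marker if your setup differs.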

Another way to explore the processors is, of course, to write a unit test and compare the result against your expected values. This is more advanced: it requires downloading the source code and does not use the UI. But for more complicated usages, or for people familiar with the development process, NiFi provides extensive flow mocking capabilities that allow developers to do this.

I hope this helps. If we can make anything clearer or you have more questions, please don’t hesitate to ask.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.LogAttribute/index.html
[2] https://nifi.apache.org/videos.html
[3] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#replaying-a-flowfile

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
