Posted to users@nifi.apache.org by Utkarsh Srivastava <ut...@phenompeople.com> on 2019/10/14 14:11:58 UTC

Need help with building a NiFi dashboard

Greetings,

Situation: I have a set of operations that I need to perform on my data in
a particular order.
I am creating a processor for every operation.
I am organising the different functionalities into separate process groups,
and connecting the process groups through input/output ports.


Now I need to build a dashboard to display a bunch of information about
this 'flow'.

The information I need to display on the dashboard:

1. The path of the flow - the 'path' of processors that the data went
through in a particular flow (every time I start the respective process
group). I also need to record this data, sorted by time, and show a 'flow
history' of all the previous flows.

2. Whether a particular flow was successful or not.

This is the design I came up with to get all of this information:

Step 1. Find the first processor for a particular process group.

To do this, I will hit the respective process group's API
http://localhost:8080/nifi-api/process-groups/{ID}/processors, and then
check which processor has 0 'readbytes' and some 'writebytes', or check
for input ports and find the processor that is connected to the input
port.
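A minimal sketch of Step 1 in Python, working from the group's connections rather than from byte counters. It assumes an unsecured NiFi at localhost:8080, and that GET /nifi-api/process-groups/{id}/processors and GET /nifi-api/process-groups/{id}/connections return JSON with a 'processors' / 'connections' list, where each connection's component.source and component.destination carry an 'id' and a 'type' such as PROCESSOR or INPUT_PORT; please check those shapes against your NiFi version. The helper names are hypothetical. A processor is treated as an entry point if no other processor in the group feeds it.

```python
import json
from urllib.request import urlopen

NIFI_API = "http://localhost:8080/nifi-api"  # assumed unsecured local instance


def fetch_json(path):
    """GET a NiFi REST resource and decode the JSON body (no auth assumed)."""
    with urlopen(f"{NIFI_API}{path}") as resp:
        return json.load(resp)


def entry_processors(processors_entity, connections_entity):
    """Return sorted IDs of processors that no other processor feeds.

    Such a processor either has no incoming connection at all or is fed
    only by input ports. Self-loops (e.g. a retry relationship routed back
    to the same processor) are ignored.
    """
    processor_ids = {p["id"] for p in processors_entity["processors"]}
    fed_by_processor = set()
    for conn in connections_entity["connections"]:
        src = conn["component"]["source"]
        dst = conn["component"]["destination"]
        if src["type"] == "PROCESSOR" and src["id"] != dst["id"]:
            fed_by_processor.add(dst["id"])
    return sorted(processor_ids - fed_by_processor)


# Usage against a live instance (group_id is a placeholder):
# processors = fetch_json(f"/process-groups/{group_id}/processors")
# connections = fetch_json(f"/process-groups/{group_id}/connections")
# print(entry_processors(processors, connections))
```

Relying on connections instead of 'readbytes'/'writebytes' avoids misclassifying a processor that simply hasn't read anything yet in the current stats window.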

Step 2. To find the 'processor path' of my FlowFiles, I will check the
process group's connections and map the source and destination processors
of each connection.
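Step 2 can be sketched as a plain graph walk over the connections list, assuming each connection entity has component.source.id and component.destination.id (an assumed JSON shape, worth checking against your NiFi version). The function names are hypothetical. It lists every cycle-free path from a starting component to a dead end, so a processor that routes both success and failure onward yields one path per branch.

```python
from collections import defaultdict


def build_graph(connections_entity):
    """Map each source component ID to the destination IDs it feeds."""
    graph = defaultdict(list)
    for conn in connections_entity["connections"]:
        comp = conn["component"]
        graph[comp["source"]["id"]].append(comp["destination"]["id"])
    return graph


def all_paths(graph, start):
    """Return every cycle-free path of component IDs from start to a dead end."""
    paths = []

    def walk(node, path):
        # Skip destinations already on this path so loops don't recurse forever.
        next_hops = [dst for dst in graph.get(node, []) if dst not in path]
        if not next_hops:
            paths.append(path)
            return
        for dst in next_hops:
            walk(dst, path + [dst])

    walk(start, [start])
    return paths
```

This gives the possible paths through the group's design; it cannot say which path a given FlowFile actually took. For that you still need runtime data such as provenance. In a heavily branched flow the number of paths can grow quickly.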

I can't completely figure out how to find whether a particular flow
succeeded or failed, or how to record the history of the previous
flows.

To get the success/failure status, I was thinking I could read the
flowFilesIn and flowFilesOut properties of every processor and compare
them to find out how many files went through the success processor and how
many went through the failure processor.

But this approach can fail because the flowFilesIn and flowFilesOut numbers
from previous runs would be added in too (unless I can clear that data
every time I start my process group?).
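A minimal sketch of the "subtract the previous numbers" idea: read flowFilesIn for the success and failure processors just before starting the group and again after it stops, and use the difference. As far as I know, NiFi's per-processor status figures cover a rolling five-minute window rather than an all-time total, so they are never cleared between runs, and the difference can undercount if earlier traffic ages out of the window mid-run. The status endpoint GET /nifi-api/flow/process-groups/{id}/status and the JSON nesting in flowfiles_in are assumptions to verify. success_id and failure_id are hypothetical placeholders for your two terminal processors, and treating a run as succeeded when nothing reached the failure processor is my own assumed policy.

```python
def flowfiles_in(status_entity):
    """Map processor ID -> flowFilesIn from a process-group status response.

    The nesting below is an assumed NiFi 1.x shape; check your version.
    """
    snaps = status_entity["processGroupStatus"]["aggregateSnapshot"]["processorStatusSnapshots"]
    return {s["id"]: s["processorStatusSnapshot"]["flowFilesIn"] for s in snaps}


def run_counts(before, after, success_id, failure_id):
    """Files that reached the success and failure processors during one run.

    before/after map processor ID -> flowFilesIn. Negative differences
    (earlier traffic aging out of the status window) are clamped to zero.
    """
    def delta(pid):
        return max(after.get(pid, 0) - before.get(pid, 0), 0)

    ok, failed = delta(success_id), delta(failure_id)
    # Assumed policy: a run succeeded if something arrived and nothing failed.
    return {"success": ok, "failure": failed, "succeeded": ok > 0 and failed == 0}
```

Because of the windowing caveat, this is best treated as a quick indicator; the provenance-based approach is more reliable for exact per-run answers.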

The other approach I thought of is to look at the data provenance for the
respective processors and, based on the provenance events, figure out
whether each processor did what it is supposed to do (I haven't completely
figured this out). I would really appreciate some help with this.
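A minimal sketch of the provenance idea, working on a list of event dicts. If I remember the REST API correctly, provenance searches are asynchronous: you POST a query to /nifi-api/provenance, poll GET /nifi-api/provenance/{id} until it reports finished, read the events from the results, then DELETE the query. That flow should be checked against the docs for your version. The code assumes each event has flowFileUuid, componentId and, for ROUTE events, relationship. failure_ids is a hypothetical set of IDs of the processors that handle failures in your flow, and the rule "failed if anything was routed to a relationship named failure" is an assumption.

```python
def flowfile_outcomes(events, failure_ids=frozenset()):
    """Classify each FlowFile UUID seen in provenance as 'failed' or 'ok'.

    A FlowFile is 'failed' if any of its events was routed to a relationship
    named 'failure', or was reported by a component in failure_ids.
    """
    outcome = {}
    for ev in events:
        uuid = ev["flowFileUuid"]
        failed = (ev.get("relationship") == "failure"
                  or ev.get("componentId") in failure_ids)
        if failed:
            outcome[uuid] = "failed"  # one failing event marks the FlowFile
        else:
            outcome.setdefault(uuid, "ok")  # don't overwrite an earlier failure
    return outcome
```

Unlike the status counters, provenance events are recorded per FlowFile, so results from earlier runs don't get mixed in as long as you filter the query by time or by the FlowFiles of one run.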

My biggest problem right now is how to get the 'flow history' of my process
groups. I need to segregate and record my flows by time.
I am not able to figure out how exactly to accomplish this.
One way that I thought of is to record the statsLastRefreshed of every
processor, and use those timestamps to construct a timeline for each flow.
I think data provenance can be used effectively for this, but I am confused
about the provenance events' timestamps and how to use them for this
purpose.
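One way to build the flow history from provenance, sketched in Python: group events into runs by FlowFile lineage, then sort each run's events by eventTime. The grouping uses a union-find over flowFileUuid, parentUuids and childUuids. That is a technique I'm swapping in, not something NiFi does for you. The eventTime layout 'MM/dd/yyyy HH:mm:ss.SSS zone' and the field names are assumptions to check against real responses. As I understand it, statsLastRefreshed only tells you when a status snapshot was computed, not when data passed through, which is why event times are used here instead.

```python
from datetime import datetime


def parse_event_time(text):
    """Parse a provenance eventTime such as '10/14/2019 14:11:58.123 UTC'.

    Assumed layout: 'MM/dd/yyyy HH:mm:ss.SSS <zone>'. The zone token is
    dropped, so every event must come from the same server time zone.
    """
    date_part, time_part = text.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S.%f")


def flow_history(events):
    """Group provenance events into runs by FlowFile lineage, oldest run first."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    def union(a, b):
        parent[find(a)] = find(b)

    # FlowFiles linked by forks, joins or clones end up in the same run.
    for ev in events:
        for other in (ev.get("parentUuids") or []) + (ev.get("childUuids") or []):
            union(ev["flowFileUuid"], other)

    runs = {}
    for ev in events:
        runs.setdefault(find(ev["flowFileUuid"]), []).append(ev)

    history = []
    for evs in runs.values():
        evs.sort(key=lambda e: parse_event_time(e["eventTime"]))
        path = []
        for e in evs:
            # Collapse consecutive events from the same component into one step.
            if not path or path[-1] != e["componentName"]:
                path.append(e["componentName"])
        history.append({
            "start": parse_event_time(evs[0]["eventTime"]),
            "end": parse_event_time(evs[-1]["eventTime"]),
            "path": path,
        })
    return sorted(history, key=lambda run: run["start"])
```

Each entry gives a run's start time, end time and the ordered processor names it passed through, which maps directly onto the 'path' and 'flow history' views of the dashboard. A processor that merges FlowFiles from different runs would join those runs into one, so merge steps may need special handling.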

Any help on these two issues is much appreciated.

Also, I would really appreciate feedback or suggested improvements on my
solution, including anything you think I should correct or change.


Thanks,
Utkarsh