Posted to users@nifi.apache.org by "Lee Laim (leelaim)" <le...@micron.com> on 2018/10/11 19:49:31 UTC

RE: [EXT] Back Pressure on Process Group

Hi John,  


You can send the initiating flowfile into the Process Group and simultaneously send a duplicate flowfile around the process group into a MergeContent processor.  This duplicate will act as a back pressure "latch".

When the original flowfile exits the process group successfully, MergeContent with a strict correlation strategy merges it with the waiting duplicate. That clears the back pressure latch and allows the next flowfile into the group.
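
To make the latch idea concrete, here is a toy simulation in Python. It is only a sketch and not NiFi code: the class name, the queue names, and the threshold of 1 on the connection feeding MergeContent are assumptions made for illustration. It models the one behavior that matters, namely that nothing new is admitted while a duplicate is still waiting to be merged.

```python
from collections import deque


class LatchedFlow:
    """Toy model of the back pressure latch (hypothetical, not a NiFi API)."""

    def __init__(self, latch_threshold=1):
        self.incoming = deque()  # flowfiles waiting upstream of the group
        self.in_group = deque()  # flowfiles currently inside the process group
        self.latch = deque()     # duplicates queued in front of the merge step
        self.latch_threshold = latch_threshold  # assumed object threshold
        self.released = []       # ids whose pair has been merged

    def offer(self, flowfile_id):
        self.incoming.append(flowfile_id)

    def try_admit(self):
        """Admit one flowfile unless the latch connection is already full."""
        if not self.incoming or len(self.latch) >= self.latch_threshold:
            return False
        flowfile_id = self.incoming.popleft()
        self.in_group.append(flowfile_id)  # original enters the group
        self.latch.append(flowfile_id)     # duplicate goes around the group
        return True

    def finish_group(self):
        """The group emits its flowfile and the merge pairs it with the duplicate."""
        flowfile_id = self.in_group.popleft()
        # Strict correlation: only the duplicate with the same id may merge.
        if not self.latch or self.latch[0] != flowfile_id:
            raise RuntimeError("no matching duplicate for " + flowfile_id)
        self.latch.popleft()               # merging empties the latch queue
        self.released.append(flowfile_id)
        return flowfile_id


flow = LatchedFlow()
flow.offer("record-1")
flow.offer("record-2")
first_admitted = flow.try_admit()   # record-1 enters the group
second_blocked = flow.try_admit()   # latch is full, so record-2 must wait
flow.finish_group()                 # record-1 exits and clears the latch
second_admitted = flow.try_admit()  # record-2 may now enter
print(first_admitted, second_blocked, second_admitted)
```

In a real flow the "admit" step would be whatever processor feeds both connections; it stops being scheduled while the connection into MergeContent is at its back pressure threshold.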

Personally, I think Wait/Notify is the more elegant solution, but I have successfully used the back pressure latch before Wait/Notify was readily available.
 

This thread might offer some additional insight: http://apache-nifi.1125220.n5.nabble.com/Having-a-processor-wait-for-all-inputs-td15614.html 

Thanks,
Lee


-----Original Message-----
From: John McGinn [mailto:amruginn-nifi@yahoo.com] 
Sent: Thursday, October 11, 2018 12:17 PM
To: users@nifi.apache.org
Subject: [EXT] Back Pressure on Process Group

I've been going through mailing list archives, and looking at blog posts around Wait/Notify, and these don't seem to be the solution for my use case.

My basic use case is as follows. I have 4 DB tables, 3 of which are id/name pairs (office name, city, state), and the 4th table joins the 3 ids together to a new id which is used elsewhere in the database.

Using NiFi to ingest data from a different database system, we have to verify whether that office is active, and if it is inactive or non-existent, create a new record, as well as a record in any of the other 3 tables where necessary.

The first step, then, is to join the 4 tables together to search for the name fields, and if the join comes back with a row, use that top level id as an attribute. No problem, works fine. (FetchDatabaseTable -> AvroToJson -> EvaluateJsonPath, etc.) If the join comes back empty, I need to insert rows for the 3 pieces and then join them together. Ideally, this would be a flow of 3 PutSqls, then a connection back to the top level search of the database. (Currently I'm using a modified custom processor, LookupAttributeFromSQL, that Brett Ryan did in January 18th, before he worked on a SQLLookupService.)

The problem is that I could have 2 records coming in with the same pieces of information, and because it's flow based, the check for the 4 table join will come up empty on the second record before the first record is done creating the 4 table records. I've investigated the Wait/Notify pattern, but the odd part for me is that you need to have a separate "initialization" of the Wait/Notify release signal indicator (https://gist.github.com/ijokarumawak/9e1a4855934f2bb9661f88ca625bd244) and that seems "hack-ish" to me.
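
The race described above can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: the in-memory list stands in for the joined tables, and the function name and the record value "office-A" are hypothetical. In the unserialized case both records run their lookup before either insert has happened, so both miss and both insert.

```python
def run(records, serialized):
    """Return the rows created when each record does a check-then-insert."""
    rows = []  # stand-in for the 4-table join result: (id, office name)

    def find(name):
        return next((row_id for row_id, n in rows if n == name), None)

    def create(name):
        rows.append((len(rows) + 1, name))

    if serialized:
        # One record at a time: lookup and insert finish before the next starts.
        for name in records:
            if find(name) is None:
                create(name)
    else:
        # Interleaved: every lookup runs before any insert has completed.
        misses = [name for name in records if find(name) is None]
        for name in misses:
            create(name)
    return rows


racy_rows = run(["office-A", "office-A"], serialized=False)
safe_rows = run(["office-A", "office-A"], serialized=True)
print(len(racy_rows), len(safe_rows))
```

Serializing the check and the insert is exactly what admitting one flowfile at a time into the Process Group would achieve.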

With all of that said, I was curious whether there is a way to set a back pressure value of 1 on the Process Group, so that while a flowfile is anywhere inside the Process Group, no other flowfile can enter it. That way, the creation of the 4 records could be inside a process group, and no other flowfile could enter until that first flowfile has exited.

Thanks for any insight,
John