Posted to dev@nifi.apache.org by DAVID SMITH <da...@btinternet.com.INVALID> on 2020/06/01 12:33:55 UTC

Merging the unique attributes of 2 flowfiles

Hi
I have a group of log files coming in via an HTTP listener, up to 30 logs per transaction, of which I only need the values in 2 of those log files per transaction. After using some RouteOnContent processors, I end up with the two log flowfiles I want.
In my current flow I am using a MergeContent processor to try to merge the two required flowfiles on a common ident attribute value, which I extracted from each log file earlier. I have also extracted some other attributes from the flowfiles at this point, and as everything I am interested in is in these attributes, I don't mind what happens to the content of the flowfiles. When I step through the flow, everything works as I expect; however, when I run it at pace and log files are coming in for multiple transactions at the same time, the merge fails on most occasions.

My MergeContent settings are:
  Merge Strategy                Bin-Packing Algorithm
  Merge Format                  Binary Concatenation
  Attribute Strategy            Keep All Unique Attributes
  Correlation Attribute Name    ${import.ident}
  Metadata Strategy             Ignore Metadata
  Minimum Number of Entries     2
  Maximum Number of Entries     2
  Max Bin Age                   1 minute
All the other properties are at default.
Have I set something incorrectly, or is there a simpler way of merging the attributes from two flowfiles onto one flowfile?
Many thanks
Dave
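For reference, NiFi describes the "Keep All Unique Attributes" strategy as keeping any attribute from any bundled FlowFile unless its value conflicts with the value from another FlowFile. The minimal Python sketch below models that rule only, assuming each FlowFile's attributes are a plain dict; it is not NiFi's code, and the attribute names "status" and "elapsed" are hypothetical stand-ins for whatever was extracted earlier.

```python
# Hypothetical model: each FlowFile's attributes are a plain dict[str, str].
def keep_all_unique_attributes(attribute_maps):
    """Merge attribute dicts, dropping any key whose values conflict."""
    merged = {}
    conflicting = set()
    for attrs in attribute_maps:
        for key, value in attrs.items():
            if key in conflicting:
                continue  # already known to conflict; never re-add it
            if key in merged and merged[key] != value:
                del merged[key]  # two FlowFiles disagree, so drop the key
                conflicting.add(key)
            else:
                merged[key] = value
    return merged


# "status" and "elapsed" are made-up attribute names for illustration.
log_a = {"import.ident": "T1", "status": "OK", "filename": "a.log"}
log_b = {"import.ident": "T1", "elapsed": "42", "filename": "b.log"}
print(keep_all_unique_attributes([log_a, log_b]))
# → {'import.ident': 'T1', 'status': 'OK', 'elapsed': '42'}
```

Attributes that differ between the two logs (here "filename") disappear from the merged result, so anything needed downstream should be extracted under distinct names in each log.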

Re: Merging the unique attributes of 2 flowfiles

Posted by Mark Payne <ma...@hotmail.com>.
David,

I suspect you need to change the “Maximum number of Bins” property. The default, I believe, is 1 bin. Or maybe 5 or 10. Something small. This works fine if you’re not using the correlation attribute.

When a FlowFile comes into the Processor, the Processor has to determine which bin to put the FlowFile in. If using the Correlation Attribute, it determines that by looking at the value of the attribute. So when FlowFile 1 comes in with an attribute value of Foo, it goes to Bin 1. FlowFile 2 comes in with an attribute value of Bar and it goes to Bin 2. FlowFile 3 comes in and has an attribute value of Baz so it goes to Bin 3.
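The binning step described above can be sketched in a few lines of Python. This is an illustrative model, not MergeContent's implementation: FlowFiles are plain attribute dicts, the "id" attribute is hypothetical, and the cap on the number of bins is deliberately ignored here.

```python
from collections import defaultdict


def bin_by_correlation(flowfiles, correlation_attribute):
    """Group FlowFile attribute dicts by the value of one attribute."""
    bins = defaultdict(list)
    for ff in flowfiles:
        # FlowFiles sharing the same value land in the same bin;
        # in this sketch, FlowFiles lacking the attribute share the None bin.
        bins[ff.get(correlation_attribute)].append(ff)
    return dict(bins)


arrivals = [
    {"id": "1", "import.ident": "Foo"},
    {"id": "2", "import.ident": "Bar"},
    {"id": "3", "import.ident": "Baz"},
    {"id": "4", "import.ident": "Foo"},
]
bins = bin_by_correlation(arrivals, "import.ident")
print({key: [ff["id"] for ff in members] for key, members in bins.items()})
# → {'Foo': ['1', '4'], 'Bar': ['2'], 'Baz': ['3']}
```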

So let’s say that we’ve filled up all of the bins, and another FlowFile comes in. It must go to one of the existing bins, or be put into a new bin. If its attribute value matches one of the bins, it’ll be merged together with the other FlowFiles in that bin. But if it doesn’t match one of the bins, it needs its own, new bin. Since all of the bins have now been used up, it must evict one of the existing bins prematurely and fail it.

So at a low volume you’re likely not seeing all bins used. But when you increase the volume, you’re filling all of the bins and failing the merge. So you may want to set it to at least 30, given that you’re indicating that you’ll have up to 30 logs per transaction - or perhaps a bit more if you want to leave a little extra room for that to change.
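To see why volume matters, here is a small Python simulation of bounded bins with eviction. It is a sketch under stated assumptions, not NiFi's code: it assumes, following the description above, that when every bin is in use the oldest bin is evicted and counted as failed, and it does not model Max Bin Age. `simulate_bins` and the transaction IDs are hypothetical.

```python
from collections import OrderedDict


def simulate_bins(arrivals, max_bins, entries_per_bin=2):
    """Replay correlation values in arrival order against a bounded set of bins.

    Returns (merged, failed): correlation values whose bins completed, and
    those whose bins were evicted early to make room for a new value.
    """
    bins = OrderedDict()  # correlation value -> entry count, oldest bin first
    merged, failed = [], []
    for key in arrivals:
        if key not in bins:
            if len(bins) >= max_bins:
                evicted, _count = bins.popitem(last=False)  # evict oldest bin
                failed.append(evicted)
            bins[key] = 0
        bins[key] += 1
        if bins[key] >= entries_per_bin:
            del bins[key]  # bin is full, so it is merged and freed
            merged.append(key)
    return merged, failed


# Low volume: each transaction's two logs arrive back to back.
print(simulate_bins(["T1", "T1", "T2", "T2", "T3", "T3"], max_bins=1))
# → (['T1', 'T2', 'T3'], [])
# At pace: three transactions interleave.
print(simulate_bins(["T1", "T2", "T3", "T1", "T2", "T3"], max_bins=1))
# → ([], ['T1', 'T2', 'T3', 'T1', 'T2'])
print(simulate_bins(["T1", "T2", "T3", "T1", "T2", "T3"], max_bins=3))
# → (['T1', 'T2', 'T3'], [])
```

In this model, the number of bins needed tracks how many distinct import.ident values are open at the same time. In the second run, the last T3 is left sitting in a bin; in the real processor it would only leave via Max Bin Age, which the sketch ignores.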

Thanks
-Mark
