Posted to dev@nifi.apache.org by sam <sa...@streamhub.co.uk> on 2017/01/25 11:40:29 UTC

Merging content with n events with MergeContent / bin packing algorithm

Hi!

I am reading a file of events, splitting it into individual events, doing some
transformations, merging the events back together, and posting them in groups of 100.
I think this is the most common use case for data flow tools.

I am facing problems in the last part, merging the events back. I am using
the MergeContent processor with the following configuration:

Screen_Shot_2017-01-25_at_13.png
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n14514/Screen_Shot_2017-01-25_at_13.png>  

The problem is that I am getting the data in small chunks of 1-20 events each,
but I want to group them into batches of 100 events (unless the timer expires,
which is about 2 minutes) and post them all together. I am not sure how to
configure the bin-packing algorithm, or whether there is another processor I
should use instead.

Thank you



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Merging-content-with-n-events-with-MergeContent-bin-packing-algorithm-tp14514.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Merging content with n events with MergeContent / bin packing algorithm

Posted by Lee Laim <le...@gmail.com>.
Sam,
I'll throw out a few options that may help.

1. Split the file of events into groups of 100 first, then split again into single events for transformation.  Set the Merge Strategy to Defragment.  The merge will wait for all 100 events of a group before sending them on.  This will guarantee groups of 100, or the bin will route to failure.

2. In your current flow, you can chain two bin-packing MergeContent processors: first merge into groups of 20, then merge 5 of those into a group of 100.  The second merge will run slightly slower.

3. In your current flow, you can increase the Max Bin Age to match the ballpark duration of 100 incoming flowfiles.  In your case, the transform is not fast enough to fill a bin of 100 within 2 minutes.  You could also remove the Max Bin Age property completely, but then the tail of the batch might not merge for a while.
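To make option 1 concrete, here is a minimal Python sketch of the idea behind the Defragment strategy (an illustration of the assumed behavior, not NiFi's actual implementation): the splitter stamps each piece with fragment.identifier, fragment.index, and fragment.count attributes, and the merge releases a group only once every fragment for that identifier has arrived.

```python
from collections import defaultdict

def defragment(flowfiles):
    """Group (attributes, content) pairs by fragment.identifier and emit a
    merged batch only when all fragment.count pieces have arrived."""
    bins = defaultdict(dict)   # fragment.identifier -> {fragment.index: content}
    merged = []
    for attrs, content in flowfiles:
        ident = attrs["fragment.identifier"]
        bins[ident][int(attrs["fragment.index"])] = content
        if len(bins[ident]) == int(attrs["fragment.count"]):
            # All fragments present: emit them in their original order.
            merged.append([bins[ident][i] for i in sorted(bins[ident])])
            del bins[ident]
    return merged
```

Because a bin is only released when it is complete, out-of-order arrival and slow upstream transforms do not produce partial batches, which is why this option guarantees the group size.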

I'd go with option 1.

Thanks,
Lee




