Posted to users@nifi.apache.org by Igor Kravzov <ig...@gmail.com> on 2016/05/31 16:07:18 UTC

MergeContent questions

There are 2 configuration properties: Maximum Group Size and Maximum Number
of Entries.
Are these mutually exclusive? I want to create a file to store in HDFS but
limit its size to 64 MB, the HDFS block size (or should I go bigger?).

Max Bin Age property
Since content can be of different lengths and I do not know when the max size
will be reached, what role will it play?

Re: MergeContent questions

Posted by Igor Kravzov <ig...@gmail.com>.
Thank you Mark.

Re: MergeContent questions

Posted by Mark Payne <ma...@hotmail.com>.
Igor,

MergeContent will consider a 'bin' full when any one of those conditions is hit. I.e., if you set:

Max Group Size = 64 MB
Max Number of Entries = 100
Max Bin Age = 5 mins

Then you will get a merged bin whenever a bin hits 64 MB, regardless of how long it's been or how many entries there are.
Similarly, if you have 100 entries, then you'll get a bin even if the data is only 1 MB total.
Also, if you go 5 minutes without reaching either of those thresholds, the 5-minute threshold will cause the bin to be merged,
regardless of how many FlowFiles there are.
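
As a rough illustration of the "any one of those conditions" rule described above, here is a minimal Python sketch. It is not NiFi code: the names BinLimits and merge_reason are made up for this example, and the default limits are simply the example values from this message.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass(frozen=True)
class BinLimits:
    # Hypothetical container; the fields mirror the three properties above.
    max_group_size_bytes: int = 64 * MB
    max_entries: int = 100
    max_bin_age_seconds: float = 5 * 60


DEFAULT_LIMITS = BinLimits()


def merge_reason(size_bytes: int, entry_count: int, age_seconds: float,
                 limits: BinLimits = DEFAULT_LIMITS) -> Optional[str]:
    """Return the first threshold that is hit, or None if the bin keeps filling."""
    if size_bytes >= limits.max_group_size_bytes:
        return "max group size"
    if entry_count >= limits.max_entries:
        return "max number of entries"
    if age_seconds >= limits.max_bin_age_seconds:
        return "max bin age"
    return None
```

The thresholds are independent of each other: merge_reason(1 * MB, 100, 10.0) reports the entry limit even though the bin holds only 1 MB, and merge_reason(1 * MB, 5, 10.0) returns None because nothing has been reached yet.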

A common pattern for sending to HDFS is to set the Maximum Bin Age to some threshold (5 mins or 1 hour or whatever makes
sense for you), the Minimum Group Size to 64 MB, and the Maximum Group Size to 128 MB, and to leave the Maximum Number
of Entries unset. In this case, you will get bins of 64 - 128 MB most of the time, but if the data volume is low for a while, you'll still get some
data flowing into HDFS because of the Max Bin Age.
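
To see how that pattern plays out, here is a small simulation sketch in Python. It is a simplified model of the behaviour described above, not NiFi's implementation: plan_bins is a hypothetical helper, the age check only runs when a FlowFile arrives, and every FlowFile is assumed to be no larger than the gap between the minimum and maximum group size, so a bin never overshoots the maximum.

```python
MB = 1024 * 1024


def plan_bins(flowfiles, min_size=64 * MB, max_size=128 * MB, max_age=300.0):
    """Group (arrival_seconds, size_bytes) pairs, sorted by time, into bins.

    Returns a list of (total_bytes, reason) tuples, one per merged bin.
    """
    bins = []
    current, started = 0, None
    for arrival, size in flowfiles:
        if size > max_size - min_size:
            # Outside the simplifying assumption stated in the lead-in.
            raise ValueError("FlowFile too large for this simplified model")
        # Low-volume case: the open bin has aged out, so it is merged as-is.
        if started is not None and arrival - started >= max_age:
            bins.append((current, "max bin age"))
            current, started = 0, None
        if started is None:
            started = arrival
        current += size
        # Normal case: the bin is merged once it reaches the minimum size.
        if current >= min_size:
            bins.append((current, "min group size"))
            current, started = 0, None
    if started is not None:
        # Whatever is left would eventually be merged by the age limit.
        bins.append((current, "max bin age"))
    return bins
```

For example, plan_bins([(0, 40 * MB), (10, 30 * MB), (20, 10 * MB), (400, 5 * MB)]) yields one 70 MB bin merged on size, followed by a 10 MB bin and a 5 MB bin that are both merged on age.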

Thanks
-Mark
