Posted to users@nifi.apache.org by Jianan Zhang <wi...@gmail.com> on 2019/01/04 10:16:04 UTC

A question about [MergeContent] processor

Hi all,
I have a job that consists of the following steps: first consuming data from
Kafka, then packing the data into one file every 5 minutes, and finally
putting the packed file into HDFS.
I use the [MergeContent] processor to accomplish the “packing” step. The
MergeContent properties I configured are listed below:

----------------------
Merge Strategy: Bin-Packing Algorithm
Merge Format: Binary Concatenation
Attribute Strategy: Keep Only Common Attributes
Correlation Attribute Name: No value set
Metadata Strategy: Do Not Merge Uncommon Metadata
Minimum Number of Entries: 1
Maximum Number of Entries: 999999999
Minimum Group Size: 255 MB
Maximum Group Size: No value set
Max Bin Age: 5 minutes
Maximum number of Bins: 1
----------------------

I found the behavior of the MergeContent processor to be very unpredictable.
There are several workflows running on NiFi with the same MergeContent
configuration; some workflows correctly pack the data into one file every
5 minutes, but others don't. It has even happened that some MergeContent
processors generate one flowfile per record.

I am wondering whether I misunderstand the mechanism of the MergeContent
processor.

I am a NiFi newbie, please help me.

Thanks!

Re: A question about [MergeContent] processor

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

The last release (1.8.0) introduced a new feature called load-balanced
connections, which can be used to converge data from all nodes down to one
node. You would configure the connection right before MergeContent to load
balance to a single node.

Thanks,

Bryan
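
The effect discussed in this thread can be illustrated with a toy model in
Python. It is only a sketch under simplifying assumptions: each node merges
just the flowfiles it holds, and bins are treated as fixed 5-minute windows
(a real bin ages from the moment it is created). The names
count_merged_files and node_for are invented for the illustration and are
not part of NiFi.

```python
WINDOW_S = 300  # 5 minutes, matching the "Max Bin Age" from the thread


def count_merged_files(arrival_times, node_for):
    """Count output files when each node merges only what it holds.

    arrival_times: arrival time of each flowfile, in seconds.
    node_for: hypothetical routing function, flowfile index -> node id.
    """
    bins = set()
    for index, arrival in enumerate(arrival_times):
        window = int(arrival // WINDOW_S)
        # One merged file per (node, window) pair that received any data.
        bins.add((node_for(index), window))
    return len(bins)


arrivals = list(range(0, 600, 10))  # one flowfile every 10 s for 10 minutes

# Flowfiles spread over three nodes: every node emits its own file per window.
spread = count_merged_files(arrivals, lambda i: i % 3)

# All flowfiles routed to one node before the merge: one file per window.
single = count_merged_files(arrivals, lambda i: 0)

print(spread, single)  # prints: 6 2
```

With the data spread over three nodes the model yields three files per
window; routing everything to one node first yields one file per window,
which is the outcome the original poster wanted.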

On Tue, Jan 15, 2019 at 11:17 PM Jianan Zhang <wi...@gmail.com>
wrote:

> Hello Mike,
> This link helps a lot, and I found another cause of the problems I ran into.
> I am running my job on a cluster of three nodes, and the MergeContent
> processor's "Scheduling--Execution" field is set to "all nodes". As a
> result, the merge job ends up running (randomly, from my point of view) on
> some of the nodes. If only one node happens to execute the merge job, I get
> what I want: one merged file every 5 minutes. Otherwise, I get several
> output files.
> I know that there is a "primary node" option, but I can't let all the
> merge jobs run on the primary node (because of the load balancing
> issue). Could NiFi support a "single node"-like option, or is there
> some other solution for handling this?
>
> Thanks for help,
> Jianan

Re: A question about [MergeContent] processor

Posted by Jianan Zhang <wi...@gmail.com>.
Hello Mike,
This link helps a lot, and I found another cause of the problems I ran into.
I am running my job on a cluster of three nodes, and the MergeContent
processor's "Scheduling--Execution" field is set to "all nodes". As a
result, the merge job ends up running (randomly, from my point of view) on
some of the nodes. If only one node happens to execute the merge job, I get
what I want: one merged file every 5 minutes. Otherwise, I get several
output files.
I know that there is a "primary node" option, but I can't let all the
merge jobs run on the primary node (because of the load balancing
issue). Could NiFi support a "single node"-like option, or is there
some other solution for handling this?

Thanks for help,
Jianan

On Fri, Jan 4, 2019 at 11:19 PM Michael Moser <mo...@gmail.com> wrote:

> The inner workings of MergeContent are certainly a FAQ. This message [1]
> to the users list from a long time ago may help.  I think it's still
> accurate.
>
> [1] -
> https://lists.apache.org/thread.html/5ab5d9d0bcd0eef8ace391d00f5f5678427bee4b2fbf1e48d78ea8c8@1445464430@%3Cusers.nifi.apache.org%3E
>
> Regards,
> -- Mike

Re: A question about [MergeContent] processor

Posted by Michael Moser <mo...@gmail.com>.
The inner workings of MergeContent are certainly a FAQ. This message [1] to
the users list from a long time ago may help. I think it's still accurate.

[1] -
https://lists.apache.org/thread.html/5ab5d9d0bcd0eef8ace391d00f5f5678427bee4b2fbf1e48d78ea8c8@1445464430@%3Cusers.nifi.apache.org%3E

Regards,
-- Mike


On Fri, Jan 4, 2019 at 6:57 AM <Jo...@swisscom.com> wrote:

> Hi Jianan
>
> I am just saying that as soon as “Minimum Number of Entries” is reached the
> flow can be flushed out, and if the minimum number isn’t reached I would
> expect “Max Bin Age” to take effect. Have you tried that?
>
> Cheers Josef

Re: A question about [MergeContent] processor

Posted by Jo...@swisscom.com.
Hi Jianan
I am just saying that as soon as “Minimum Number of Entries” is reached the flow can be flushed out, and if the minimum number isn’t reached I would expect “Max Bin Age” to take effect. Have you tried that?
Cheers Josef
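
The rule under discussion can be written down as a small Python model. This
is a simplified reading of the MergeContent property descriptions, not the
processor's actual code: it assumes a bin is released once both minimums are
met, or once a maximum is hit, or once "Max Bin Age" expires, whichever
comes first. BinSettings and ready_to_merge are invented names, and the
defaults mirror the configuration posted at the start of the thread.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class BinSettings:
    min_entries: int = 1
    max_entries: int = 999_999_999
    min_size_bytes: int = 255 * MB
    max_size_bytes: Optional[int] = None    # "No value set"
    max_bin_age_s: Optional[float] = 300.0  # 5 minutes


def ready_to_merge(cfg, entries, size_bytes, age_s):
    """Return True if a bin in this state would be merged (assumed rule)."""
    if entries == 0:
        return False  # nothing to merge
    # Assumption: an expired bin is released even below the minimums.
    if cfg.max_bin_age_s is not None and age_s >= cfg.max_bin_age_s:
        return True
    if entries >= cfg.max_entries:
        return True
    if cfg.max_size_bytes is not None and size_bytes >= cfg.max_size_bytes:
        return True
    # Assumption: BOTH minimums must be met, not just one of them.
    return entries >= cfg.min_entries and size_bytes >= cfg.min_size_bytes


cfg = BinSettings()
print(ready_to_merge(cfg, 1, 1024, 10.0))      # False: far below 255 MB
print(ready_to_merge(cfg, 1, 1024, 300.0))     # True: bin age expired
print(ready_to_merge(cfg, 5000, 256 * MB, 10.0))  # True: both minimums met
```

Under this reading, a single flowfile with "Minimum Number of Entries" = 2
would still leave the bin once the age limit is reached, so the minimum
would not hold it back forever.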


From: Jianan Zhang <wi...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 4 January 2019 at 12:46
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: A question about [MergeContent] processor

Hi Josef,

Thanks for the reply. In my opinion, “Minimum Number of Entries” should not and cannot be stronger than “Max Bin Age”. Suppose only ONE flowfile from the data source enters the MergeContent processor and I set "Minimum Number of Entries" = 2; then this ONE flowfile would never come out of NiFi, even after it reaches the bin's deadline. This could very easily lead to a deadlock.

And I don't know how to use “Merge Strategy: Defragment” to merge the flowfiles from Kafka; I really don't know the rate at which the producer produces the messages.

Jianan Zhang


Re: A question about [MergeContent] processor

Posted by Jianan Zhang <wi...@gmail.com>.
Hi Josef,

Thanks for the reply. In my opinion, “Minimum Number of Entries” should
not and cannot be stronger than “Max Bin Age”. Suppose only ONE flowfile
from the data source enters the MergeContent processor and I set
"Minimum Number of Entries" = 2; then this ONE flowfile would never come
out of NiFi, even after it reaches the bin's deadline. This could very
easily lead to a deadlock.

And I don't know how to use “Merge Strategy: Defragment” to merge the
flowfiles from Kafka; I really don't know the rate at which the producer
produces the messages.

Jianan Zhang

On Fri, Jan 4, 2019 at 6:43 PM <Jo...@swisscom.com> wrote:

> Hi Jianan
>
>
>
> As you have “Minimum Number of Entries: 1”, it is normal that you see
> merges with only one flowfile. In my opinion, “Minimum Number of
> Entries” is stronger than “Max Bin Age” (the first is written in bold and
> the second is not). Additionally, it is called “Max Bin Age” and not
> “Bin Age”. So as soon as you reach at least 1 flowfile, it could be pushed
> out. However, in my opinion the documentation for “Max Bin Age” is too
> unspecific (when does it really take effect?); only the developers know
> exactly how it works. It would be great to get more information here…
>
>
>
> Just my 2 cents. Whenever possible, try to use “Merge Strategy: Defragment”
> instead of the current one, but this only works if the number of flowfiles
> you want to merge is predictable. With this strategy, the max bin age makes
> full sense and works as expected.
>
>
>
> Cheers Josef

Re: A question about [MergeContent] processor

Posted by Jo...@swisscom.com.
Hi Jianan

As you have “Minimum Number of Entries: 1”, it is normal that you see merges with only one flowfile. In my opinion, “Minimum Number of Entries” is stronger than “Max Bin Age” (the first is written in bold and the second is not). Additionally, it is called “Max Bin Age” and not “Bin Age”. So as soon as you reach at least 1 flowfile, it could be pushed out. However, in my opinion the documentation for “Max Bin Age” is too unspecific (when does it really take effect?); only the developers know exactly how it works. It would be great to get more information here…

Just my 2 cents. Whenever possible, try to use “Merge Strategy: Defragment” instead of the current one, but this only works if the number of flowfiles you want to merge is predictable. With this strategy, the max bin age makes full sense and works as expected.

Cheers Josef
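
Why Defragment needs a predictable count can be seen in a rough Python
sketch. It assumes each flowfile carries the fragment.identifier,
fragment.index and fragment.count attributes that the Defragment strategy
relies on; the dict-based flowfile representation and the function names
are made up for the illustration and are not NiFi code.

```python
from collections import defaultdict


def fragment(identifier, index, count, content):
    """Build a hypothetical flowfile as a plain dict."""
    return {
        "content": content,
        "attributes": {
            "fragment.identifier": identifier,
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
    }


def defragment(flowfiles):
    """Reassemble complete fragment groups; incomplete groups are left out."""
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[ff["attributes"]["fragment.identifier"]].append(ff)

    merged = {}
    for identifier, parts in groups.items():
        expected = int(parts[0]["attributes"]["fragment.count"])
        if len(parts) != expected:
            continue  # still waiting for the remaining fragments
        parts.sort(key=lambda ff: int(ff["attributes"]["fragment.index"]))
        merged[identifier] = b"".join(ff["content"] for ff in parts)
    return merged


flowfiles = [
    fragment("a", 1, 2, b"world"),
    fragment("a", 0, 2, b"hello "),
    fragment("b", 0, 3, b"partial"),  # only 1 of 3 fragments has arrived
]
print(defragment(flowfiles))  # {'a': b'hello world'}
```

Every group has to declare its fragment count up front, which is hard to do
for a Kafka stream whose message rate is unknown.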

