Posted to users@nifi.apache.org by Michael Xu <mi...@gmail.com> on 2016/08/10 14:54:14 UTC

MergeContent with varying number of entries in bins.

I am sending payloads into the MergeContent processor, each of which belongs
to a certain group of files in some data I'm working with. Each payload has an
attribute called "groupId", which is an identification number for a
particular group of files. This is the attribute I'm using to bin each
incoming flowfile, and I have set the Correlation Attribute Name to groupId.



The problem I'm dealing with right now is that each groupId has a varying
number of files associated with it. As such, I'm not sure how in NiFi to
detect when the MergeContent processor has received all files for a
particular groupId, and once done, release the bin.



Any help with this problem is appreciated, thanks!

Re: MergeContent with varying number of entries in bins.

Posted by Joe Witt <jo...@gmail.com>.
I like what Mark suggests but if you're not sure how many fragments
there will be during the flow for a given group then I think you could
keep doing what you were but set it to some really high min count and
min size and then have a time based kick out that waits long enough
where you're confident all bits would have come in.

This is a great use case and one that requires a lot of thought on how
best to tackle it!
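
To make the time-based kick-out idea concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not NiFi code: the class and method names (`TimedBinner`, `add`, `release_ready`) are made up for illustration, and it assumes a bin's age is measured from the moment its first entry arrives. In MergeContent itself the corresponding settings should be the "Minimum Number of Entries", "Minimum Group Size" and "Max Bin Age" properties, but check the processor documentation for your version.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    created_at: float                      # time the first entry arrived
    flowfiles: list = field(default_factory=list)


class TimedBinner:
    """Toy model: bin by a correlation value, release on count or on age."""

    def __init__(self, min_entries, max_bin_age_seconds):
        self.min_entries = min_entries
        self.max_bin_age = max_bin_age_seconds
        self.bins = {}

    def add(self, group_id, flowfile, now):
        # Create the bin on first sight of this group, then append.
        bin_ = self.bins.setdefault(group_id, Bin(created_at=now))
        bin_.flowfiles.append(flowfile)

    def release_ready(self, now):
        # A bin is released when it is full OR older than the maximum age.
        ready = {}
        for group_id, bin_ in list(self.bins.items()):
            full = len(bin_.flowfiles) >= self.min_entries
            expired = now - bin_.created_at >= self.max_bin_age
            if full or expired:
                ready[group_id] = self.bins.pop(group_id).flowfiles
        return ready


# A very high minimum means only the age limit ever releases a bin.
binner = TimedBinner(min_entries=1_000_000, max_bin_age_seconds=300)
binner.add("g1", "file1.txt", now=0)
binner.add("g1", "file2.txt", now=10)
print(binner.release_ready(now=100))
print(binner.release_ready(now=300))
```

The trade-off is latency: every group waits the full timeout, even if all of its files arrived in the first second.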

On Wed, Aug 10, 2016 at 11:00 AM, Mark Payne <ma...@hotmail.com> wrote:
> Michael,
>
> In the MergeContent processor, you can set the "Merge Strategy" to
> "Defragment." This will tell MergeContent to
> determine its bin thresholds based on the following FlowFile attributes:
>
> fragment.identifier
> fragment.index
> fragment.count
>
> So you'd need to set those 3 attributes on each of the FlowFiles. Rather
> than using the Correlation Attribute Name,
> you'd set the "fragment.identifier" attribute (you can use UpdateAttribute
> to copy the value from the groupId attribute
> to the 'fragment.identifier' attribute if you need to).
>
> The "fragment.index" attribute tells MergeContent how to order the different
> FlowFiles in the merged bin.
>
> The "fragment.count" attribute tells MergeContent how many FlowFiles go in
> this bin.
>
> Does that all make sense?
>
> Thanks
> -Mark

Re: MergeContent with varying number of entries in bins.

Posted by Michael Xu <mi...@gmail.com>.
Matt,
Thank you for those links, they should give me a good starting point.

Michael



Re: MergeContent with varying number of entries in bins.

Posted by Matt Burgess <ma...@gmail.com>.
Michael,

There are a handful of examples of ExecuteScript using Javascript
and/or Jython, on my blog (http://funnifi.blogspot.com) and other
locations:

Javascript:
http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited.html
https://mail-archives.apache.org/mod_mbox/nifi-users/201603.mbox/%3CCALhfc-WwqmZ7RMkRt2qxfgDnH1feHdM0o_hDzTgWWfgn+Z07bw@mail.gmail.com%3E

Jython:
http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
https://community.hortonworks.com/articles/35568/python-script-in-nifi.html
https://mail-archives.apache.org/mod_mbox/nifi-users/201602.mbox/%3CCAEV8zdWm7_E-qC1KKHV8eW8CP0HZaEkwjC=RdVtQnj+i85cUZw@mail.gmail.com%3E

I'm happy to help get you going with a scripted solution if you like.

Regards,
Matt


Re: MergeContent with varying number of entries in bins.

Posted by Michael Xu <mi...@gmail.com>.
Mark,
The expression you suggested seems to be working. I don't think the file
names I'm working with will have a comma, so this should be a good
solution.

Joe,
Are you referring to the ExecuteScript processor? That looks like a good
alternative. However, I couldn't find much information for it in the
documentation (https://nifi.apache.org/docs.html). Is there anywhere I can
find simple examples, especially in Javascript or Python?

Thank you,
Michael


Re: MergeContent with varying number of entries in bins.

Posted by Joe Witt <jo...@gmail.com>.
Probably a good idea to use a script in a script processor to extract
the details needed about the splits, then feed those results into the merge
attributes as you suggested. This would be the safest/cleanest.
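
As a rough illustration of what such a script could compute, here is a Python sketch of the core logic only. The function name `fragment_attributes` is hypothetical, and the sketch assumes each FlowFile's own file name appears in its `fileArray` attribute, so that the position in the array can serve as the index. Whether indexes should start at 0 or 1 is worth checking against the MergeContent documentation. The wiring into ExecuteScript (fetching the FlowFile from the session, writing the attributes back, and transferring it) is not shown.

```python
import json


def fragment_attributes(attributes, filename):
    """Derive the three fragment.* attributes from groupId and fileArray.

    attributes: dict of FlowFile attribute names to string values.
    filename:   this FlowFile's file name (assumed to be in fileArray).
    """
    # Parse the JSON array properly, so commas inside names are harmless.
    files = json.loads(attributes["fileArray"])
    return {
        "fragment.identifier": attributes["groupId"],
        # Position in the array, 0-based in this sketch.
        "fragment.index": str(files.index(filename)),
        "fragment.count": str(len(files)),
    }


attrs = {
    "groupId": "42",
    "fileArray": '["file1.txt","file2.txt","file3.txt"]',
}
print(fragment_attributes(attrs, "file2.txt"))
```

Attribute values are returned as strings because FlowFile attributes are always strings.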


Re: MergeContent with varying number of entries in bins.

Posted by Mark Payne <ma...@hotmail.com>.
Michael,

Well, sort of...

You could use:
${allDelineatedValues('${fileArray}', ','):count()}

So that will split up the fileArray attribute by commas and then count them. The only issue is that if you were to have a filename with a comma in it, you'd get the wrong value. Given that you're not likely to have a filename with a comma, you may be all right, but it's not really the "cleanest" solution...

The Expression Language does allow you to evaluate JSONPath against an attribute, but JSONPath doesn't allow for the nice functions that you can get in XPath and similar.

Anyone else have any better ideas?
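
The comma caveat can be demonstrated outside NiFi with a small Python sketch. It only mimics the split-and-count idea; it does not run the Expression Language. The helper names `count_by_commas` and `count_by_json` and the file name `report,final.txt` are invented for the example.

```python
import json


def count_by_commas(value):
    # Mimics "split on ',' and count the pieces".
    return len(value.split(","))


def count_by_json(value):
    # Parses the attribute as a real JSON array and counts its elements.
    return len(json.loads(value))


plain = '["file1.txt","file2.txt","file3.txt"]'
tricky = '["report,final.txt","file2.txt"]'

for label, value in (("plain", plain), ("tricky", tricky)):
    print(label, count_by_commas(value), count_by_json(value))
```

Both approaches agree on the first array, but the comma split over-counts the second one, which is why a scripted JSON parse is the safer route when file names are not under your control.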



Re: MergeContent with varying number of entries in bins.

Posted by Michael Xu <mi...@gmail.com>.
Hi Mark,


Thanks for your response earlier. While trying to implement what you
suggested in your email, I came across the issue of updating fragment.count
on a per-flowfile basis. I have another attribute called fileArray, which
is a json-compatible array that contains all the files for a particular
groupId. As an example taken from the Bulletin:


Key: 'fileArray'
        Value: '["file1.txt","file2.txt","file3.txt"]'


Is it possible in UpdateAttribute to use the Expression Language to return
the length of this array?


Thanks for your help,

Michael


Re: MergeContent with varying number of entries in bins.

Posted by Mark Payne <ma...@hotmail.com>.
Michael,

In the MergeContent processor, you can set the "Merge Strategy" to "Defragment." This will tell MergeContent to
determine its bin thresholds based on the following FlowFile attributes:

fragment.identifier
fragment.index
fragment.count

So you'd need to set those 3 attributes on each of the FlowFiles. Rather than using the Correlation Attribute Name,
you'd set the "fragment.identifier" attribute (you can use UpdateAttribute to copy the value from the groupId attribute
to the 'fragment.identifier' attribute if you need to).

The "fragment.index" attribute tells MergeContent how to order the different FlowFiles in the merged bin.

The "fragment.count" attribute tells MergeContent how many FlowFiles go in this bin.

Does that all make sense?

Thanks
-Mark
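
The role of the three attributes described above can be modelled with a short Python sketch. This is a simplified illustration, not MergeContent's actual implementation: the function name `defragment` is hypothetical, each FlowFile is represented as a plain dict of attributes, and incomplete groups are simply left out of the result.

```python
from collections import defaultdict


def defragment(flowfiles):
    """Return {identifier: [filenames in index order]} for complete groups."""
    # Bin by fragment.identifier.
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["fragment.identifier"]].append(ff)

    merged = {}
    for identifier, members in bins.items():
        # fragment.count says how many FlowFiles the bin must hold.
        expected = int(members[0]["fragment.count"])
        if len(members) == expected:
            # fragment.index decides the order inside the merged bin.
            members.sort(key=lambda ff: int(ff["fragment.index"]))
            merged[identifier] = [ff["filename"] for ff in members]
    return merged


incoming = [
    {"fragment.identifier": "A", "fragment.index": "2",
     "fragment.count": "2", "filename": "a2.txt"},
    {"fragment.identifier": "B", "fragment.index": "1",
     "fragment.count": "3", "filename": "b1.txt"},
    {"fragment.identifier": "A", "fragment.index": "1",
     "fragment.count": "2", "filename": "a1.txt"},
]
print(defragment(incoming))
```

Group A is released because both of its fragments have arrived, while group B stays pending until its remaining two fragments show up.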

