Posted to users@nifi.apache.org by na...@bt.com on 2020/12/09 07:58:52 UTC

Joining two or more flow files and merging the content

Hi All,

I've got a case where I need to join two or more separate flow files based on the IDs of one or more records within them (the first is a multi-record flow file, each record with a unique ID). The second flow file can also contain one or more records, each with an ID matching a record in the first flow file. The second flow file comes from a different data source than the first, and it could be split into one record per flow file, with the ID as an attribute, if that makes things simpler.

It can take some time for a match to arrive, so the flow needs to be able to queue and retry for a configurable period. If there is still no match after that period, the flow file should be released downstream. When a match is found, the matching record from the secondary flow file needs to be injected into the content of the corresponding record in the first flow file.
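The join-with-timeout behaviour described above can be sketched in plain Python. This is not NiFi code and not an existing NiFi component: `JoinBuffer`, `add_primary`, `add_secondary`, `poll`, the `id` field and the `enrichment` key are all hypothetical names, records are assumed to be dicts, and the clock is injectable so the timeout can be tested.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class JoinBuffer:
    """Hypothetical in-memory join: holds primary records until a secondary
    record with the same ID arrives, or until max_wait seconds have passed."""

    max_wait: float
    clock: Callable[[], float] = time.monotonic
    _secondary: Dict[Any, Record] = field(default_factory=dict, init=False)
    _pending: List[Tuple[float, Record]] = field(default_factory=list, init=False)

    def add_secondary(self, record: Record) -> None:
        # Keep the latest secondary record seen for each ID.
        self._secondary[record["id"]] = record

    def add_primary(self, record: Record) -> None:
        # Remember when the primary record arrived so it can time out.
        self._pending.append((self.clock(), record))

    def poll(self) -> Tuple[List[Record], List[Record]]:
        """Return (merged, expired); records still waiting stay buffered."""
        now = self.clock()
        merged: List[Record] = []
        expired: List[Record] = []
        still_waiting: List[Tuple[float, Record]] = []
        for arrived, record in self._pending:
            match = self._secondary.get(record["id"])
            if match is not None:
                # Inject the secondary record into a copy of the primary one.
                merged.append({**record, "enrichment": match})
            elif now - arrived >= self.max_wait:
                # Waited long enough: release it downstream unmatched.
                expired.append(record)
            else:
                still_waiting.append((arrived, record))
        self._pending = still_waiting
        return merged, expired
```

A caller would feed both inputs into the buffer and call `poll()` periodically, sending merged records on and releasing expired ones unchanged. As written, secondary records are never evicted, so a real implementation would also need a size or age limit on `_secondary`.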

I see there is the LookupRecord processor, but is there no lookup service that can query flow files in a queue or in provenance? I want to avoid writing the secondary flow file to a database or other storage, and would rather do the merge in flight within NiFi.

Below is a rough sketch of what I am trying to do.

Any suggestions would be much appreciated.

Kind Regards,

Nathan

[Image: rough sketch of the proposed flow]





Re: Joining two or more flow files and merging the content

Posted by Mark Payne <ma...@hotmail.com>.
Nathan,

If some records match and others don’t, the FlowFile will be split up. The records that match will go to the ‘matched’ relationship and the records that don’t will go to the ‘unmatched’ relationship. It routes record by record, not FlowFile by FlowFile.
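To make the per-record behaviour concrete, here is a minimal Python sketch of the routing described above. It only models the semantics and is not NiFi code: `route_records`, the `id` field and the `lookup` key are hypothetical, and the `lookup` callable stands in for whatever lookup service is configured. Per the description above, the two groups end up in separate FlowFiles on their respective relationships.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def route_records(
    records: List[Record],
    lookup: Callable[[Any], Optional[Any]],
) -> Tuple[List[Record], List[Record]]:
    """Split one batch of records into 'matched' and 'unmatched' groups."""
    matched: List[Record] = []
    unmatched: List[Record] = []
    for record in records:
        value = lookup(record["id"])
        if value is None:
            # Only this record goes to 'unmatched'; the rest are unaffected.
            unmatched.append(record)
        else:
            # Attach the looked-up value to a copy of the record.
            matched.append({**record, "lookup": value})
    return matched, unmatched
```

For example, with a lookup table `{1: "x", 3: "z"}` and input records with IDs 1, 2 and 3, records 1 and 3 land in the matched group and record 2 alone lands in the unmatched group.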

Thanks
-Mark

On Dec 16, 2020, at 7:19 AM, nathan.english@bt.com wrote:

Hi All,

I've managed to get this working as a proof of concept using the LookupRecord processor, the PutDistributedMapCache processor and the Distributed Map Cache services.

One question I have about the LookupRecord processor: the Routing Strategy is set to "Route to 'matched' or 'unmatched'" and the input flow file content contains an array of records. What happens when one record in the flow file isn't matched? Does the whole flow file get sent down the unmatched relationship, or only the record that didn't match? The documentation suggests it may result in one or more flow files, but it's hard to trace the history through provenance.

Kind Regards,

Nathan



RE: Joining two or more flow files and merging the content

Posted by na...@bt.com.
Hi All,

I've managed to get this working as a proof of concept using the LookupRecord processor, the PutDistributedMapCache processor and the Distributed Map Cache services.
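A rough Python model of that proof of concept follows, assuming the secondary flow files carry one record each with the ID in an attribute (as suggested in the original question). `InMemoryMapCache`, `cache_secondary`, `enrich_primary` and the `secondary` key are hypothetical stand-ins, not NiFi APIs: the dict plays the role of the distributed map cache, `cache_secondary` the PutDistributedMapCache step, and `enrich_primary` the lookup step. Storing values as raw bytes and parsing them as JSON is an assumption about the content format.

```python
import json
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class InMemoryMapCache:
    """Hypothetical stand-in for a distributed map cache: str keys, bytes values."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def cache_secondary(cache: InMemoryMapCache, attributes: Dict[str, str], content: bytes) -> None:
    # Secondary path: one record per FlowFile, stored under its ID attribute.
    cache.put(attributes["id"], content)


def enrich_primary(cache: InMemoryMapCache, records: List[Record]) -> List[Record]:
    # Primary path: look up each record's ID and inject the cached record.
    enriched: List[Record] = []
    for record in records:
        raw = cache.get(str(record["id"]))
        if raw is not None:
            record = {**record, "secondary": json.loads(raw)}
        enriched.append(record)
    return enriched
```

Records with no cache entry come back unchanged here; in the flow they would need to be retried until the configured wait period expires, because a primary record can arrive before its secondary.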

One question I have about the LookupRecord processor: the Routing Strategy is set to "Route to 'matched' or 'unmatched'" and the input flow file content contains an array of records. What happens when one record in the flow file isn't matched? Does the whole flow file get sent down the unmatched relationship, or only the record that didn't match? The documentation suggests it may result in one or more flow files, but it's hard to trace the history through provenance.

Kind Regards,

Nathan
