Posted to users@nifi.apache.org by Kiran <b....@gmail.com> on 2017/02/15 21:58:40 UTC

Re[2]: MergeContent across a NiFi cluster

Thanks for the reply Joe.

I'm glad I wasn't missing something obvious. I'm afraid I'm stuck with the
file size limitation, but I'll have a word with the guys who configure
the load balancer to see what affinity options they have.

Thanks

Brian

------ Original Message ------
From: "Joe Witt" <jo...@gmail.com>
To: users@nifi.apache.org; "Kiran" <b....@gmail.com>
Sent: 15/02/2017 21:36:41
Subject: Re: MergeContent across a NiFi cluster

>Brian,
>
>Great use case, and you're right: we don't have an easy way of handling
>this now. If you do indeed have a load balancer in front of the
>receiving NiFi cluster and it supports affinity of some kind, then I
>believe you could set a header on the HTTP POST, taken from a flowfile
>attribute present on each split, whose value is the hash of the full
>object. If the load balancer ensured that all splits with a matching
>header went to the same machine, you'd be in business. Some load
>balancers do this (I'm thinking of a commercial one). But I admit that
>is a lot of moving parts to keep in mind. We need to improve our
>site-to-site feature to do things like split content for you
>automatically and handle the partitioning/affinity logic I suggested.
>You might also consider avoiding the splitting for now to keep things
>super simple, though I recognize that exposes alternative tradeoffs.
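A minimal sketch of the header-based affinity idea above, assuming a load balancer that hashes a header value to pick a backend. The node names, the use of SHA-256, the hash-mod-N mapping and the `X-Object-Hash` header name are all illustrative assumptions, not how NiFi or any particular load balancer works; the point is only that every split carries the same header value and therefore lands on the same node.

```python
import hashlib
from typing import List

# Hypothetical names for the 4 receiving nodes behind the load balancer.
NODES = ["nifi-a", "nifi-b", "nifi-c", "nifi-d"]


def object_hash(whole_object: bytes) -> str:
    """Hash of the full, pre-split object; every split carries this same value."""
    return hashlib.sha256(whole_object).hexdigest()


def pick_node(header_value: str, nodes: List[str]) -> str:
    """Deterministically map an affinity header value to one backend node.

    Simple hash-mod-N for illustration; real load balancers use their own schemes.
    """
    digest = hashlib.sha256(header_value.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


if __name__ == "__main__":
    payload = b"example compressed object " * 100   # 2600 bytes
    header_value = object_hash(payload)              # e.g. sent as X-Object-Hash (hypothetical header)
    splits = [payload[i:i + 1000] for i in range(0, len(payload), 1000)]
    targets = {pick_node(header_value, NODES) for _ in splits}
    print(f"{len(splits)} splits -> {targets}")      # the set holds a single node
```

Because the routing decision depends only on the header value, not on which split is being sent, all splits of one object go to one node while different objects still spread across the cluster.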
>
>Great case for us to work on/rally around though.
>
>Thanks
>Joe
>
>On Wed, Feb 15, 2017 at 4:29 PM, Kiran <b....@gmail.com> wrote:
>>Hello,
>>
>>I need to send data from one organisation to another, but there are
>>data size limits between them (this isn't my choice and has been
>>enforced on me). I've got a 4-node NiFi cluster in each organisation.
>>
>>The sending NiFi cluster has the following data flow:
>>Ingest the data by various means
>>    -> Compress the data using CompressContent
>>      -> If the file size is > X, split it using SplitContent
>>        -> HTTPS POST to the load balancer sitting in front of the
>>           NiFi cluster in the other organisation
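The sending side of this flow could be sketched as follows, assuming gzip compression, a byte-size threshold `max_size` standing in for "X", and attribute names borrowed from NiFi's `fragment.*` convention (`fragment.identifier`, `fragment.index`, `fragment.count`). The `object.hash` attribute is a hypothetical name for a per-object value that could be copied into an HTTP header for load-balancer affinity. This is a plain-Python illustration, not what CompressContent or SplitContent do internally.

```python
import gzip
import hashlib
import uuid
from typing import Dict, List, Tuple


def compress_and_split(data: bytes, max_size: int) -> List[Tuple[Dict[str, str], bytes]]:
    """Compress data, then cut it into fragments of at most max_size bytes.

    Each fragment gets fragment.*-style attributes so a defragmenting merge
    on the receiver can reassemble them in order.
    """
    compressed = gzip.compress(data)
    chunks = [compressed[i:i + max_size] for i in range(0, len(compressed), max_size)]
    frag_id = str(uuid.uuid4())  # shared by every fragment of this object
    # Same value on every split: a candidate header for load-balancer affinity.
    obj_hash = hashlib.sha256(compressed).hexdigest()
    fragments = []
    for index, chunk in enumerate(chunks):
        attrs = {
            "fragment.identifier": frag_id,
            "fragment.index": str(index),        # 0-based in this sketch
            "fragment.count": str(len(chunks)),
            "object.hash": obj_hash,             # hypothetical attribute name
        }
        fragments.append((attrs, chunk))
    return fragments
```

Reassembly on the receiver is then just ordering by `fragment.index`, concatenating, and decompressing, provided every fragment reached the same place.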
>>
>>On the receiving NiFi cluster I wanted to:
>>-> Receive the data
>>    -> MergeContent
>>      -> Do whatever else with the data...
>>
>>The problem I can't get round is that if I split the content into 3
>>fragments and send them to the receiving NiFi cluster, then, because
>>it's behind a load balancer, I can't guarantee that all 3 fragments
>>are received by the same node.
>>
>>Q1) I'm assuming that for MergeContent to work, all the fragments of a
>>single piece of data have to arrive on the same NiFi node. Or is there
>>an option to have it work across a cluster?
>>
>>Q2) How long does the MergeContent processor wait for all the
>>fragments? If one of the fragments gets lost, does it time out after a
>>certain period?
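On Q1 and Q2, the sketch below illustrates why defragmentation needs all fragments on one node and how a bin timeout deals with a lost fragment: bins live in one process, keyed by `fragment.identifier`, and incomplete bins older than a maximum age are evicted. The `max_age_s` parameter is loosely analogous to MergeContent's Max Bin Age property, but the `Defragmenter` class and its eviction behaviour are hypothetical; check the MergeContent documentation for what the processor actually does with expired bins.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Bin:
    created: float              # clock reading when the first fragment arrived
    count: int                  # expected number of fragments (fragment.count)
    parts: Dict[int, bytes] = field(default_factory=dict)


class Defragmenter:
    """Reassembles fragments that arrive on THIS node only.

    A fragment delivered to another node never reaches these bins, which is
    why all fragments of one object must be routed to the same node.
    """

    def __init__(self, max_age_s: float, clock: Callable[[], float] = time.monotonic):
        self.max_age_s = max_age_s
        self.clock = clock
        self.bins: Dict[str, Bin] = {}

    def add(self, attrs: Dict[str, str], content: bytes) -> Optional[bytes]:
        """Store a fragment; return the merged object once every index is present."""
        frag_id = attrs["fragment.identifier"]
        if frag_id not in self.bins:
            self.bins[frag_id] = Bin(self.clock(), int(attrs["fragment.count"]))
        b = self.bins[frag_id]
        b.parts[int(attrs["fragment.index"])] = content
        if len(b.parts) < b.count:
            return None
        del self.bins[frag_id]
        return b"".join(b.parts[i] for i in sorted(b.parts))

    def evict_expired(self) -> List[str]:
        """Drop incomplete bins older than max_age_s (e.g. a fragment was lost)."""
        now = self.clock()
        expired = [k for k, b in self.bins.items() if now - b.created > self.max_age_s]
        for k in expired:
            del self.bins[k]
        return expired
```

A real deployment would call `evict_expired` periodically and decide what to do with the fragments it discards (retry, alert, or route them elsewhere).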
>>
>>I was thinking one way to solve this would be to have the HTTPListener
>>on the receiving NiFi listening only on the primary node, which would
>>ensure all the fragments arrive on the same node. The downside would
>>be that I end up with idle NiFi nodes.
>>
>>Is there anything obvious that I've missed that would solve my issue?
>>
>>Thanks in advance,
>>
>>Brian
>>
>
