Posted to users@nifi.apache.org by Kiran <b....@gmail.com> on 2017/02/15 21:29:07 UTC

MergeContent across a NiFi cluster

Hello,

I need to send data from one organisation to another but there are data
size limits between them (this isn't my choice and has been enforced on
me). I've got a 4 node NiFi cluster in each organisation.

The sending NiFi cluster has the following data flow:
Ingest the data by various means
    -> Compress Data using CompressContent
      -> If file size > X amount I use SplitContent
        -> HTTPS POST to load balancer sitting in front of the NiFi
cluster in the other organisation
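
As a rough illustration of the sending side, the compress-then-split step can be sketched in Python. This is not NiFi code: `compress_and_split` and `max_fragment_size` are hypothetical names, the split here is by byte count (an assumption about what "file size > X" implies), and the `fragment.*` keys are illustrative labels for the bookkeeping each fragment needs so it can be reassembled later.

```python
import gzip
import uuid


def compress_and_split(data: bytes, max_fragment_size: int) -> list:
    """Gzip the payload, then cut it into fragments no larger than the limit."""
    if max_fragment_size < 1:
        raise ValueError("max_fragment_size must be at least 1 byte")
    compressed = gzip.compress(data)
    # Slice the compressed bytes into fixed-size pieces (the last may be shorter).
    pieces = [
        compressed[i:i + max_fragment_size]
        for i in range(0, len(compressed), max_fragment_size)
    ]
    group_id = str(uuid.uuid4())  # shared by every fragment of this payload
    return [
        {
            "fragment.identifier": group_id,  # which payload this belongs to
            "fragment.index": index,          # position within the payload
            "fragment.count": len(pieces),    # how many fragments to expect
            "content": piece,
        }
        for index, piece in enumerate(pieces)
    ]


def reassemble(fragments: list) -> bytes:
    """Receiving side: order by index, concatenate, then decompress."""
    ordered = sorted(fragments, key=lambda f: f["fragment.index"])
    return gzip.decompress(b"".join(f["content"] for f in ordered))
```

Reassembly only works if one place holds every fragment of a group, which is exactly the constraint the rest of this message is about.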

On the receiving NiFi cluster I wanted to:
-> Receive the data
    -> MergeContent
      -> Do whatever else with the data...

The problem I can't get round is that if I split the content into 3
fragments and send them to the receiving NiFi cluster, I can't guarantee
that all 3 fragments are received by the same node, because the cluster
is behind a load balancer.
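
The problem can be made concrete with a toy simulation. It assumes a balancer that hands requests to nodes in simple rotation and assumes each node can only merge the fragments it received itself (the assumption behind Q1 below). The node names and helper functions are hypothetical; real balancers and NiFi itself are more involved.

```python
from collections import defaultdict
from itertools import cycle

NODES = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical node names


def distribute_round_robin(fragments, nodes):
    """Hand each fragment to the next node in turn, as a simple balancer might."""
    received = defaultdict(list)
    for fragment, node in zip(fragments, cycle(nodes)):
        received[node].append(fragment)
    return dict(received)


def nodes_able_to_merge(received, fragment_count):
    """A node can reassemble only if it holds every fragment index itself."""
    expected = set(range(fragment_count))
    return [
        node for node, frags in received.items()
        if {f["index"] for f in frags} == expected
    ]


fragments = [{"id": "payload-A", "index": i} for i in range(3)]
received = distribute_round_robin(fragments, NODES)
print(nodes_able_to_merge(received, fragment_count=3))  # prints []
```

With three fragments spread over four nodes, no node ends up with the full set, so under these assumptions nothing can be merged.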

Q1) I'm assuming that for MergeContent to work, all the fragments of a
single piece of data have to arrive on the same NiFi node. Or is there
an option to have it work across a cluster?

Q2) How long does the MergeContent processor wait for all the fragments?
If one of the fragments gets lost, does it time out after a certain
period?

I was thinking one way to solve this is to have the HTTPListener on the
receiving NiFi listen only on the primary node, which would ensure all
the fragments arrive on the same node. The downside would be that I end
up with idle NiFi nodes.

Is there anything obvious that I've missed that would solve my issue?

Thanks in advance,

Brian


Re[2]: MergeContent across a NiFi cluster

Posted by Kiran <b....@gmail.com>.
Thanks for the reply Joe.

I'm glad I wasn't missing something obvious. I'm afraid I'm stuck with
the file size limitation, but I'll have a word with the guys who
configure the load balancer to see what affinity options they have.

Thanks

Brian

------ Original Message ------
From: "Joe Witt" <jo...@gmail.com>
To: users@nifi.apache.org; "Kiran" <b....@gmail.com>
Sent: 15/02/2017 21:36:41
Subject: Re: MergeContent across a NiFi cluster

>Brian,
>
>Great use case, and you're right: we don't have an easy way of handling
>this now.  If you do indeed have a load balancer in front of the
>receiving NiFi cluster and it supports affinity of some kind, then I
>believe you can set a header on the HTTP POST. The header value would
>come from a flowfile attribute that is present on each split and holds
>the hash of the full original object.  If the load balancer ensured
>that all splits with a matching header landed on the same machine,
>you'd be in business.  There are some load balancers that do this (I'm
>thinking of a commercial one).  But I admit that is a lot of moving
>parts to keep in mind.  We need to improve our site-to-site feature to
>do things like automatically split content for you and handle the
>partitioning/affinity logic I suggested.  You might also consider
>avoiding the splitting for now to keep things super simple, though I
>recognize that brings its own tradeoffs.
>
>Great case for us to work on/rally around though.
>
>Thanks
>Joe
>


Re: MergeContent across a NiFi cluster

Posted by Joe Witt <jo...@gmail.com>.
Brian,

Great use case, and you're right: we don't have an easy way of handling this
now.  If you do indeed have a load balancer in front of the receiving NiFi
cluster and it supports affinity of some kind, then I believe you can set a
header on the HTTP POST. The header value would come from a flowfile
attribute that is present on each split and holds the hash of the full
original object.  If the load balancer ensured that all splits with a
matching header landed on the same machine, you'd be in business.  There are
some load balancers that do this (I'm thinking of a commercial one).  But I
admit that is a lot of moving parts to keep in mind.  We need to improve
our site-to-site feature to do things like automatically split content for
you and handle the partitioning/affinity logic I suggested.  You might also
consider avoiding the splitting for now to keep things super simple, though
I recognize that brings its own tradeoffs.
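
The header-based affinity idea can be sketched roughly as follows. This is a minimal Python model, not NiFi or load balancer configuration: the header names, node names, and the functions `affinity_key`, `headers_for_split`, and `pick_node` are all hypothetical, and hashing the header value modulo the node count is only one way a balancer might implement affinity.

```python
import hashlib

AFFINITY_HEADER = "X-Fragment-Group"  # hypothetical header name
NODES = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical node names


def affinity_key(full_object: bytes) -> str:
    """Hash of the whole object, computed once before it is split."""
    return hashlib.sha256(full_object).hexdigest()


def headers_for_split(key: str, index: int, count: int) -> dict:
    """Sender side: every split of one object carries the same key."""
    return {
        AFFINITY_HEADER: key,
        "X-Fragment-Index": str(index),
        "X-Fragment-Count": str(count),
    }


def pick_node(headers: dict, nodes: list) -> str:
    """Balancer side: map the header value to a node deterministically."""
    digest = hashlib.sha256(headers[AFFINITY_HEADER].encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


key = affinity_key(b"contents of the original, unsplit object")
chosen = {pick_node(headers_for_split(key, i, 3), NODES) for i in range(3)}
print(len(chosen))  # prints 1: all three splits map to the same node
```

Different objects get different keys, so the load still spreads across the cluster. One caveat of plain modulo hashing is that the mapping changes whenever the node list changes, so splits in flight during a membership change could still be separated.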

Great case for us to work on/rally around though.

Thanks
Joe
