Posted to dev@nifi.apache.org by Paresh Shah <Pa...@lifelock.com> on 2016/02/04 19:53:53 UTC

Smart Load Balancing behavior replicating data to different cluster nodes.

We have the pipeline as follows.

Sender Side
  P1..->Pn -> RPG ( connected to ForwardInputPort )

Receiver Side
ForwardInputPort -> ProcessGroup

        Inside the processGroup
       InputPort-> P1 …-> Pn

We sent 10 flow files to the remote cluster. On the receiving side, the same set of flow files appears on different nodes of the cluster. We confirmed this by looking at Data Provenance and the filenames shown there: they are the same on two nodes of the cluster.

This behaviour seems to be incorrect.

Paresh

________________________________
The information contained in this transmission may contain privileged and confidential information. It is intended only for the use of the person(s) named above. If you are not the intended recipient, you are hereby notified that any review, dissemination, distribution or duplication of this communication is strictly prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
________________________________

Re: Smart Load Balancing behavior replicating data to different cluster nodes.

Posted by Matthew Clarke <ma...@gmail.com>.
Paresh,
      Please look in your sending NiFi logs for the following log line:

INFO [Timer-Driven Process Thread-6] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[<FQDN-node1>:0] will receive 33.333333333333336% of data
Node[<FQDN-node2>:0] will receive 33.333333333333336% of data
Node[<FQDN-node3>:0] will receive 33.333333333333336% of data

Do the hostnames in those lines look correct?
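
One way to check those hostnames without reading the log by eye is to pull the node lines out with a small script. The following is only a sketch: the hostnames in the sample are made up, and the pattern assumes the log lines look exactly like the ones quoted above.

```python
import re

# Matches lines such as:
#   Node[nifi1.example.com:0] will receive 33.333333333333336% of data
NODE_LINE = re.compile(
    r"Node\[(?P<host>[^\]:]+):(?P<port>\d+)\]\s+will receive\s+(?P<pct>[\d.]+)% of data"
)

def parse_distribution(log_text):
    """Return {hostname: percentage} for every node line found in log_text."""
    return {m.group("host"): float(m.group("pct")) for m in NODE_LINE.finditer(log_text)}

def suspicious_hosts(distribution):
    """List hostnames that a remote sender could not use to reach the node."""
    local_only = {"localhost", "127.0.0.1", "0.0.0.0"}
    return [host for host in distribution if host.lower() in local_only]

# Hypothetical log excerpt; the hostnames are placeholders.
sample = """\
INFO [Timer-Driven Process Thread-6] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[nifi1.example.com:0] will receive 33.333333333333336% of data
Node[localhost:0] will receive 33.333333333333336% of data
Node[nifi3.example.com:0] will receive 33.333333333333336% of data
"""

dist = parse_distribution(sample)
for host, pct in dist.items():
    print(f"{host}: {pct:.1f}%")
print("Suspicious:", suspicious_hosts(dist))
```

A hostname such as localhost in that list would mean the sender is being told to deliver to itself rather than to the remote node.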

Did you configure all three S2S properties on every node in your receiving
cluster?

# Site to Site properties
nifi.remote.input.socket.host=<FQDN-node1>
    <-- It is not recommended to leave this blank. Set it to an IP or FQDN
        that is resolvable and reachable by any sending NiFi. Leaving it
        blank may cause the host to resolve to localhost, which results in
        the sending system trying to send to itself.
nifi.remote.input.socket.port=<some port>
    <-- Use netstat to verify that this port shows up as a LISTEN port.
nifi.remote.input.secure=true
    <-- Verify that this is set to true only if the NiFi has been
        configured to run securely (HTTPS).
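
The three checks above can be scripted per node. This is a rough sketch, not an official tool: the sample values (including port 10000) are placeholders, and treating a non-empty nifi.web.https.port as "configured to run securely" is an assumption made for illustration.

```python
def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_s2s(props):
    """Return a list of problems found in the three Site-to-Site properties."""
    problems = []
    host = props.get("nifi.remote.input.socket.host", "")
    if not host:
        problems.append("nifi.remote.input.socket.host is blank")
    elif host.lower() in ("localhost", "127.0.0.1"):
        problems.append("nifi.remote.input.socket.host is not reachable by remote senders")
    port = props.get("nifi.remote.input.socket.port", "")
    if not port.isdigit():
        problems.append("nifi.remote.input.socket.port is not a port number")
    secure = props.get("nifi.remote.input.secure", "").lower()
    # Assumption: a non-empty nifi.web.https.port means the node runs over HTTPS.
    if secure == "true" and not props.get("nifi.web.https.port", ""):
        problems.append("nifi.remote.input.secure=true but no HTTPS port is set")
    return problems

# Hypothetical excerpt of one node's nifi.properties.
sample = """\
# Site to Site properties
nifi.remote.input.socket.host=
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=true
nifi.web.https.port=
"""

problems = check_s2s(load_properties(sample))
for problem in problems:
    print(problem)
```

Running it against the properties file of each receiving node (the file location depends on your install) gives a quick list of nodes that still need attention. It does not replace the netstat check, since it only reads the configuration.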


Thanks,
Matt

On Thu, Feb 4, 2016 at 2:09 PM, Paresh Shah <Pa...@lifelock.com>
wrote:

> The problem is not quite what we thought.
>
> What we see is that we generate a bunch of flow files in the sender
> processor and then commit, but all of them get delivered to the same remote
> node. We expected them to be split across nodes depending on the number of
> nodes and the load on each node.
>
>
> Paresh

Re: Smart Load Balancing behavior replicating data to different cluster nodes.

Posted by Paresh Shah <Pa...@lifelock.com>.
The problem is not quite what we thought.

What we see is that we generate a bunch of flow files in the sender
processor and then commit, but all of them get delivered to the same remote
node. We expected them to be split across nodes depending on the number of
nodes and the load on each node.
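
To make that expectation concrete, here is a small sketch of per-flow-file weighted assignment. It illustrates only the behaviour expected above, not how NiFi Site-to-Site actually selects a destination; the node names, file names and equal weights are hypothetical.

```python
import random
from collections import Counter

def assign(flow_files, weights, rng):
    """Pick a destination node for each flow file independently, by weight."""
    nodes = list(weights)
    picks = rng.choices(nodes, weights=[weights[n] for n in nodes], k=len(flow_files))
    return dict(zip(flow_files, picks))

# Hypothetical three-node cluster with equal weights.
weights = {"node1": 1 / 3, "node2": 1 / 3, "node3": 1 / 3}
files = [f"file-{i}" for i in range(10)]

assignment = assign(files, weights, random.Random(0))
print(Counter(assignment.values()))
```

Under this model each flow file lands on exactly one node, and a batch of 10 would normally be spread over more than one node.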


Paresh
