Posted to users@nifi.apache.org by Chakrader Dewaragatla <Ch...@lifelock.com> on 2016/01/09 00:27:33 UTC

Re: Nifi cluster features - Questions

Mark – I have set up a two-node cluster and tried the following:
 GenerateFlowFile processor (run only on primary node) —> DistributeLoad processor (RoundRobin) —> PutFile

>> The GetFile/PutFile will run on all nodes (unless you schedule it to run on primary node only).
From your above comment, it should put files on both nodes. It puts files on the primary node only. Any thoughts?

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Wednesday, October 7, 2015 at 11:28 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

Correct - when NiFi instances are clustered, they do not transfer data between the nodes. This is very different
than you might expect from something like Storm or Spark, as the key goals and design are quite different.
We have discussed providing the ability to allow the user to indicate that they want to have the framework
do load balancing for specific connections in the background, but it's still in more of a discussion phase.

Site-to-Site is simply the capability that we have developed to transfer data between one instance of
NiFi and another instance of NiFi. So currently, if we want to do load balancing across the cluster, we would
create a site-to-site connection (by dragging a Remote Process Group onto the graph) and give that
site-to-site connection the URL of our cluster. That way, you can push data to your own cluster, effectively
providing a load balancing capability.

If you were to just run ListenHTTP without setting it to Primary Node, then every node in the cluster will be listening
for incoming HTTP connections. So you could then use a simple load balancer in front of NiFi to distribute the load
across your cluster.
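
As a concrete illustration (the hostnames, port, and path here are just placeholders, not values from this thread), a simple round-robin balancer in front of the nodes' ListenHTTP endpoints would look roughly like:

    client --> load balancer --> http://node1:8081/contentListener
                             --> http://node2:8081/contentListener
                             --> http://node3:8081/contentListener

Whichever node the balancer picks ingests that request into its own copy of the flow, so the work spreads across the cluster.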

Does this help? If you have any more questions we're happy to help!

Thanks
-Mark


On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Mark - Thanks for the notes.

>> The other option would be to have a ListenHTTP processor run on Primary Node only and then use Site-to-Site to distribute the data to other nodes.
Let's say I have a 5-node cluster and a ListenHTTP processor on the primary node; data collected on the primary node is not transferred to the other nodes for processing by default, even though all nodes are part of one cluster?
If the ListenHTTP processor is running with the default scheduling (without explicitly setting it to run on the primary node), how is the data transferred to the rest of the nodes? Does site-to-site come into play when I make one processor run on the primary node?

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Wednesday, October 7, 2015 at 7:00 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Hello Chakro,

When you create a cluster of NiFi instances, each node in the cluster is acting independently and in exactly
the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the same flow. However, they will be
pulling in different data and therefore operating on different data.

So if you pull in 10 1-gig files from S3, each of those files will be processed on the node that pulled the data
in. NiFi does not currently shuffle data around between nodes in the cluster (you can use site-to-site to do
this if you want to, but it won't happen automatically). If you set the number of Concurrent Tasks to 5, then
you will have up to 5 threads running for that processor on each node.
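(For example, with 5 nodes and Concurrent Tasks set to 5, that is up to 25 concurrent threads for that processor across the cluster as a whole.)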

The only exception to this is the Primary Node. You can schedule a Processor to run only on the Primary Node
by right-clicking on the Processor, and going to the Configure menu. In the Scheduling tab, you can change
the Scheduling Strategy to Primary Node Only. In this case, that Processor will only be triggered to run on
whichever node is elected the Primary Node (this can be changed in the Cluster management screen by clicking
the appropriate icon in the top-right corner of the UI).

The GetFile/PutFile will run on all nodes (unless you schedule it to run on primary node only).

If you are attempting to have a single input running HTTP and then push that out across the entire cluster to
process the data, you would have a few options. First, you could just use an HTTP Load Balancer in front of NiFi.
The other option would be to have a ListenHTTP processor run on Primary Node only and then use Site-to-Site
to distribute the data to other nodes.
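Roughly, as a sketch (the input port name is only a placeholder):

    ListenHTTP (scheduled on Primary Node only) --> Remote Process Group (URL of this same cluster)
    Input Port "from-primary" (runs on every node) --> rest of the flow

The Remote Process Group load-balances the received flowfiles back out across all of the nodes' input ports.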

For more info on site-to-site, you can see the Site-to-Site section of the User Guide at
http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site

If you have any more questions, let us know!

Thanks
-Mark

On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Nifi Team – I would like to understand the advantages of Nifi clustering setup.

Questions :

 - How does a workflow work on multiple nodes? Does it share resources across nodes?
Let's say I need to pull 10 1-Gig files from S3; how is the workload distributed? If I set concurrent tasks to 5, does it spawn 5 tasks per node?

 - How do I “isolate” a processor to the master node (or one node)?

- For GetFile/PutFile processors in a cluster setup, do they get/put on the primary node? How do I force a processor to look at one of the slave nodes?

- How can we have a workflow where the input side receives requests (HTTP) and the rest of the pipeline runs in parallel on all the nodes?

Thanks,
-Chakro


Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Mark/Matthew – I was able to get site-to-site working; on the problem node I had an extra space after the port number :) (“10080 “).
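In other words, the property line has to end cleanly, with no trailing whitespace after the value, e.g. (using the port value from the config quoted further down this thread):

    nifi.remote.input.socket.port=10880
    nifi.remote.input.secure=false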

Thanks,
-Chakri

From: Chakrader Dewaragatla <ch...@lifelock.com>>
Date: Tuesday, January 12, 2016 at 12:13 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Thanks Matthew and Mark. Below examples are very helpful.
I still need to debug why site-to-site is sending data to one slave instead of two.

-Chakri

From: Matthew Clarke <ma...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 2:05 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,
     Your original option was to use an RPG (Site-to-Site) to redistribute data received on the primary Node only to every node in your cluster in a load-balanced fashion. In this setup every connected Node will get a portion of the data, and there is no way to specify which node gets what particular piece of data.  This dataflow would look like this:
[Inline image 1]
    Data is received by the ListenHTTP processor on the primary Node only.  It is then sent to an RPG that was added using the URL of the NCM for your 10-node cluster (the cluster also running the aforementioned primary-node ListenHTTP processor). There will also be an input port which runs on every Node.  Each Node will receive a load-balanced distribution of the data sent to that RPG.
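Since the inline image does not come through in this plain-text archive, a rough sketch of the layout Matt describes (the input port name is only illustrative):

    ListenHTTP (Primary Node only) --> Remote Process Group (URL of the 10-node cluster's NCM)
    Input Port "from-primary" (runs on all 10 nodes) --> downstream processors

Each node's input port receives a load-balanced share of whatever the primary node took in.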

Hope these illustrations help.

Thanks,
Matt

On Mon, Jan 11, 2016 at 4:50 PM, Matthew Clarke <ma...@gmail.com>> wrote:
Chakri,
        All data is received on the primary Node only via the initial ListenHTTP.  Some routing takes place to send some data to a particular 5 nodes and other data to the other 5 nodes.  The PostHTTP processors are configured to send to a specific Node in your cluster using the same target port number. A single ListenHTTP processor then runs on every Node, configured to use that target port number.

Thanks,
Matt

On Mon, Jan 11, 2016 at 4:47 PM, Matthew Clarke <ma...@gmail.com>> wrote:
Chakri,
            What Mark is saying is that the NiFi Remote Process Group (RPG), also known as Site-to-Site, will load-balance delivery of data to all nodes in a cluster.  It cannot be configured to balance data to only a subset of the nodes in a cluster.  If this is the strategy you want to deploy, a different approach must be taken (one that does not use Site-to-Site).  Here is a NiFi diagram of one such approach using your example of a 10-node cluster:

[Inline image 1]
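
The inline image is likewise lost in the plain-text archive; a rough sketch of the non-Site-to-Site approach Matt describes, with placeholder node names, port, and routing processor (assuming the 10-node example):

    ListenHTTP (Primary Node only) --> routing logic (e.g. RouteOnAttribute)
        --> PostHTTP targeting node1:9999
        --> PostHTTP targeting node2:9999
        --> ... one PostHTTP per target node or node group ...
    ListenHTTP on port 9999 (runs on every node) --> rest of the flow

Because each PostHTTP is pointed at one specific node, you control exactly which subset of nodes receives which data.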



On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:
Mark - Correct me if I understood right.

Curl POST from some application —> ListenHTTP configured on the primary node --> PostHTTP with the data flowfile (on the primary node?) --> post to the site-to-site endpoint —> this in turn distributes the load to both slaves.

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 12:29 PM

To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

At this time, your only options are to run the processors on all nodes or a single node (Primary Node). There's no way to really group nodes together and say "only run on this set of nodes."

One option is to have a ListenHTTP Processor and then push data to that NiFi via PostHTTP (configure it to send FlowFile attributes along). By doing this, you could set up the sending NiFi
to only deliver data to two nodes. You could then have a different set of data going to a different two nodes, etc. by the way that you configure which data goes to which PostHTTP Processor.

Does this give you what you need?


On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Thanks Mark. I will look into it.

Couple of questions:


  * Going back to my earlier question: in a NiFi cluster with two slaves and an NCM, how do I make the two slaves accept and process the incoming flowfiles in a distributed fashion? Is site-to-site the only way to go?
In our use case, we have an HTTP listener running on the primary node, and the PutFile processor should run on the two slaves in a distributed fashion.

It is more like a new (or existing) feature:
 - In a NiFi cluster setup, can we group the machines and point site-to-site at an individual group?
 For instance, if I have a 10-node cluster, can I group it into 5 groups with two nodes each and run processors on a dedicated group (using site-to-site or other means)?

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 5:24 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

This line in the logs is particularly interesting (on primary node):

2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data

This indicates that all of the site-to-site data will go to the host i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node listed, this means
that the NCM responded, indicating that this is the only node in the cluster that is currently connected and has site-to-site enabled. Can you double-check the nifi.properties
file on the Primary Node and verify that the "nifi.remote.input.socket.port" property is specified, and that the "nifi.remote.input.secure" property is set to "false"?
Of note is that if the "nifi.remote.input.secure" property is set to true, but keystore and truststore are not specified, then site-to-site will be disabled (there would be a warning
in the log in this case).

If you can verify that both of those properties are set properly on both nodes, then we can delve in further, but probably best to start by double-checking the easy things :)
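
As a concrete checklist (the port value is illustrative, and the keystore/truststore entries are the standard nifi.properties names), each node's conf/nifi.properties should contain something like:

    nifi.remote.input.socket.port=10880
    nifi.remote.input.secure=false
    # if nifi.remote.input.secure=true, these must also be set or site-to-site is disabled:
    # nifi.security.keystore=...
    # nifi.security.truststore=...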

Thanks
-Mark


On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Bryan – Here are the logs :
I have 5 sec flow file.

On primary node (No data coming in)

2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads
2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]<http://10.228.68.73:8080/nifi%5D> Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]<http://10.228.68.73:8080/nifi%5D> Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51 milliseconds at a rate of 391 bytes/sec
2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis
2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records
2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]<http://10.228.68.73:8080/nifi%5D> Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run
2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis


On Secondary node (Data coming in)

2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560
2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records
2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis
2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074
2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
From: Bryan Bende <bb...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Sunday, January 10, 2016 at 2:43 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running an NCM, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds and an Input Port going to a LogAttribute, and I see it alternating between the node1 and node2 logs every 10 seconds.

Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan


On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com>> wrote:
Chakri,

Would love to hear what you've learned and how that differed from the
docs themselves.  Site-to-site has proven difficult to setup so we're
clearly not there yet in having the right operator/admin experience.

Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
<Ch...@lifelock.com>> wrote:
> I was able to get site-to-site to work.
> I tried to follow your instructions to distribute data across the
> nodes.
>
> GenerateFlowFile (On Primary) —> RPG
> RPG —> Input Port   —> Putfile (Time driven scheduling)
>
> However, data is only written to one slave (the secondary slave). The primary slave
> has no data.
>
> Image screenshot :
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <ch...@lifelock.com>>
> Date: Sunday, January 10, 2016 at 11:26 AM
>
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to set up site-to-site.
> I have two slaves and one NCM.
>
> My properties as follows :
>
> On both Slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try to drop a remote process group (with http://<NCM IP>:8080/nifi), I see
> the following error for the two nodes.
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
>
> Do you have insight into why it's trying to connect to 8080 on the slaves? When does
> the 10880 port come into the picture? I remember setting up site-to-site a few
> months back and succeeding.
>
> Thanks,
> -Chakri
>
>
>
> From: Bryan Bende <bb...@gmail.com>>
> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes. You could think of it as if a separate NiFi instance was
> sending directly to a two node cluster, it would be evenly distributing the
> data across the two nodes. In this case it just so happens to all be within
> the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors
> like HDFS. You can perform the listing on primary node, and then distribute
> the results so the fetching takes place on all nodes.
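(As a sketch of that List/Fetch pattern, using the HDFS processors Bryan mentions; the layout is illustrative:

    ListHDFS (Primary Node only) --> Remote Process Group (cluster URL)
    Input Port (all nodes) --> FetchHDFS --> downstream processing

The listing runs once on the primary node, and the individual fetches are spread across every node.)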
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <Ch...@lifelock.com>> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for an input port? As
>> the port is open and listening on two nodes, does it copy the same files on both
>> the nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bb...@gmail.com>>
>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data within the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> Remote Process Group. The Remote Process Group should be connected to the
>> Input Port from the previous step.
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG which then re-distributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan

Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Thanks Matthew and Mark. Below examples are very helpful.
I still need to debug why site-to-site is sending data to one slave instead of two.

-Chakri

From: Matthew Clarke <ma...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 2:05 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,
     Your original option was to use a RPG (Site-to-Site) to redistribute data received on the primary Node only to every node in your cluster in a load-balanced fashion. In this setup every connected Node will get a portion of the data and there is no way to specify which node gets what particular piece of data.  This dataflwo would look like this:
[Inline image 1]
    Data is received by the listenHTTP processor on the primary Node only.  It is then sent to a RPG that was added using the URL of the NCM for your 10 Node cluster (the cluster also running the before mentioned primary node listenHTTP processor). There will also be an input port which runs on every Node.  each Node will receive a load-balanced distribution of the data sent to that RPG.

Hope these illustration help.

Thanks,
Matt

On Mon, Jan 11, 2016 at 4:50 PM, Matthew Clarke <ma...@gmail.com>> wrote:
Chakri,
        All data is received on the primary Node only via the initial listenHTTP.  Some routing tales place to send some data to a particular 5 nodes and other data to the other 5 nodes.  The postHTTP processor are configured to send to a specific Node in your cluster using the same target port number. A single ListenHTTP processor lives then runs on every Node configured to use that target port number.

Thanks,
Matt

On Mon, Jan 11, 2016 at 4:47 PM, Matthew Clarke <ma...@gmail.com>> wrote:
Chakri,
            What Mark is saying is NiFI Remote Process Group (RPG) also known as Site-to-Site will load-balance delivery data to all nodes in a cluster.  It can not be configured to balance data to only a subset of a nodes in a cluster.  If this is the strategy you want to deploy, a different approach must be taken (one that does not use Site-to-Site).  Here is a NiFI diagram of one such approach using your example of a 10 node cluster:

[Inline image 1]



On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:
Mark - Correct me if I understood right.

Curl post from some application —> Configure Listen http (on primary node) --> Post http with Data flow file (On primary node?)  --> Post to site-to-site end point —> This intern distribute load to both slaves.

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 12:29 PM

To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

At this time, your only options are to run the processors on all nodes or a single node (Primary Node). There's no way to really group nodes together and say "only run on this set of nodes."

One option is to have a ListenHTTP Processor and then push data to that NiFi via PostHTTP (configure it to send FlowFile attributes along). By doing this, you could set up the sending NiFi
to only deliver data to two nodes. You could then have a different set of data going to a different two nodes, etc. by the way that you configure which data goes to which PostHTTP Processor.

Does this give you what you need?


On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Thanks Mark. I will look into it.

Couple of questions:


  *
Going back to my earlier question, In a nifi cluster with two slaves and NCM how do I make two slaves accept and process the incoming flowfile in distibuted fashion. Site to site is the only way to go ?
In our use case, we have http listener running on primary node and putfile processor should run on two slaves in distributed fashion.

It is more like a new (or existing) feature.
 - In a nifi cluster setup, can we group the machines and set site-to-site to individual group.
 For instance I have 10 node cluster, can I group them into 5 groups with two nodes each. Run processors on dedicated group (using site to site or other means).

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 5:24 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

This line in the logs is particularly interesting (on primary node):

2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0<http://i-c894e249.dev.aws.lifelock.ad:0>] will receive 100.0% of data

This indicates that all of the site-to-site data will go to the host i-c894e249.dev.aws.lifelock.ad<http://i-c894e249.dev.aws.lifelock.ad>. Moreover, because that is the only node listed, this means
that the NCM responded, indicating that this is the only node in the cluster that is currently connected and has site-to-site enabled. Can you double-check the nifi.properties
file on the Primary Node and verify that the "nifi.remote.input.socket.port" is property is specified, and that the "nifi.remote.input.secure" property is set to "false"?
Of note is that if the "nifi.remote.input.secure" property is set to true, but keystore and truststore are not specified, then site-to-site will be disabled (there would be a warning
in the log in this case).

If you can verify that both of those properties are set properly on both nodes, then we can delve in further, but probably best to start by double-checking the easy things :)

Thanks
-Mark


On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Bryan – Here are the logs :
I have 5 sec flow file.

On primary node (No data coming in)

2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads
2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0<http://i-c894e249.dev.aws.lifelock.ad:0>] will receive 100.0% of data
2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]<http://10.228.68.73:8080/nifi%5D> Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]<http://10.228.68.73:8080/nifi%5D> Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51 milliseconds at a rate of 391 bytes/sec
2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis
2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records
2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]<http://10.228.68.73:8080/nifi%5D> Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run
2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis


On Secondary node (Data coming in)

2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560
2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records
2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis
2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074
2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
From: Bryan Bende <bb...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Sunday, January 10, 2016 at 2:43 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running ncm, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds, and Input Port going to a LogAttribute, and I see it alternating between node1 and node2 logs every 10 seconds.

Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan


On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com>> wrote:
Chakri,

Would love to hear what you've learned and how that differed from the
docs themselves.  Site-to-site has proven difficult to setup so we're
clearly not there yet in having the right operator/admin experience.

Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
<Ch...@lifelock.com>> wrote:
> I was able to get site-to-site work.
> I tried to follow your instructions to send data distribute across the
> nodes.
>
> GenerateFlowFile (On Primary) —> RPG
> RPG —> Input Port   —> Putfile (Time driven scheduling)
>
> However, data is only written to one slave (the secondary slave). The primary slave
> has no data.
>
> Image screenshot :
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <ch...@lifelock.com>>
> Date: Sunday, January 10, 2016 at 11:26 AM
>
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to setup site-to-site.
> I have two slaves and one NCM.
>
> My properties as follows :
>
> On both Slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try to drop a remote process group (with http://<NCM IP>:8080/nifi), I see
> error as follows for two nodes.
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
>
> Do you have insight into why it's trying to connect to 8080 on the slaves? When does
> the 10880 port come into the picture? I remember trying to set up site-to-site a few
> months back and succeeding.
>
> Thanks,
> -Chakri
>
>
>
> From: Bryan Bende <bb...@gmail.com>>
> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes. You could think of it as if a separate NiFi instance was
> sending directly to a two node cluster, it would be evenly distributing the
> data across the two nodes. In this case it just so happens to all be within
> the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors
> like HDFS. You can perform the listing on primary node, and then distribute
> the results so the fetching takes place on all nodes.
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <Ch...@lifelock.com>> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for an input port? As the
>> port is open and listening on two nodes, does it copy the same files to both
>> nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bb...@gmail.com>>
>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data within the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> Remote Process Group. The Remote Process Group should be connected to the
>> Input Port from the previous step.
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG which then re-distributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan
>>
>>
>> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark – I have setup a two node cluster and tried the following .
>>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>>> processor (RoundRobin)   —> PutFile
>>>
>>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> >> run on primary node only).
>>> From your above comment, It should put file on two nodes. It put files on
>>> primary node only. Any thoughts ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>>
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Correct - when NiFi instances are clustered, they do not transfer data
>>> between the nodes. This is very different
>>> than you might expect from something like Storm or Spark, as the key
>>> goals and design are quite different.
>>> We have discussed providing the ability to allow the user to indicate
>>> that they want to have the framework
>>> do load balancing for specific connections in the background, but it's
>>> still in more of a discussion phase.
>>>
>>> Site-to-Site is simply the capability that we have developed to transfer
>>> data between one instance of
>>> NiFi and another instance of NiFi. So currently, if we want to do load
>>> balancing across the cluster, we would
>>> create a site-to-site connection (by dragging a Remote Process Group onto
>>> the graph) and give that
>>> site-to-site connection the URL of our cluster. That way, you can push
>>> data to your own cluster, effectively
>>> providing a load balancing capability.
>>>
>>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> then every node in the cluster will be listening
>>> for incoming HTTP connections. So you could then use a simple load
>>> balancer in front of NiFi to distribute the load
>>> across your cluster.
>>>
>>> Does this help? If you have any more questions we're happy to help!
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark - Thanks for the notes.
>>>
>>> >> The other option would be to have a ListenHTTP processor run on
>>> >> Primary Node only and then use Site-to-Site to distribute the data to other
>>> >> nodes.
>>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>>> collected data on primary node is not transfered to other nodes by default
>>> for processing despite all nodes are part of one cluster?
>>> If ListenHTTP processor is running  as a dafult (with out explicit
>>> setting to run on primary node), how does the data transferred to rest of
>>> the nodes? Does site-to-site come in play when I make one processor to run
>>> on primary node ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Hello Chakro,
>>>
>>> When you create a cluster of NiFi instances, each node in the cluster is
>>> acting independently and in exactly
>>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>>> same flow. However, they will be
>>> pulling in different data and therefore operating on different data.
>>>
>>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> processed on the node that pulled the data
>>> in. NiFi does not currently shuffle data around between nodes in the
>>> cluster (you can use site-to-site to do
>>> this if you want to, but it won't happen automatically). If you set the
>>> number of Concurrent Tasks to 5, then
>>> you will have up to 5 threads running for that processor on each node.
>>>
>>> The only exception to this is the Primary Node. You can schedule a
>>> Processor to run only on the Primary Node
>>> by right-clicking on the Processor, and going to the Configure menu. In
>>> the Scheduling tab, you can change
>>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> Processor will only be triggered to run on
>>> whichever node is elected the Primary Node (this can be changed in the
>>> Cluster management screen by clicking
>>> the appropriate icon in the top-right corner of the UI).
>>>
>>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> on primary node only).
>>>
>>> If you are attempting to have a single input running HTTP and then push
>>> that out across the entire cluster to
>>> process the data, you would have a few options. First, you could just use
>>> an HTTP Load Balancer in front of NiFi.
>>> The other option would be to have a ListenHTTP processor run on Primary
>>> Node only and then use Site-to-Site
>>> to distribute the data to other nodes.
>>>
>>> For more info on site-to-site, you can see the Site-to-Site section of
>>> the User Guide at
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>
>>> If you have any more questions, let us know!
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Nifi Team – I would like to understand the advantages of Nifi clustering
>>> setup.
>>>
>>> Questions :
>>>
>>>  - How does workflow work on multiple nodes ? Does it share the resources
>>> intra nodes ?
>>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
>>>
>>>  - How to “isolate” the processor to the master node (or one node)?
>>>
>>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>>> node ? How do I force processor to look in one of the slave node?
>>>
>>> - How can we have a workflow where the input side we want to receive
>>> requests (http) and then the rest of the pipeline need to run in parallel on
>>> all the nodes ?
>>>
>>> Thanks,
>>> -Chakro
>>>
>>
>>
>
>
>
> --
> Sent from Gmail Mobile


Re: Nifi cluster features - Questions

Posted by Matthew Clarke <ma...@gmail.com>.
Chakri,
     Your original option was to use an RPG (Site-to-Site) to redistribute
data received on the primary Node only to every node in your cluster in a
load-balanced fashion. In this setup every connected Node will get a
portion of the data, and there is no way to specify which node gets what
particular piece of data.  This dataflow would look like this:
[image: Inline image 1]
    Data is received by the ListenHTTP processor on the primary Node only.
It is then sent to an RPG that was added using the URL of the NCM for your
10 Node cluster (the cluster also running the aforementioned primary node
ListenHTTP processor). There will also be an input port which runs on every
Node.  Each Node will receive a load-balanced distribution of the data sent
to that RPG.

Hope these illustrations help.

Thanks,
Matt
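
For reference, the Site-to-Site pieces this flow depends on come down to two
entries in conf/nifi.properties on the NCM and on every node (a minimal,
unsecured sketch; 10880 is only an example port that must be free on each
host, and as noted earlier in the thread, setting nifi.remote.input.secure
to true without a keystore/truststore disables Site-to-Site):

  # conf/nifi.properties (NCM and every node)
  nifi.remote.input.socket.port=10880
  nifi.remote.input.secure=false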

On Mon, Jan 11, 2016 at 4:50 PM, Matthew Clarke <ma...@gmail.com>
wrote:

> Chakri,
>         All data is received on the primary Node only via the initial
> ListenHTTP.  Some routing takes place to send some data to a particular 5
> nodes and other data to the other 5 nodes.  The PostHTTP processors are
> configured to send to a specific Node in your cluster using the same target
> port number. A single ListenHTTP processor then runs on every Node
> configured to use that target port number.
>
> Thanks,
> Matt
>
> On Mon, Jan 11, 2016 at 4:47 PM, Matthew Clarke <matt.clarke.138@gmail.com
> > wrote:
>
>> Chakri,
>>             What Mark is saying is that the NiFi Remote Process Group (RPG), also
>> known as Site-to-Site, will load-balance delivery of data to all nodes in a
>> cluster.  It cannot be configured to balance data to only a subset of the
>> nodes in a cluster.  If this is the strategy you want to deploy, a
>> different approach must be taken (one that does not use Site-to-Site).
>> Here is a NiFi diagram of one such approach using your example of a 10 node
>> cluster:
>>
>> [image: Inline image 1]
>>
>>
>>
>> On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>
>>> Mark - Correct me if I understood right.
>>>
>>> Curl post from some application —> Configure Listen http (on primary
>>> node) --> Post http with Data flow file (On primary node?)  --> Post to
>>> site-to-site end point —> This in turn distributes load to both slaves.
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>
>>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Date: Monday, January 11, 2016 at 12:29 PM
>>>
>>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> At this time, your only options are to run the processors on all nodes
>>> or a single node (Primary Node). There's no way to really group nodes
>>> together and say "only run on this set of nodes."
>>>
>>> One option is to have a ListenHTTP Processor and then push data to that
>>> NiFi via PostHTTP (configure it to send FlowFile attributes along). By
>>> doing this, you could set up the sending NiFi
>>> to only deliver data to two nodes. You could then have a different set
>>> of data going to a different two nodes, etc. by the way that you configure
>>> which data goes to which PostHTTP Processor.
>>>
>>> Does this give you what you need?
>>>
>>>
>>> On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <
>>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>>
>>> Thanks Mark. I will look into it.
>>>
>>> Couple of questions:
>>>
>>>
>>>    - Going back to my earlier question, in a NiFi cluster with two
>>>    slaves and an NCM, how do I make the two slaves accept and process the
>>>    incoming flowfile in a distributed fashion? Is site-to-site the only way to go?
>>>    In our use case, we have an HTTP listener running on the primary node, and the
>>>    PutFile processor should run on the two slaves in a distributed fashion.
>>>
>>>    It is more like a new (or existing) feature.
>>>     - In a NiFi cluster setup, can we group the machines and set up
>>>    site-to-site for an individual group?
>>>     For instance, if I have a 10 node cluster, can I group it into 5 groups
>>>    with two nodes each and run processors on a dedicated group (using site-to-site
>>>    or other means)?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>
>>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Date: Monday, January 11, 2016 at 5:24 AM
>>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> This line in the logs is particularly interesting (on primary node):
>>>
>>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
>>> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>>>
>>>
>>> This indicates that all of the site-to-site data will go to the host
>>> i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node
>>> listed, this means
>>> that the NCM responded, indicating that this is the only node in the
>>> cluster that is currently connected and has site-to-site enabled. Can you
>>> double-check the nifi.properties
>>> file on the Primary Node and verify that the "
>>> nifi.remote.input.socket.port" property is specified, and that the "
>>> nifi.remote.input.secure" property is set to "false"?
>>> Of note is that if the "nifi.remote.input.secure" property is set to
>>> true, but keystore and truststore are not specified, then site-to-site will
>>> be disabled (there would be a warning
>>> in the log in this case).
>>>
>>> If you can verify that both of those properties are set properly on both
>>> nodes, then we can delve in further, but probably best to start by
>>> double-checking the easy things :)
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <
>>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>>
>>> Bryan – Here are the logs :
>>> I have 5 sec flow file.
>>>
>>> On primary node (No data coming in)
>>>
>>> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
>>> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2]
>>> o.a.nifi.controller.StandardFlowService Saved flow controller
>>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>>> pending = false
>>> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16]
>>> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
>>> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1
>>> threads
>>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
>>> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>>> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2]
>>> o.a.nifi.controller.StandardFlowService Saved flow controller
>>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>>> pending = false
>>> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
>>> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7]
>>> o.a.nifi.remote.StandardRemoteGroupPort
>>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>>> Successfully sent
>>> [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>>> section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]]
>>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
>>> milliseconds at a rate of 392 bytes/sec
>>> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
>>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1]
>>> o.a.nifi.remote.StandardRemoteGroupPort
>>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>>> Successfully sent
>>> [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>>> section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]]
>>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51
>>> milliseconds at a rate of 391 bytes/sec
>>> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2]
>>> o.a.n.c.p.impl.SocketProtocolListener Received request
>>> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
>>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
>>> o.a.nifi.controller.StandardFlowService Received flow request message from
>>> manager.
>>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
>>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>>> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes)
>>> in 61 millis
>>> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
>>> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3]
>>> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
>>> for events starting with ID 301
>>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
>>> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
>>> (6 records) into single Provenance Log File
>>> ./provenance_repository/295.prov in 111 milliseconds
>>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
>>> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
>>> Event file containing 8 records
>>> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10]
>>> o.a.nifi.remote.StandardRemoteGroupPort
>>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>>> Successfully sent
>>> [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>>> section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]]
>>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
>>> milliseconds at a rate of 392 bytes/sec
>>> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
>>> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22]
>>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>>> StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa]
>>> to run
>>> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26]
>>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
>>> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25]
>>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>>> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
>>> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
>>> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2]
>>> o.a.nifi.controller.StandardFlowService Saved flow controller
>>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>>> pending = false
>>> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
>>>
>>>
>>> On Secondary node (Data coming in)
>>>
>>> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1]
>>> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
>>> Repository with 0 records in 88 milliseconds
>>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3]
>>> o.a.n.r.p.s.SocketFlowFileServerProtocol
>>> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
>>> Successfully received
>>> [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>>> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]]
>>> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
>>> at a rate of 387 bytes/sec
>>> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1]
>>> o.a.nifi.processors.standard.PutFile
>>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
>>> StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>>> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]
>>> at location /root/putt/275243509297560
>>> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3]
>>> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
>>> for events starting with ID 17037
>>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
>>> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
>>> (6 records) into single Provenance Log File
>>> ./provenance_repository/17031.prov in 56 milliseconds
>>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
>>> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
>>> Event file containing 10 records
>>> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2]
>>> o.a.n.c.p.impl.SocketProtocolListener Received request
>>> e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
>>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
>>> o.a.nifi.controller.StandardFlowService Received flow request message from
>>> manager.
>>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
>>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>>> e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes)
>>> in 76 millis
>>> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
>>> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6]
>>> o.a.n.r.p.s.SocketFlowFileServerProtocol
>>> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
>>> Successfully received
>>> [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>>> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]]
>>> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
>>> at a rate of 387 bytes/sec
>>> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8]
>>> o.a.nifi.processors.standard.PutFile
>>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
>>> StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
>>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>>> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]
>>> at location /root/putt/275248510432074
>>> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1]
>>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>>> 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
>>> From: Bryan Bende <bb...@gmail.com>
>>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Date: Sunday, January 10, 2016 at 2:43 PM
>>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Glad you got site-to-site working.
>>>
>>> Regarding the data distribution, I'm not sure why it is behaving that
>>> way. I just did a similar test running ncm, node1, and node2 all on my
>>> local machine, with GenerateFlowFile running every 10 seconds, and Input
>>> Port going to a LogAttribute, and I see it alternating between node1 and
>>> node2 logs every 10 seconds.
>>>
>>> Is there anything in your primary node logs
>>> (primary_node/logs/nifi-app.log) when you see the data on the other node?
>>>
>>> -Bryan
>>>
>>>
>>> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:
>>>
>>>> Chakri,
>>>>
>>>> Would love to hear what you've learned and how that differed from the
>>>> docs themselves.  Site-to-site has proven difficult to setup so we're
>>>> clearly not there yet in having the right operator/admin experience.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
>>>> <Ch...@lifelock.com> wrote:
>>>> > I was able to get site-to-site work.
>>>> > I tried to follow your instructions to send data distribute across the
>>>> > nodes.
>>>> >
>>>> > GenerateFlowFile (On Primary) —> RPG
>>>> > RPG —> Input Port   —> Putfile (Time driven scheduling)
>>>> >
>>>> > However, data is only written to one slave (Secondary slave). Primary
>>>> slave
>>>> > has not data.
>>>> >
>>>> > Image screenshot :
>>>> > http://tinyurl.com/jjvjtmq
>>>> >
>>>> > From: Chakrader Dewaragatla <ch...@lifelock.com>
>>>> > Date: Sunday, January 10, 2016 at 11:26 AM
>>>> >
>>>> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> > Subject: Re: Nifi cluster features - Questions
>>>> >
>>>> > Bryan – Thanks – I am trying to setup site-to-site.
>>>> > I have two slaves and one NCM.
>>>> >
>>>> > My properties as follows :
>>>> >
>>>> > On both Slaves:
>>>> >
>>>> > nifi.remote.input.socket.port=10880
>>>> > nifi.remote.input.secure=false
>>>> >
>>>> > On NCM:
>>>> > nifi.remote.input.socket.port=10880
>>>> > nifi.remote.input.secure=false
>>>> >
>>>> > When I try drop remote process group (with http://<NCM
>>>> IP>:8080/nifi), I see
>>>> > error as follows for two nodes.
>>>> >
>>>> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
>>>> > communication
>>>> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
>>>> > communication
>>>> >
>>>> > Do you have insight why its trying to connecting 8080 on slaves ?
>>>> When do
>>>> > 10880 port come into the picture ? I remember try setting site to
>>>> site few
>>>> > months back and succeeded.
>>>> >
>>>> > Thanks,
>>>> > -Chakri
>>>> >
>>>> >
>>>> >
>>>> > From: Bryan Bende <bb...@gmail.com>
>>>> > Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> > Date: Saturday, January 9, 2016 at 11:22 AM
>>>> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> > Subject: Re: Nifi cluster features - Questions
>>>> >
>>>> > The sending node (where the remote process group is) will distribute
>>>> the
>>>> > data evenly across the two nodes, so an individual file will only be
>>>> sent to
>>>> > one of the nodes. You could think of it as if a separate NiFi
>>>> instance was
>>>> > sending directly to a two node cluster, it would be evenly
>>>> distributing the
>>>> > data across the two nodes. In this case it just so happens to all be
>>>> with in
>>>> > the same cluster.
>>>> >
>>>> > The most common use case for this scenario is the List and Fetch
>>>> processors
>>>> > like HDFS. You can perform the listing on primary node, and then
>>>> distribute
>>>> > the results so the fetching takes place on all nodes.
>>>> >
>>>> > On Saturday, January 9, 2016, Chakrader Dewaragatla
>>>> > <Ch...@lifelock.com> wrote:
>>>> >>
>>>> >> Bryan – Thanks, how do the nodes distribute the load for a input
>>>> port. As
>>>> >> port is open and listening on two nodes,  does it copy same files on
>>>> both
>>>> >> the nodes?
>>>> >> I need to try this setup to see the results, appreciate your help.
>>>> >>
>>>> >> Thanks,
>>>> >> -Chakri
>>>> >>
>>>> >> From: Bryan Bende <bb...@gmail.com>
>>>> >> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> >> Date: Friday, January 8, 2016 at 3:44 PM
>>>> >> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> >> Subject: Re: Nifi cluster features - Questions
>>>> >>
>>>> >> Hi Chakri,
>>>> >>
>>>> >> I believe the DistributeLoad processor is more for load balancing
>>>> when
>>>> >> sending to downstream systems. For example, if you had two HTTP
>>>> endpoints,
>>>> >> you could have the first relationship from DistributeLoad going to a
>>>> >> PostHTTP that posts to endpoint #1, and the second relationship
>>>> going to a
>>>> >> second PostHTTP that goes to endpoint #2.
>>>> >>
>>>> >> If you want to distribute the data with in the cluster, then you
>>>> need to
>>>> >> use site-to-site. The way you do this is the following...
>>>> >>
>>>> >> - Add an Input Port connected to your PutFile.
>>>> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
>>>> >> Remote Process Group. The Remote Process Group should be connected
>>>> to the
>>>> >> Input Port from the previous step.
>>>> >>
>>>> >> So both nodes have an input port listening for data, but only the
>>>> primary
>>>> >> node produces a FlowFile and sends it to the RPG which then
>>>> re-distributes
>>>> >> it back to one of the Input Ports.
>>>> >>
>>>> >> In order for this to work you need to set
>>>> nifi.remote.input.socket.port in
>>>> >> nifi.properties to some available port, and you probably want
>>>> >> nifi.remote.input.secure=false for testing.
>>>> >>
>>>> >> -Bryan
>>>> >>
>>>> >>
>>>> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>>>> >> <Ch...@lifelock.com> wrote:
>>>> >>>
>>>> >>> Mark – I have setup a two node cluster and tried the following .
>>>> >>>  GenrateFlowfile processor (Run only on primary node) —>
>>>> DistributionLoad
>>>> >>> processor (RoundRobin)   —> PutFile
>>>> >>>
>>>> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule
>>>> it to
>>>> >>> >> run on primary node only).
>>>> >>> From your above comment, It should put file on two nodes. It put
>>>> files on
>>>> >>> primary node only. Any thoughts ?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> -Chakri
>>>> >>>
>>>> >>> From: Mark Payne <ma...@hotmail.com>
>>>> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>>> >>>
>>>> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> >>> Subject: Re: Nifi cluster features - Questions
>>>> >>>
>>>> >>> Chakri,
>>>> >>>
>>>> >>> Correct - when NiFi instances are clustered, they do not transfer
>>>> data
>>>> >>> between the nodes. This is very different
>>>> >>> than you might expect from something like Storm or Spark, as the key
>>>> >>> goals and design are quite different.
>>>> >>> We have discussed providing the ability to allow the user to
>>>> indicate
>>>> >>> that they want to have the framework
>>>> >>> do load balancing for specific connections in the background, but
>>>> it's
>>>> >>> still in more of a discussion phase.
>>>> >>>
>>>> >>> Site-to-Site is simply the capability that we have developed to
>>>> transfer
>>>> >>> data between one instance of
>>>> >>> NiFi and another instance of NiFi. So currently, if we want to do
>>>> load
>>>> >>> balancing across the cluster, we would
>>>> >>> create a site-to-site connection (by dragging a Remote Process
>>>> Group onto
>>>> >>> the graph) and give that
>>>> >>> site-to-site connection the URL of our cluster. That way, you can
>>>> push
>>>> >>> data to your own cluster, effectively
>>>> >>> providing a load balancing capability.
>>>> >>>
>>>> >>> If you were to just run ListenHTTP without setting it to Primary
>>>> Node,
>>>> >>> then every node in the cluster will be listening
>>>> >>> for incoming HTTP connections. So you could then use a simple load
>>>> >>> balancer in front of NiFi to distribute the load
>>>> >>> across your cluster.
>>>> >>>
>>>> >>> Does this help? If you have any more questions we're happy to help!
>>>> >>>
>>>> >>> Thanks
>>>> >>> -Mark
>>>> >>>
>>>> >>>
>>>> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>>> >>> <Ch...@lifelock.com> wrote:
>>>> >>>
>>>> >>> Mark - Thanks for the notes.
>>>> >>>
>>>> >>> >> The other option would be to have a ListenHTTP processor run on
>>>> >>> >> Primary Node only and then use Site-to-Site to distribute the
>>>> data to other
>>>> >>> >> nodes.
>>>> >>> Lets say I have 5 node cluster and ListenHTTP processor on Primary
>>>> node,
>>>> >>> collected data on primary node is not transfered to other nodes by
>>>> default
>>>> >>> for processing despite all nodes are part of one cluster?
>>>> >>> If ListenHTTP processor is running  as a dafult (with out explicit
>>>> >>> setting to run on primary node), how does the data transferred to
>>>> rest of
>>>> >>> the nodes? Does site-to-site come in play when I make one processor
>>>> to run
>>>> >>> on primary node ?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> -Chakri
>>>> >>>
>>>> >>> From: Mark Payne <ma...@hotmail.com>
>>>> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>>> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>>> >>> Subject: Re: Nifi cluster features - Questions
>>>> >>>
>>>> >>> Hello Chakro,
>>>> >>>
>>>> >>> When you create a cluster of NiFi instances, each node in the
>>>> cluster is
>>>> >>> acting independently and in exactly
>>>> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run
>>>> exactly the
>>>> >>> same flow. However, they will be
>>>> >>> pulling in different data and therefore operating on different data.
>>>> >>>
>>>> >>> So if you pull in 10 1-gig files from S3, each of those files will
>>>> be
>>>> >>> processed on the node that pulled the data
>>>> >>> in. NiFi does not currently shuffle data around between nodes in the
>>>> >>> cluster (you can use site-to-site to do
>>>> >>> this if you want to, but it won't happen automatically). If you set
>>>> the
>>>> >>> number of Concurrent Tasks to 5, then
>>>> >>> you will have up to 5 threads running for that processor on each
>>>> node.
>>>> >>>
>>>> >>> The only exception to this is the Primary Node. You can schedule a
>>>> >>> Processor to run only on the Primary Node
>>>> >>> by right-clicking on the Processor, and going to the Configure
>>>> menu. In
>>>> >>> the Scheduling tab, you can change
>>>> >>> the Scheduling Strategy to Primary Node Only. In this case, that
>>>> >>> Processor will only be triggered to run on
>>>> >>> whichever node is elected the Primary Node (this can be changed in
>>>> the
>>>> >>> Cluster management screen by clicking
>>>> >>> the appropriate icon in the top-right corner of the UI).
>>>> >>>
>>>> >>> The GetFile/PutFile will run on all nodes (unless you schedule it
>>>> to run
>>>> >>> on primary node only).
>>>> >>>
>>>> >>> If you are attempting to have a single input running HTTP and then
>>>> push
>>>> >>> that out across the entire cluster to
>>>> >>> process the data, you would have a few options. First, you could
>>>> just use
>>>> >>> an HTTP Load Balancer in front of NiFi.
>>>> >>> The other option would be to have a ListenHTTP processor run on
>>>> Primary
>>>> >>> Node only and then use Site-to-Site
>>>> >>> to distribute the data to other nodes.
>>>> >>>
>>>> >>> For more info on site-to-site, you can see the Site-to-Site section
>>>> of
>>>> >>> the User Guide at
>>>> >>>
>>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>> >>>
>>>> >>> If you have any more questions, let us know!
>>>> >>>
>>>> >>> Thanks
>>>> >>> -Mark
>>>> >>>
>>>> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>>> >>> <Ch...@lifelock.com> wrote:
>>>> >>>
>>>> >>> Nifi Team – I would like to understand the advantages of Nifi
>>>> clustering
>>>> >>> setup.
>>>> >>>
>>>> >>> Questions :
>>>> >>>
>>>> >>>  - How does workflow work on multiple nodes ? Does it share the
>>>> resources
>>>> >>> intra nodes ?
>>>> >>> Lets say I need to pull data 10 1Gig files from S3, how does work
>>>> load
>>>> >>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks
>>>> per node ?
>>>> >>>
>>>> >>>  - How to “isolate” the processor to the master node (or one node)?
>>>> >>>
>>>> >>> - Getfile/Putfile processors on cluster setup, does it get/put on
>>>> primary
>>>> >>> node ? How do I force processor to look in one of the slave node?
>>>> >>>
>>>> >>> - How can we have a workflow where the input side we want to receive
>>>> >>> requests (http) and then the rest of the pipeline need to run in
>>>> parallel on
>>>> >>> all the nodes ?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> -Chakro
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Sent from Gmail Mobile
>>>>
>>>
>>>
>>
>>
>

Re: Nifi cluster features - Questions

Posted by Matthew Clarke <ma...@gmail.com>.
Chakri,
        All data is received on the primary Node only via the initial
ListenHTTP.  Some routing takes place to send some data to a particular 5
nodes and other data to the other 5 nodes.  The PostHTTP processors are
configured to send to a specific Node in your cluster using the same target
port number. A single ListenHTTP processor then runs on every Node
configured to use that target port number.

Thanks,
Matt
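
As a rough sanity check that a given node's ListenHTTP is reachable on the
shared target port (a sketch only; 9999 is a placeholder port and
contentListener is assumed to be the processor's default Base Path, so
substitute whatever the processor is actually configured with):

  curl -v -X POST --data 'hello' http://<node-ip>:9999/contentListener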

On Mon, Jan 11, 2016 at 4:47 PM, Matthew Clarke <ma...@gmail.com>
wrote:

> Chakri,
>             What Mark is saying is that the NiFi Remote Process Group (RPG), also
> known as Site-to-Site, will load-balance delivery of data to all nodes in a
> cluster.  It cannot be configured to balance data to only a subset of the
> nodes in a cluster.  If this is the strategy you want to deploy, a
> different approach must be taken (one that does not use Site-to-Site).
> Here is a NiFi diagram of one such approach using your example of a 10 node
> cluster:
>
> [image: Inline image 1]
>
>
>
> On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com> wrote:
>
>> Mark - Correct me if I understood right.
>>
>> Curl post from some application —> Configure Listen http (on primary
>> node) --> Post http with Data flow file (On primary node?)  --> Post to
>> site-to-site end point —> This in turn distributes load to both slaves.
>>
>> Thanks,
>> -Chakri
>>
>> From: Mark Payne <ma...@hotmail.com>
>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Date: Monday, January 11, 2016 at 12:29 PM
>>
>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> At this time, your only options are to run the processors on all nodes or
>> a single node (Primary Node). There's no way to really group nodes together
>> and say "only run on this set of nodes."
>>
>> One option is to have a ListenHTTP Processor and then push data to that
>> NiFi via PostHTTP (configure it to send FlowFile attributes along). By
>> doing this, you could set up the sending NiFi
>> to only deliver data to two nodes. You could then have a different set of
>> data going to a different two nodes, etc. by the way that you configure
>> which data goes to which PostHTTP Processor.
>>
>> Does this give you what you need?
>>
>>
>> On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>
>> Thanks Mark. I will look into it.
>>
>> Couple of questions:
>>
>>
>>    - Going back to my earlier question, in a NiFi cluster with two
>>    slaves and an NCM, how do I make the two slaves accept and process the
>>    incoming flowfile in a distributed fashion? Is site-to-site the only way to go?
>>    In our use case, we have an HTTP listener running on the primary node, and the
>>    PutFile processor should run on the two slaves in a distributed fashion.
>>
>>    It is more like a new (or existing) feature.
>>     - In a NiFi cluster setup, can we group the machines and set up
>>    site-to-site for an individual group?
>>     For instance, if I have a 10 node cluster, can I group it into 5 groups
>>    with two nodes each and run processors on a dedicated group (using site-to-site
>>    or other means)?
>>
>> Thanks,
>> -Chakri
>>
>> From: Mark Payne <ma...@hotmail.com>
>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Date: Monday, January 11, 2016 at 5:24 AM
>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> This line in the logs is particularly interesting (on primary node):
>>
>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
>> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>>
>>
>> This indicates that all of the site-to-site data will go to the host
>> i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node
>> listed, this means
>> that the NCM responded, indicating that this is the only node in the
>> cluster that is currently connected and has site-to-site enabled. Can you
>> double-check the nifi.properties
>> file on the Primary Node and verify that the "
>> nifi.remote.input.socket.port" property is specified, and that the "
>> nifi.remote.input.secure" property is set to "false"?
>> Of note is that if the "nifi.remote.input.secure" property is set to
>> true, but keystore and truststore are not specified, then site-to-site will
>> be disabled (there would be a warning
>> in the log in this case).
>>
>> If you can verify that both of those properties are set properly on both
>> nodes, then we can delve in further, but probably best to start by
>> double-checking the easy things :)
>>
>> Thanks
>> -Mark
>>
>>
>> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com> wrote:
>>
>> Bryan – Here are the logs :
>> I have 5 sec flow file.
>>
>> On primary node (No data coming in)
>>
>> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
>> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>> pending = false
>> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
>> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1
>> threads
>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
>> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>> pending = false
>> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
>> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7]
>> o.a.nifi.remote.StandardRemoteGroupPort
>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>> Successfully sent
>> [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>> section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]]
>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
>> milliseconds at a rate of 392 bytes/sec
>> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1]
>> o.a.nifi.remote.StandardRemoteGroupPort
>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>> Successfully sent
>> [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>> section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]]
>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51
>> milliseconds at a rate of 391 bytes/sec
>> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Received request
>> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
>> o.a.nifi.controller.StandardFlowService Received flow request message from
>> manager.
>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes)
>> in 61 millis
>> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
>> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3]
>> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
>> for events starting with ID 301
>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
>> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
>> (6 records) into single Provenance Log File
>> ./provenance_repository/295.prov in 111 milliseconds
>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
>> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
>> Event file containing 8 records
>> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10]
>> o.a.nifi.remote.StandardRemoteGroupPort
>> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
>> Successfully sent
>> [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
>> section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]]
>> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
>> milliseconds at a rate of 392 bytes/sec
>> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
>> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>> StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa]
>> to run
>> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
>> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25]
>> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
>> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
>> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
>> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
>> pending = false
>> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
>>
>>
>> On Secondary node (Data coming in)
>>
>> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1]
>> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
>> Repository with 0 records in 88 milliseconds
>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3]
>> o.a.n.r.p.s.SocketFlowFileServerProtocol
>> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
>> Successfully received
>> [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]]
>> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
>> at a rate of 387 bytes/sec
>> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1]
>> o.a.nifi.processors.standard.PutFile
>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
>> StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]
>> at location /root/putt/275243509297560
>> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3]
>> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
>> for events starting with ID 17037
>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
>> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
>> (6 records) into single Provenance Log File
>> ./provenance_repository/17031.prov in 56 milliseconds
>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
>> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
>> Event file containing 10 records
>> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Received request
>> e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
>> o.a.nifi.controller.StandardFlowService Received flow request message from
>> manager.
>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
>> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>> e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes)
>> in 76 millis
>> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
>> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6]
>> o.a.n.r.p.s.SocketFlowFileServerProtocol
>> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
>> Successfully received
>> [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]]
>> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
>> at a rate of 387 bytes/sec
>> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8]
>> o.a.nifi.processors.standard.PutFile
>> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
>> StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
>> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]
>> at location /root/putt/275248510432074
>> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1]
>> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
>> 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
>> From: Bryan Bende <bb...@gmail.com>
>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Date: Sunday, January 10, 2016 at 2:43 PM
>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> Glad you got site-to-site working.
>>
>> Regarding the data distribution, I'm not sure why it is behaving that
>> way. I just did a similar test running ncm, node1, and node2 all on my
>> local machine, with GenerateFlowFile running every 10 seconds, and Input
>> Port going to a LogAttribute, and I see it alternating between node1 and
>> node2 logs every 10 seconds.
>>
>> Is there anything in your primary node logs
>> (primary_node/logs/nifi-app.log) when you see the data on the other node?
>>
>> -Bryan
>>
>>
>> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:
>>
>>> Chakri,
>>>
>>> Would love to hear what you've learned and how that differed from the
>>> docs themselves.  Site-to-site has proven difficult to setup so we're
>>> clearly not there yet in having the right operator/admin experience.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com> wrote:
>>> > I was able to get site-to-site work.
>>> > I tried to follow your instructions to send data distribute across the
>>> > nodes.
>>> >
>>> > GenerateFlowFile (On Primary) —> RPG
>>> > RPG —> Input Port   —> Putfile (Time driven scheduling)
>>> >
>>> > However, data is only written to one slave (Secondary slave). Primary
>>> slave
>>> > has not data.
>>> >
>>> > Image screenshot :
>>> > http://tinyurl.com/jjvjtmq
>>> >
>>> > From: Chakrader Dewaragatla <ch...@lifelock.com>
>>> > Date: Sunday, January 10, 2016 at 11:26 AM
>>> >
>>> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> > Subject: Re: Nifi cluster features - Questions
>>> >
>>> > Bryan – Thanks – I am trying to setup site-to-site.
>>> > I have two slaves and one NCM.
>>> >
>>> > My properties as follows :
>>> >
>>> > On both Slaves:
>>> >
>>> > nifi.remote.input.socket.port=10880
>>> > nifi.remote.input.secure=false
>>> >
>>> > On NCM:
>>> > nifi.remote.input.socket.port=10880
>>> > nifi.remote.input.secure=false
>>> >
>>> > When I try drop remote process group (with http://<NCM
>>> IP>:8080/nifi), I see
>>> > error as follows for two nodes.
>>> >
>>> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
>>> > communication
>>> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
>>> > communication
>>> >
>>> > Do you have insight why its trying to connecting 8080 on slaves ? When
>>> do
>>> > 10880 port come into the picture ? I remember try setting site to site
>>> few
>>> > months back and succeeded.
>>> >
>>> > Thanks,
>>> > -Chakri
>>> >
>>> >
>>> >
>>> > From: Bryan Bende <bb...@gmail.com>
>>> > Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> > Date: Saturday, January 9, 2016 at 11:22 AM
>>> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> > Subject: Re: Nifi cluster features - Questions
>>> >
>>> > The sending node (where the remote process group is) will distribute
>>> the
>>> > data evenly across the two nodes, so an individual file will only be
>>> sent to
>>> > one of the nodes. You could think of it as if a separate NiFi instance
>>> was
>>> > sending directly to a two node cluster, it would be evenly
>>> distributing the
>>> > data across the two nodes. In this case it just so happens to all be
>>> with in
>>> > the same cluster.
>>> >
>>> > The most common use case for this scenario is the List and Fetch
>>> processors
>>> > like HDFS. You can perform the listing on primary node, and then
>>> distribute
>>> > the results so the fetching takes place on all nodes.
>>> >
>>> > On Saturday, January 9, 2016, Chakrader Dewaragatla
>>> > <Ch...@lifelock.com> wrote:
>>> >>
>>> >> Bryan – Thanks, how do the nodes distribute the load for a input
>>> port. As
>>> >> port is open and listening on two nodes,  does it copy same files on
>>> both
>>> >> the nodes?
>>> >> I need to try this setup to see the results, appreciate your help.
>>> >>
>>> >> Thanks,
>>> >> -Chakri
>>> >>
>>> >> From: Bryan Bende <bb...@gmail.com>
>>> >> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> >> Date: Friday, January 8, 2016 at 3:44 PM
>>> >> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> >> Subject: Re: Nifi cluster features - Questions
>>> >>
>>> >> Hi Chakri,
>>> >>
>>> >> I believe the DistributeLoad processor is more for load balancing when
>>> >> sending to downstream systems. For example, if you had two HTTP
>>> endpoints,
>>> >> you could have the first relationship from DistributeLoad going to a
>>> >> PostHTTP that posts to endpoint #1, and the second relationship going
>>> to a
>>> >> second PostHTTP that goes to endpoint #2.
>>> >>
>>> >> If you want to distribute the data with in the cluster, then you need
>>> to
>>> >> use site-to-site. The way you do this is the following...
>>> >>
>>> >> - Add an Input Port connected to your PutFile.
>>> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
>>> >> Remote Process Group. The Remote Process Group should be connected to
>>> the
>>> >> Input Port from the previous step.
>>> >>
>>> >> So both nodes have an input port listening for data, but only the
>>> primary
>>> >> node produces a FlowFile and sends it to the RPG which then
>>> re-distributes
>>> >> it back to one of the Input Ports.
>>> >>
>>> >> In order for this to work you need to set
>>> nifi.remote.input.socket.port in
>>> >> nifi.properties to some available port, and you probably want
>>> >> nifi.remote.input.secure=false for testing.
>>> >>
>>> >> -Bryan
>>> >>
>>> >>
>>> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>>> >> <Ch...@lifelock.com> wrote:
>>> >>>
>>> >>> Mark – I have setup a two node cluster and tried the following .
>>> >>>  GenrateFlowfile processor (Run only on primary node) —>
>>> DistributionLoad
>>> >>> processor (RoundRobin)   —> PutFile
>>> >>>
>>> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it
>>> to
>>> >>> >> run on primary node only).
>>> >>> From your above comment, It should put file on two nodes. It put
>>> files on
>>> >>> primary node only. Any thoughts ?
>>> >>>
>>> >>> Thanks,
>>> >>> -Chakri
>>> >>>
>>> >>> From: Mark Payne <ma...@hotmail.com>
>>> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>> >>>
>>> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> >>> Subject: Re: Nifi cluster features - Questions
>>> >>>
>>> >>> Chakri,
>>> >>>
>>> >>> Correct - when NiFi instances are clustered, they do not transfer
>>> data
>>> >>> between the nodes. This is very different
>>> >>> than you might expect from something like Storm or Spark, as the key
>>> >>> goals and design are quite different.
>>> >>> We have discussed providing the ability to allow the user to indicate
>>> >>> that they want to have the framework
>>> >>> do load balancing for specific connections in the background, but
>>> it's
>>> >>> still in more of a discussion phase.
>>> >>>
>>> >>> Site-to-Site is simply the capability that we have developed to
>>> transfer
>>> >>> data between one instance of
>>> >>> NiFi and another instance of NiFi. So currently, if we want to do
>>> load
>>> >>> balancing across the cluster, we would
>>> >>> create a site-to-site connection (by dragging a Remote Process Group
>>> onto
>>> >>> the graph) and give that
>>> >>> site-to-site connection the URL of our cluster. That way, you can
>>> push
>>> >>> data to your own cluster, effectively
>>> >>> providing a load balancing capability.
>>> >>>
>>> >>> If you were to just run ListenHTTP without setting it to Primary
>>> Node,
>>> >>> then every node in the cluster will be listening
>>> >>> for incoming HTTP connections. So you could then use a simple load
>>> >>> balancer in front of NiFi to distribute the load
>>> >>> across your cluster.
>>> >>>
>>> >>> Does this help? If you have any more questions we're happy to help!
>>> >>>
>>> >>> Thanks
>>> >>> -Mark
>>> >>>
>>> >>>
>>> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> >>> <Ch...@lifelock.com> wrote:
>>> >>>
>>> >>> Mark - Thanks for the notes.
>>> >>>
>>> >>> >> The other option would be to have a ListenHTTP processor run on
>>> >>> >> Primary Node only and then use Site-to-Site to distribute the
>>> data to other
>>> >>> >> nodes.
>>> >>> Lets say I have 5 node cluster and ListenHTTP processor on Primary
>>> node,
>>> >>> collected data on primary node is not transfered to other nodes by
>>> default
>>> >>> for processing despite all nodes are part of one cluster?
>>> >>> If ListenHTTP processor is running  as a dafult (with out explicit
>>> >>> setting to run on primary node), how does the data transferred to
>>> rest of
>>> >>> the nodes? Does site-to-site come in play when I make one processor
>>> to run
>>> >>> on primary node ?
>>> >>>
>>> >>> Thanks,
>>> >>> -Chakri
>>> >>>
>>> >>> From: Mark Payne <ma...@hotmail.com>
>>> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> >>> Subject: Re: Nifi cluster features - Questions
>>> >>>
>>> >>> Hello Chakro,
>>> >>>
>>> >>> When you create a cluster of NiFi instances, each node in the
>>> cluster is
>>> >>> acting independently and in exactly
>>> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run
>>> exactly the
>>> >>> same flow. However, they will be
>>> >>> pulling in different data and therefore operating on different data.
>>> >>>
>>> >>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> >>> processed on the node that pulled the data
>>> >>> in. NiFi does not currently shuffle data around between nodes in the
>>> >>> cluster (you can use site-to-site to do
>>> >>> this if you want to, but it won't happen automatically). If you set
>>> the
>>> >>> number of Concurrent Tasks to 5, then
>>> >>> you will have up to 5 threads running for that processor on each
>>> node.
>>> >>>
>>> >>> The only exception to this is the Primary Node. You can schedule a
>>> >>> Processor to run only on the Primary Node
>>> >>> by right-clicking on the Processor, and going to the Configure menu.
>>> In
>>> >>> the Scheduling tab, you can change
>>> >>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> >>> Processor will only be triggered to run on
>>> >>> whichever node is elected the Primary Node (this can be changed in
>>> the
>>> >>> Cluster management screen by clicking
>>> >>> the appropriate icon in the top-right corner of the UI).
>>> >>>
>>> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> run
>>> >>> on primary node only).
>>> >>>
>>> >>> If you are attempting to have a single input running HTTP and then
>>> push
>>> >>> that out across the entire cluster to
>>> >>> process the data, you would have a few options. First, you could
>>> just use
>>> >>> an HTTP Load Balancer in front of NiFi.
>>> >>> The other option would be to have a ListenHTTP processor run on
>>> Primary
>>> >>> Node only and then use Site-to-Site
>>> >>> to distribute the data to other nodes.
>>> >>>
>>> >>> For more info on site-to-site, you can see the Site-to-Site section
>>> of
>>> >>> the User Guide at
>>> >>>
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>> >>>
>>> >>> If you have any more questions, let us know!
>>> >>>
>>> >>> Thanks
>>> >>> -Mark
>>> >>>
>>> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> >>> <Ch...@lifelock.com> wrote:
>>> >>>
>>> >>> Nifi Team – I would like to understand the advantages of Nifi
>>> clustering
>>> >>> setup.
>>> >>>
>>> >>> Questions :
>>> >>>
>>> >>>  - How does workflow work on multiple nodes ? Does it share the
>>> resources
>>> >>> intra nodes ?
>>> >>> Lets say I need to pull data 10 1Gig files from S3, how does work
>>> load
>>> >>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks
>>> per node ?
>>> >>>
>>> >>>  - How to “isolate” the processor to the master node (or one node)?
>>> >>>
>>> >>> - Getfile/Putfile processors on cluster setup, does it get/put on
>>> primary
>>> >>> node ? How do I force processor to look in one of the slave node?
>>> >>>
>>> >>> - How can we have a workflow where the input side we want to receive
>>> >>> requests (http) and then the rest of the pipeline need to run in
>>> parallel on
>>> >>> all the nodes ?
>>> >>>
>>> >>> Thanks,
>>> >>> -Chakro
>>> >>>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Sent from Gmail Mobile
>>>
>>
>>
>
>

Re: Nifi cluster features - Questions

Posted by Matthew Clarke <ma...@gmail.com>.
Chakri,
            What Mark is saying is that the NiFi Remote Process Group (RPG), also
known as Site-to-Site, will load-balance delivery of data to all nodes in a
cluster.  It cannot be configured to balance data to only a subset of the
nodes in a cluster.  If this is the strategy you want to deploy, a
different approach must be taken (one that does not use Site-to-Site).
Here is a NiFi diagram of one such approach using your example of a 10 node
cluster:

[image: Inline image 1]
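
As a rough text sketch of one such approach, assuming the same PostHTTP/ListenHTTP
pattern Mark described earlier in this thread (hostnames, the port, and the base
path are placeholders; contentListener is ListenHTTP's default Base Path):

  Ingest —> DistributeLoad (RoundRobin, one relationship per target node)
              —> PostHTTP (URL: http://node1.example.com:9999/contentListener)
              —> PostHTTP (URL: http://node2.example.com:9999/contentListener)
              ...

  Each target node runs its own ListenHTTP on port 9999, so data only reaches
  the nodes that a PostHTTP is pointed at.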



On Mon, Jan 11, 2016 at 4:16 PM, Chakrader Dewaragatla <
Chakrader.Dewaragatla@lifelock.com> wrote:

> Mark - Correct me if I understood right.
>
> Curl post from some application —> Configure Listen http (on primary node)
> --> Post http with Data flow file (On primary node?)  --> Post to
> site-to-site endpoint —> This in turn distributes load to both slaves.
>
> Thanks,
> -Chakri
>
> From: Mark Payne <ma...@hotmail.com>
> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Date: Monday, January 11, 2016 at 12:29 PM
>
> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Chakri,
>
> At this time, your only options are to run the processors on all nodes or
> a single node (Primary Node). There's no way to really group nodes together
> and say "only run on this set of nodes."
>
> One option is to have a ListenHTTP Processor and then push data to that
> NiFi via PostHTTP (configure it to send FlowFile attributes along). By
> doing this, you could set up the sending NiFi
> to only deliver data to two nodes. You could then have a different set of
> data going to a different two nodes, etc. by the way that you configure
> which data goes to which PostHTTP Processor.
>
> Does this give you what you need?
>
>
> On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com> wrote:
>
> Thanks Mark. I will look into it.
>
> Couple of questions:
>
>
>    - Going back to my earlier question, In a nifi cluster with two slaves
>    and NCM how do I make two slaves accept and process the incoming flowfile
>    in distributed fashion. Site to site is the only way to go ?
>    In our use case, we have http listener running on primary node and
>    putfile processor should run on two slaves in distributed fashion.
>
>    It is more like a new (or existing) feature.
>     - In a nifi cluster setup, can we group the machines and set
>    site-to-site to individual group.
>     For instance I have 10 node cluster, can I group them into 5 groups
>    with two nodes each. Run processors on dedicated group (using site to site
>    or other means).
>
> Thanks,
> -Chakri
>
> From: Mark Payne <ma...@hotmail.com>
> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Date: Monday, January 11, 2016 at 5:24 AM
> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Chakri,
>
> This line in the logs is particularly interesting (on primary node):
>
> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>
>
> This indicates that all of the site-to-site data will go to the host
> i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node
> listed, this means
> that the NCM responded, indicating that this is the only node in the
> cluster that is currently connected and has site-to-site enabled. Can you
> double-check the nifi.properties
> file on the Primary Node and verify that the "
> nifi.remote.input.socket.port" property is specified, and that the "
> nifi.remote.input.secure" property is set to "false"?
> Of note is that if the "nifi.remote.input.secure" property is set to
> true, but keystore and truststore are not specified, then site-to-site will
> be disabled (there would be a warning
> in the log in this case).
>
> If you can verify that both of those properties are set properly on both
> nodes, then we can delve in further, but probably best to start by
> double-checking the easy things :)
>
> Thanks
> -Mark
>
>
> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com> wrote:
>
> Bryan – Here are the logs :
> I have 5 sec flow file.
>
> On primary node (No data coming in)
>
> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2]
> o.a.nifi.controller.StandardFlowService Saved flow controller
> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
> pending = false
> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16]
> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1
> threads
> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7]
> o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2]
> o.a.nifi.controller.StandardFlowService Saved flow controller
> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
> pending = false
> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7]
> o.a.nifi.remote.StandardRemoteGroupPort
> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
> Successfully sent
> [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
> section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]]
> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
> milliseconds at a rate of 392 bytes/sec
> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1]
> o.a.nifi.remote.StandardRemoteGroupPort
> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
> Successfully sent
> [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
> section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]]
> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51
> milliseconds at a rate of 391 bytes/sec
> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Received request
> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
> o.a.nifi.controller.StandardFlowService Received flow request message from
> manager.
> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
> 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes)
> in 61 millis
> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
> for events starting with ID 301
> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
> (6 records) into single Provenance Log File
> ./provenance_repository/295.prov in 111 milliseconds
> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2]
> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
> Event file containing 8 records
> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10]
> o.a.nifi.remote.StandardRemoteGroupPort
> RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi]
> Successfully sent
> [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default,
> section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]]
> (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50
> milliseconds at a rate of 392 bytes/sec
> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22]
> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
> StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa]
> to run
> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26]
> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25]
> o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling
> GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2]
> o.a.nifi.controller.StandardFlowService Saved flow controller
> org.apache.nifi.controller.FlowController@5dff8cbf // Another save
> pending = false
> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
>
>
> On Secondary node (Data coming in)
>
> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1]
> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
> Repository with 0 records in 88 milliseconds
> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3]
> o.a.n.r.p.s.SocketFlowFileServerProtocol
> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
> Successfully received
> [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]]
> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
> at a rate of 387 bytes/sec
> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1]
> o.a.nifi.processors.standard.PutFile
> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
> StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]
> at location /root/putt/275243509297560
> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers
> for events starting with ID 17037
> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
> (6 records) into single Provenance Log File
> ./provenance_repository/17031.prov in 56 milliseconds
> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
> Event file containing 10 records
> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Received request
> e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
> o.a.nifi.controller.StandardFlowService Received flow request message from
> manager.
> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2]
> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
> e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes)
> in 76 millis
> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6]
> o.a.n.r.p.s.SocketFlowFileServerProtocol
> SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573]
> Successfully received
> [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]]
> (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds
> at a rate of 387 bytes/sec
> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8]
> o.a.nifi.processors.standard.PutFile
> PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of
> StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default,
> section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]
> at location /root/putt/275248510432074
> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1]
> org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10
> 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
> From: Bryan Bende <bb...@gmail.com>
> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Date: Sunday, January 10, 2016 at 2:43 PM
> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Chakri,
>
> Glad you got site-to-site working.
>
> Regarding the data distribution, I'm not sure why it is behaving that way.
> I just did a similar test running ncm, node1, and node2 all on my local
> machine, with GenerateFlowFile running every 10 seconds, and Input Port
> going to a LogAttribute, and I see it alternating between node1 and node2
> logs every 10 seconds.
>
> Is there anything in your primary node logs
> (primary_node/logs/nifi-app.log) when you see the data on the other node?
>
> -Bryan
>
>
> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:
>
>> Chakri,
>>
>> Would love to hear what you've learned and how that differed from the
>> docs themselves.  Site-to-site has proven difficult to setup so we're
>> clearly not there yet in having the right operator/admin experience.
>>
>> Thanks
>> Joe
>>
>> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
>> <Ch...@lifelock.com> wrote:
>> > I was able to get site-to-site work.
>> > I tried to follow your instructions to send data distribute across the
>> > nodes.
>> >
>> > GenerateFlowFile (On Primary) —> RPG
>> > RPG —> Input Port   —> Putfile (Time driven scheduling)
>> >
>> > However, data is only written to one slave (Secondary slave). Primary
>> slave
>> > has not data.
>> >
>> > Image screenshot :
>> > http://tinyurl.com/jjvjtmq
>> >
>> > From: Chakrader Dewaragatla <ch...@lifelock.com>
>> > Date: Sunday, January 10, 2016 at 11:26 AM
>> >
>> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> > Subject: Re: Nifi cluster features - Questions
>> >
>> > Bryan – Thanks – I am trying to setup site-to-site.
>> > I have two slaves and one NCM.
>> >
>> > My properties as follows :
>> >
>> > On both Slaves:
>> >
>> > nifi.remote.input.socket.port=10880
>> > nifi.remote.input.secure=false
>> >
>> > On NCM:
>> > nifi.remote.input.socket.port=10880
>> > nifi.remote.input.secure=false
>> >
>> > When I try drop remote process group (with http://<NCM IP>:8080/nifi),
>> I see
>> > error as follows for two nodes.
>> >
>> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
>> > communication
>> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
>> > communication
>> >
>> > Do you have insight why its trying to connecting 8080 on slaves ? When
>> do
>> > 10880 port come into the picture ? I remember try setting site to site
>> few
>> > months back and succeeded.
>> >
>> > Thanks,
>> > -Chakri
>> >
>> >
>> >
>> > From: Bryan Bende <bb...@gmail.com>
>> > Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> > Date: Saturday, January 9, 2016 at 11:22 AM
>> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> > Subject: Re: Nifi cluster features - Questions
>> >
>> > The sending node (where the remote process group is) will distribute the
>> > data evenly across the two nodes, so an individual file will only be
>> sent to
>> > one of the nodes. You could think of it as if a separate NiFi instance
>> was
>> > sending directly to a two node cluster, it would be evenly distributing
>> the
>> > data across the two nodes. In this case it just so happens to all be
>> with in
>> > the same cluster.
>> >
>> > The most common use case for this scenario is the List and Fetch
>> processors
>> > like HDFS. You can perform the listing on primary node, and then
>> distribute
>> > the results so the fetching takes place on all nodes.
>> >
>> > On Saturday, January 9, 2016, Chakrader Dewaragatla
>> > <Ch...@lifelock.com> wrote:
>> >>
>> >> Bryan – Thanks, how do the nodes distribute the load for a input port.
>> As
>> >> port is open and listening on two nodes,  does it copy same files on
>> both
>> >> the nodes?
>> >> I need to try this setup to see the results, appreciate your help.
>> >>
>> >> Thanks,
>> >> -Chakri
>> >>
>> >> From: Bryan Bende <bb...@gmail.com>
>> >> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> >> Date: Friday, January 8, 2016 at 3:44 PM
>> >> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> >> Subject: Re: Nifi cluster features - Questions
>> >>
>> >> Hi Chakri,
>> >>
>> >> I believe the DistributeLoad processor is more for load balancing when
>> >> sending to downstream systems. For example, if you had two HTTP
>> endpoints,
>> >> you could have the first relationship from DistributeLoad going to a
>> >> PostHTTP that posts to endpoint #1, and the second relationship going
>> to a
>> >> second PostHTTP that goes to endpoint #2.
>> >>
>> >> If you want to distribute the data with in the cluster, then you need
>> to
>> >> use site-to-site. The way you do this is the following...
>> >>
>> >> - Add an Input Port connected to your PutFile.
>> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> >> Remote Process Group. The Remote Process Group should be connected to
>> the
>> >> Input Port from the previous step.
>> >>
>> >> So both nodes have an input port listening for data, but only the
>> primary
>> >> node produces a FlowFile and sends it to the RPG which then
>> re-distributes
>> >> it back to one of the Input Ports.
>> >>
>> >> In order for this to work you need to set
>> nifi.remote.input.socket.port in
>> >> nifi.properties to some available port, and you probably want
>> >> nifi.remote.input.secure=false for testing.
>> >>
>> >> -Bryan
>> >>
>> >>
>> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> >> <Ch...@lifelock.com> wrote:
>> >>>
>> >>> Mark – I have setup a two node cluster and tried the following .
>> >>>  GenrateFlowfile processor (Run only on primary node) —>
>> DistributionLoad
>> >>> processor (RoundRobin)   —> PutFile
>> >>>
>> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it
>> to
>> >>> >> run on primary node only).
>> >>> From your above comment, It should put file on two nodes. It put
>> files on
>> >>> primary node only. Any thoughts ?
>> >>>
>> >>> Thanks,
>> >>> -Chakri
>> >>>
>> >>> From: Mark Payne <ma...@hotmail.com>
>> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
>> >>>
>> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> >>> Subject: Re: Nifi cluster features - Questions
>> >>>
>> >>> Chakri,
>> >>>
>> >>> Correct - when NiFi instances are clustered, they do not transfer data
>> >>> between the nodes. This is very different
>> >>> than you might expect from something like Storm or Spark, as the key
>> >>> goals and design are quite different.
>> >>> We have discussed providing the ability to allow the user to indicate
>> >>> that they want to have the framework
>> >>> do load balancing for specific connections in the background, but it's
>> >>> still in more of a discussion phase.
>> >>>
>> >>> Site-to-Site is simply the capability that we have developed to
>> transfer
>> >>> data between one instance of
>> >>> NiFi and another instance of NiFi. So currently, if we want to do load
>> >>> balancing across the cluster, we would
>> >>> create a site-to-site connection (by dragging a Remote Process Group
>> onto
>> >>> the graph) and give that
>> >>> site-to-site connection the URL of our cluster. That way, you can push
>> >>> data to your own cluster, effectively
>> >>> providing a load balancing capability.
>> >>>
>> >>> If you were to just run ListenHTTP without setting it to Primary Node,
>> >>> then every node in the cluster will be listening
>> >>> for incoming HTTP connections. So you could then use a simple load
>> >>> balancer in front of NiFi to distribute the load
>> >>> across your cluster.
>> >>>
>> >>> Does this help? If you have any more questions we're happy to help!
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>>
>> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>> >>> <Ch...@lifelock.com> wrote:
>> >>>
>> >>> Mark - Thanks for the notes.
>> >>>
>> >>> >> The other option would be to have a ListenHTTP processor run on
>> >>> >> Primary Node only and then use Site-to-Site to distribute the data
>> to other
>> >>> >> nodes.
>> >>> Lets say I have 5 node cluster and ListenHTTP processor on Primary
>> node,
>> >>> collected data on primary node is not transfered to other nodes by
>> default
>> >>> for processing despite all nodes are part of one cluster?
>> >>> If ListenHTTP processor is running  as a dafult (with out explicit
>> >>> setting to run on primary node), how does the data transferred to
>> rest of
>> >>> the nodes? Does site-to-site come in play when I make one processor
>> to run
>> >>> on primary node ?
>> >>>
>> >>> Thanks,
>> >>> -Chakri
>> >>>
>> >>> From: Mark Payne <ma...@hotmail.com>
>> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
>> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> >>> Subject: Re: Nifi cluster features - Questions
>> >>>
>> >>> Hello Chakro,
>> >>>
>> >>> When you create a cluster of NiFi instances, each node in the cluster
>> is
>> >>> acting independently and in exactly
>> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly
>> the
>> >>> same flow. However, they will be
>> >>> pulling in different data and therefore operating on different data.
>> >>>
>> >>> So if you pull in 10 1-gig files from S3, each of those files will be
>> >>> processed on the node that pulled the data
>> >>> in. NiFi does not currently shuffle data around between nodes in the
>> >>> cluster (you can use site-to-site to do
>> >>> this if you want to, but it won't happen automatically). If you set
>> the
>> >>> number of Concurrent Tasks to 5, then
>> >>> you will have up to 5 threads running for that processor on each node.
>> >>>
>> >>> The only exception to this is the Primary Node. You can schedule a
>> >>> Processor to run only on the Primary Node
>> >>> by right-clicking on the Processor, and going to the Configure menu.
>> In
>> >>> the Scheduling tab, you can change
>> >>> the Scheduling Strategy to Primary Node Only. In this case, that
>> >>> Processor will only be triggered to run on
>> >>> whichever node is elected the Primary Node (this can be changed in the
>> >>> Cluster management screen by clicking
>> >>> the appropriate icon in the top-right corner of the UI).
>> >>>
>> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to
>> run
>> >>> on primary node only).
>> >>>
>> >>> If you are attempting to have a single input running HTTP and then
>> push
>> >>> that out across the entire cluster to
>> >>> process the data, you would have a few options. First, you could just
>> use
>> >>> an HTTP Load Balancer in front of NiFi.
>> >>> The other option would be to have a ListenHTTP processor run on
>> Primary
>> >>> Node only and then use Site-to-Site
>> >>> to distribute the data to other nodes.
>> >>>
>> >>> For more info on site-to-site, you can see the Site-to-Site section of
>> >>> the User Guide at
>> >>>
>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>> >>>
>> >>> If you have any more questions, let us know!
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>> >>> <Ch...@lifelock.com> wrote:
>> >>>
>> >>> Nifi Team – I would like to understand the advantages of Nifi
>> clustering
>> >>> setup.
>> >>>
>> >>> Questions :
>> >>>
>> >>>  - How does workflow work on multiple nodes ? Does it share the
>> resources
>> >>> intra nodes ?
>> >>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>> >>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per
>> node ?
>> >>>
>> >>>  - How to “isolate” the processor to the master node (or one node)?
>> >>>
>> >>> - Getfile/Putfile processors on cluster setup, does it get/put on
>> primary
>> >>> node ? How do I force processor to look in one of the slave node?
>> >>>
>> >>> - How can we have a workflow where the input side we want to receive
>> >>> requests (http) and then the rest of the pipeline need to run in
>> parallel on
>> >>> all the nodes ?
>> >>>
>> >>> Thanks,
>> >>> -Chakro
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Sent from Gmail Mobile
>>
>
>

Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Mark - Correct me if I understood right.

Curl post from some application —> Configure ListenHTTP (on primary node) --> PostHTTP with data flowfile (on primary node?) --> Post to site-to-site endpoint —> This in turn distributes load to both slaves.
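
A minimal way to exercise that first hop, assuming ListenHTTP on the primary node is configured with Listening Port 9999 and its default Base Path of contentListener (the port, path, and file name here are placeholders):

curl -X POST --data-binary @sample.txt http://<primary-node-ip>:9999/contentListener

The site-to-site hop after that is what actually spreads the received flowfiles across the two slaves.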

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 12:29 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

At this time, your only options are to run the processors on all nodes or a single node (Primary Node). There's no way to really group nodes together and say "only run on this set of nodes."

One option is to have a ListenHTTP Processor and then push data to that NiFi via PostHTTP (configure it to send FlowFile attributes along). By doing this, you could set up the sending NiFi
to only deliver data to two nodes. You could then have a different set of data going to a different two nodes, etc. by the way that you configure which data goes to which PostHTTP Processor.

Does this give you what you need?


On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Thanks Mark. I will look into it.

Couple of questions:


  - Going back to my earlier question: in a NiFi cluster with two slaves and an NCM, how do I make the two slaves accept and process the incoming flowfiles in a distributed fashion? Is site-to-site the only way to go?
    In our use case, we have an HTTP listener running on the primary node, and the PutFile processor should run on the two slaves in a distributed fashion.

  This one is more like a new (or existing) feature:
  - In a NiFi cluster setup, can we group the machines and set site-to-site to an individual group?
    For instance, if I have a 10-node cluster, can I group the nodes into 5 groups with two nodes each and run processors on a dedicated group (using site-to-site or other means)?

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 5:24 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

This line in the logs is particularly interesting (on primary node):

2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data

This indicates that all of the site-to-site data will go to the host i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node listed, this means
that the NCM responded, indicating that this is the only node in the cluster that is currently connected and has site-to-site enabled. Can you double-check the nifi.properties
file on the Primary Node and verify that the "nifi.remote.input.socket.port" property is specified, and that the "nifi.remote.input.secure" property is set to "false"?
Of note is that if the "nifi.remote.input.secure" property is set to true, but keystore and truststore are not specified, then site-to-site will be disabled (there would be a warning
in the log in this case).

If you can verify that both of those properties are set properly on both nodes, then we can delve in further, but probably best to start by double-checking the easy things :)
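As a minimal sketch, the entries to check in nifi.properties on each node would look like this (the port mirrors the value used elsewhere in this thread; the commented keystore/truststore lines matter only when secure is set to true):

  nifi.remote.input.socket.port=10880
  nifi.remote.input.secure=false
  # needed only when nifi.remote.input.secure=true, otherwise site-to-site is disabled
  # nifi.security.keystore=/path/to/keystore.jks
  # nifi.security.truststore=/path/to/truststore.jks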

Thanks
-Mark


On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Bryan – Here are the logs :
I have 5 sec flow file.

On primary node (No data coming in)

2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads
2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51 milliseconds at a rate of 391 bytes/sec
2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis
2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records
2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run
2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis


On Secondary node (Data coming in)

2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560
2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records
2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis
2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074
2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
From: Bryan Bende <bb...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Sunday, January 10, 2016 at 2:43 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running ncm, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds, and Input Port going to a LogAttribute, and I see it alternating between node1 and node2 logs every 10 seconds.

Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan


On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com>> wrote:
Chakri,

Would love to hear what you've learned and how that differed from the
docs themselves.  Site-to-site has proven difficult to setup so we're
clearly not there yet in having the right operator/admin experience.

Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
<Ch...@lifelock.com>> wrote:
> I was able to get site-to-site to work.
> I tried to follow your instructions to distribute data across the
> nodes.
>
> GenerateFlowFile (On Primary) —> RPG
> RPG —> Input Port   —> Putfile (Time driven scheduling)
>
> However, data is only written to one slave (the secondary slave). The primary slave
> has no data.
>
> Image screenshot :
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <ch...@lifelock.com>>
> Date: Sunday, January 10, 2016 at 11:26 AM
>
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to set up site-to-site.
> I have two slaves and one NCM.
>
> My properties are as follows:
>
> On both Slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try to drop a remote process group (with http://<NCM IP>:8080/nifi), I see
> the following error for the two nodes.
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
>
> Do you have any insight into why it is trying to connect to 8080 on the slaves? When does
> port 10880 come into the picture? I remember trying to set up site-to-site a few
> months back and succeeding.
>
> Thanks,
> -Chakri
>
>
>
> From: Bryan Bende <bb...@gmail.com>>
> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes. You could think of it as if a separate NiFi instance was
> sending directly to a two node cluster, it would be evenly distributing the
> data across the two nodes. In this case it just so happens to all be with in
> the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors
> like HDFS. You can perform the listing on primary node, and then distribute
> the results so the fetching takes place on all nodes.
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <Ch...@lifelock.com>> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for a input port. As
>> port is open and listening on two nodes,  does it copy same files on both
>> the nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bb...@gmail.com>>
>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data with in the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> Remote Process Group. The Remote Process Group should be connected to the
>> Input Port from the previous step.
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG which then re-distributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan
>>
>>
>> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark – I have setup a two node cluster and tried the following .
>>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>>> processor (RoundRobin)   —> PutFile
>>>
>>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> >> run on primary node only).
>>> From your above comment, It should put file on two nodes. It put files on
>>> primary node only. Any thoughts ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>>
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Correct - when NiFi instances are clustered, they do not transfer data
>>> between the nodes. This is very different
>>> than you might expect from something like Storm or Spark, as the key
>>> goals and design are quite different.
>>> We have discussed providing the ability to allow the user to indicate
>>> that they want to have the framework
>>> do load balancing for specific connections in the background, but it's
>>> still in more of a discussion phase.
>>>
>>> Site-to-Site is simply the capability that we have developed to transfer
>>> data between one instance of
>>> NiFi and another instance of NiFi. So currently, if we want to do load
>>> balancing across the cluster, we would
>>> create a site-to-site connection (by dragging a Remote Process Group onto
>>> the graph) and give that
>>> site-to-site connection the URL of our cluster. That way, you can push
>>> data to your own cluster, effectively
>>> providing a load balancing capability.
>>>
>>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> then every node in the cluster will be listening
>>> for incoming HTTP connections. So you could then use a simple load
>>> balancer in front of NiFi to distribute the load
>>> across your cluster.
>>>
>>> Does this help? If you have any more questions we're happy to help!
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark - Thanks for the notes.
>>>
>>> >> The other option would be to have a ListenHTTP processor run on
>>> >> Primary Node only and then use Site-to-Site to distribute the data to other
>>> >> nodes.
>>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>>> collected data on primary node is not transfered to other nodes by default
>>> for processing despite all nodes are part of one cluster?
>>> If ListenHTTP processor is running  as a dafult (with out explicit
>>> setting to run on primary node), how does the data transferred to rest of
>>> the nodes? Does site-to-site come in play when I make one processor to run
>>> on primary node ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Hello Chakro,
>>>
>>> When you create a cluster of NiFi instances, each node in the cluster is
>>> acting independently and in exactly
>>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>>> same flow. However, they will be
>>> pulling in different data and therefore operating on different data.
>>>
>>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> processed on the node that pulled the data
>>> in. NiFi does not currently shuffle data around between nodes in the
>>> cluster (you can use site-to-site to do
>>> this if you want to, but it won't happen automatically). If you set the
>>> number of Concurrent Tasks to 5, then
>>> you will have up to 5 threads running for that processor on each node.
>>>
>>> The only exception to this is the Primary Node. You can schedule a
>>> Processor to run only on the Primary Node
>>> by right-clicking on the Processor, and going to the Configure menu. In
>>> the Scheduling tab, you can change
>>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> Processor will only be triggered to run on
>>> whichever node is elected the Primary Node (this can be changed in the
>>> Cluster management screen by clicking
>>> the appropriate icon in the top-right corner of the UI).
>>>
>>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> on primary node only).
>>>
>>> If you are attempting to have a single input running HTTP and then push
>>> that out across the entire cluster to
>>> process the data, you would have a few options. First, you could just use
>>> an HTTP Load Balancer in front of NiFi.
>>> The other option would be to have a ListenHTTP processor run on Primary
>>> Node only and then use Site-to-Site
>>> to distribute the data to other nodes.
>>>
>>> For more info on site-to-site, you can see the Site-to-Site section of
>>> the User Guide at
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>
>>> If you have any more questions, let us know!
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Nifi Team – I would like to understand the advantages of Nifi clustering
>>> setup.
>>>
>>> Questions :
>>>
>>>  - How does workflow work on multiple nodes ? Does it share the resources
>>> intra nodes ?
>>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
>>>
>>>  - How to “isolate” the processor to the master node (or one node)?
>>>
>>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>>> node ? How do I force processor to look in one of the slave node?
>>>
>>> - How can we have a workflow where the input side we want to receive
>>> requests (http) and then the rest of the pipeline need to run in parallel on
>>> all the nodes ?
>>>
>>> Thanks,
>>> -Chakro
>>>

Re: Nifi cluster features - Questions

Posted by Mark Payne <ma...@hotmail.com>.
Chakri,

At this time, your only options are to run the processors on all nodes or a single node (Primary Node). There's no way to really group nodes together and say "only run on this set of nodes."

One option is to have a ListenHTTP Processor and then push data to that NiFi via PostHTTP (configure it to send FlowFile attributes along). By doing this, you could set up the sending NiFi
to only deliver data to two nodes. You could then have a different set of data going to a different two nodes, etc. by the way that you configure which data goes to which PostHTTP Processor.

Does this give you what you need?
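A rough sketch of that pairing, with placeholder hosts and ports (the property names are those of the standard ListenHTTP and PostHTTP processors; adjust to your environment):

  ListenHTTP (running on the receiving nodes)
      Listening Port   = 9999                (placeholder)
      Base Path        = contentListener     (processor default)

  PostHTTP (on the sending NiFi, one per pair of target nodes)
      URL              = http://<node-a>:9999/contentListener
      Send as FlowFile = true                (keeps FlowFile attributes with the content)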


> On Jan 11, 2016, at 3:20 PM, Chakrader Dewaragatla <Ch...@lifelock.com> wrote:
> 
> Thanks Mark. I will look into it.
> 
> Couple of questions: 
> 
> Going back to my earlier question, In a nifi cluster with two slaves and NCM how do I make two slaves accept and process the incoming flowfile in distibuted fashion. Site to site is the only way to go ? 
> In our use case, we have http listener running on primary node and putfile processor should run on two slaves in distributed fashion. 
> 
> It is more like a new (or existing) feature. 
>  - In a nifi cluster setup, can we group the machines and set site-to-site to individual group. 
>  For instance I have 10 node cluster, can I group them into 5 groups with two nodes each. Run processors on dedicated group (using site to site or other means).
> Thanks,
> -Chakri
> 
> From: Mark Payne <markap14@hotmail.com <ma...@hotmail.com>>
> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> Date: Monday, January 11, 2016 at 5:24 AM
> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
> 
> Chakri,
> 
> This line in the logs is particularly interesting (on primary node):
> 
>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
> 
> This indicates that all of the site-to-site data will go to the host i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node listed, this means
> that the NCM responded, indicating that this is the only node in the cluster that is currently connected and has site-to-site enabled. Can you double-check the nifi.properties
> file on the Primary Node and verify that the "nifi.remote.input.socket.port" is property is specified, and that the "nifi.remote.input.secure" property is set to "false"?
> Of note is that if the "nifi.remote.input.secure" property is set to true, but keystore and truststore are not specified, then site-to-site will be disabled (there would be a warning
> in the log in this case).
> 
> If you can verify that both of those properties are set properly on both nodes, then we can delve in further, but probably best to start by double-checking the easy things :)
> 
> Thanks
> -Mark
> 
> 
>> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
>> 
>> Bryan – Here are the logs : 
>> I have 5 sec flow file.
>> 
>> On primary node (No data coming in)
>> 
>> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
>> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
>> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads
>> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
>> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
>> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
>> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
>> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] <http://10.228.68.73:8080/nifi]> Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 <nifi://i-c894e249.dev.aws.lifelock.ad:10880> in 50 milliseconds at a rate of 392 bytes/sec
>> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] <http://10.228.68.73:8080/nifi]> Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 <nifi://i-c894e249.dev.aws.lifelock.ad:10880> in 51 milliseconds at a rate of 391 bytes/sec
>> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
>> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis
>> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
>> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301
>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds
>> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records
>> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] <http://10.228.68.73:8080/nifi]> Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 <nifi://i-c894e249.dev.aws.lifelock.ad:10880> in 50 milliseconds at a rate of 392 bytes/sec
>> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
>> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run
>> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
>> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
>> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
>> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
>> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
>> 
>> 
>> On Secondary node (Data coming in)
>> 
>> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds
>> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] <nifi://10.228.68.106:40611]> in 51 milliseconds at a rate of 387 bytes/sec
>> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560
>> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037
>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds
>> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records
>> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
>> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis
>> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
>> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] <nifi://10.228.68.106:40611]> in 51 milliseconds at a rate of 387 bytes/sec
>> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074
>> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
>> From: Bryan Bende <bbende@gmail.com <ma...@gmail.com>>
>> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> Date: Sunday, January 10, 2016 at 2:43 PM
>> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> Subject: Re: Nifi cluster features - Questions
>> 
>> Chakri,
>> 
>> Glad you got site-to-site working.
>> 
>> Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running ncm, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds, and Input Port going to a LogAttribute, and I see it alternating between node1 and node2 logs every 10 seconds.
>> 
>> Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node? 
>> 
>> -Bryan
>> 
>> 
>> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <joe.witt@gmail.com <ma...@gmail.com>> wrote:
>> Chakri,
>> 
>> Would love to hear what you've learned and how that differed from the
>> docs themselves.  Site-to-site has proven difficult to setup so we're
>> clearly not there yet in having the right operator/admin experience.
>> 
>> Thanks
>> Joe
>> 
>> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
>> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
>> > I was able to get site-to-site work.
>> > I tried to follow your instructions to send data distribute across the
>> > nodes.
>> >
>> > GenerateFlowFile (On Primary) —> RPG
>> > RPG —> Input Port   —> Putfile (Time driven scheduling)
>> >
>> > However, data is only written to one slave (Secondary slave). Primary slave
>> > has not data.
>> >
>> > Image screenshot :
>> > http://tinyurl.com/jjvjtmq <http://tinyurl.com/jjvjtmq>
>> >
>> > From: Chakrader Dewaragatla <chakrader.dewaragatla@lifelock.com <ma...@lifelock.com>>
>> > Date: Sunday, January 10, 2016 at 11:26 AM
>> >
>> > To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> > Subject: Re: Nifi cluster features - Questions
>> >
>> > Bryan – Thanks – I am trying to setup site-to-site.
>> > I have two slaves and one NCM.
>> >
>> > My properties as follows :
>> >
>> > On both Slaves:
>> >
>> > nifi.remote.input.socket.port=10880
>> > nifi.remote.input.secure=false
>> >
>> > On NCM:
>> > nifi.remote.input.socket.port=10880
>> > nifi.remote.input.secure=false
>> >
>> > When I try drop remote process group (with http://<NCM <http://<NCM> IP>:8080/nifi), I see
>> > error as follows for two nodes.
>> >
>> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
>> > communication
>> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
>> > communication
>> >
>> > Do you have insight why its trying to connecting 8080 on slaves ? When do
>> > 10880 port come into the picture ? I remember try setting site to site few
>> > months back and succeeded.
>> >
>> > Thanks,
>> > -Chakri
>> >
>> >
>> >
>> > From: Bryan Bende <bbende@gmail.com <ma...@gmail.com>>
>> > Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> > Date: Saturday, January 9, 2016 at 11:22 AM
>> > To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> > Subject: Re: Nifi cluster features - Questions
>> >
>> > The sending node (where the remote process group is) will distribute the
>> > data evenly across the two nodes, so an individual file will only be sent to
>> > one of the nodes. You could think of it as if a separate NiFi instance was
>> > sending directly to a two node cluster, it would be evenly distributing the
>> > data across the two nodes. In this case it just so happens to all be with in
>> > the same cluster.
>> >
>> > The most common use case for this scenario is the List and Fetch processors
>> > like HDFS. You can perform the listing on primary node, and then distribute
>> > the results so the fetching takes place on all nodes.
>> >
>> > On Saturday, January 9, 2016, Chakrader Dewaragatla
>> > <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
>> >>
>> >> Bryan – Thanks, how do the nodes distribute the load for a input port. As
>> >> port is open and listening on two nodes,  does it copy same files on both
>> >> the nodes?
>> >> I need to try this setup to see the results, appreciate your help.
>> >>
>> >> Thanks,
>> >> -Chakri
>> >>
>> >> From: Bryan Bende <bbende@gmail.com <ma...@gmail.com>>
>> >> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> >> Date: Friday, January 8, 2016 at 3:44 PM
>> >> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> >> Subject: Re: Nifi cluster features - Questions
>> >>
>> >> Hi Chakri,
>> >>
>> >> I believe the DistributeLoad processor is more for load balancing when
>> >> sending to downstream systems. For example, if you had two HTTP endpoints,
>> >> you could have the first relationship from DistributeLoad going to a
>> >> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> >> second PostHTTP that goes to endpoint #2.
>> >>
>> >> If you want to distribute the data with in the cluster, then you need to
>> >> use site-to-site. The way you do this is the following...
>> >>
>> >> - Add an Input Port connected to your PutFile.
>> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> >> Remote Process Group. The Remote Process Group should be connected to the
>> >> Input Port from the previous step.
>> >>
>> >> So both nodes have an input port listening for data, but only the primary
>> >> node produces a FlowFile and sends it to the RPG which then re-distributes
>> >> it back to one of the Input Ports.
>> >>
>> >> In order for this to work you need to set nifi.remote.input.socket.port in
>> >> nifi.properties to some available port, and you probably want
>> >> nifi.remote.input.secure=false for testing.
>> >>
>> >> -Bryan
>> >>
>> >>
>> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> >> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
>> >>>
>> >>> Mark – I have setup a two node cluster and tried the following .
>> >>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>> >>> processor (RoundRobin)   —> PutFile
>> >>>
>> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>> >>> >> run on primary node only).
>> >>> From your above comment, It should put file on two nodes. It put files on
>> >>> primary node only. Any thoughts ?
>> >>>
>> >>> Thanks,
>> >>> -Chakri
>> >>>
>> >>> From: Mark Payne <markap14@hotmail.com <ma...@hotmail.com>>
>> >>> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
>> >>>
>> >>> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> >>> Subject: Re: Nifi cluster features - Questions
>> >>>
>> >>> Chakri,
>> >>>
>> >>> Correct - when NiFi instances are clustered, they do not transfer data
>> >>> between the nodes. This is very different
>> >>> than you might expect from something like Storm or Spark, as the key
>> >>> goals and design are quite different.
>> >>> We have discussed providing the ability to allow the user to indicate
>> >>> that they want to have the framework
>> >>> do load balancing for specific connections in the background, but it's
>> >>> still in more of a discussion phase.
>> >>>
>> >>> Site-to-Site is simply the capability that we have developed to transfer
>> >>> data between one instance of
>> >>> NiFi and another instance of NiFi. So currently, if we want to do load
>> >>> balancing across the cluster, we would
>> >>> create a site-to-site connection (by dragging a Remote Process Group onto
>> >>> the graph) and give that
>> >>> site-to-site connection the URL of our cluster. That way, you can push
>> >>> data to your own cluster, effectively
>> >>> providing a load balancing capability.
>> >>>
>> >>> If you were to just run ListenHTTP without setting it to Primary Node,
>> >>> then every node in the cluster will be listening
>> >>> for incoming HTTP connections. So you could then use a simple load
>> >>> balancer in front of NiFi to distribute the load
>> >>> across your cluster.
>> >>>
>> >>> Does this help? If you have any more questions we're happy to help!
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>>
>> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>> >>> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
>> >>>
>> >>> Mark - Thanks for the notes.
>> >>>
>> >>> >> The other option would be to have a ListenHTTP processor run on
>> >>> >> Primary Node only and then use Site-to-Site to distribute the data to other
>> >>> >> nodes.
>> >>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>> >>> collected data on primary node is not transfered to other nodes by default
>> >>> for processing despite all nodes are part of one cluster?
>> >>> If ListenHTTP processor is running  as a dafult (with out explicit
>> >>> setting to run on primary node), how does the data transferred to rest of
>> >>> the nodes? Does site-to-site come in play when I make one processor to run
>> >>> on primary node ?
>> >>>
>> >>> Thanks,
>> >>> -Chakri
>> >>>
>> >>> From: Mark Payne <markap14@hotmail.com <ma...@hotmail.com>>
>> >>> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
>> >>> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
>> >>> Subject: Re: Nifi cluster features - Questions
>> >>>
>> >>> Hello Chakro,
>> >>>
>> >>> When you create a cluster of NiFi instances, each node in the cluster is
>> >>> acting independently and in exactly
>> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>> >>> same flow. However, they will be
>> >>> pulling in different data and therefore operating on different data.
>> >>>
>> >>> So if you pull in 10 1-gig files from S3, each of those files will be
>> >>> processed on the node that pulled the data
>> >>> in. NiFi does not currently shuffle data around between nodes in the
>> >>> cluster (you can use site-to-site to do
>> >>> this if you want to, but it won't happen automatically). If you set the
>> >>> number of Concurrent Tasks to 5, then
>> >>> you will have up to 5 threads running for that processor on each node.
>> >>>
>> >>> The only exception to this is the Primary Node. You can schedule a
>> >>> Processor to run only on the Primary Node
>> >>> by right-clicking on the Processor, and going to the Configure menu. In
>> >>> the Scheduling tab, you can change
>> >>> the Scheduling Strategy to Primary Node Only. In this case, that
>> >>> Processor will only be triggered to run on
>> >>> whichever node is elected the Primary Node (this can be changed in the
>> >>> Cluster management screen by clicking
>> >>> the appropriate icon in the top-right corner of the UI).
>> >>>
>> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>> >>> on primary node only).
>> >>>
>> >>> If you are attempting to have a single input running HTTP and then push
>> >>> that out across the entire cluster to
>> >>> process the data, you would have a few options. First, you could just use
>> >>> an HTTP Load Balancer in front of NiFi.
>> >>> The other option would be to have a ListenHTTP processor run on Primary
>> >>> Node only and then use Site-to-Site
>> >>> to distribute the data to other nodes.
>> >>>
>> >>> For more info on site-to-site, you can see the Site-to-Site section of
>> >>> the User Guide at
>> >>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site <http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site>
>> >>>
>> >>> If you have any more questions, let us know!
>> >>>
>> >>> Thanks
>> >>> -Mark
>> >>>
>> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>> >>> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
>> >>>
>> >>> Nifi Team – I would like to understand the advantages of Nifi clustering
>> >>> setup.
>> >>>
>> >>> Questions :
>> >>>
>> >>>  - How does workflow work on multiple nodes ? Does it share the resources
>> >>> intra nodes ?
>> >>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>> >>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
>> >>>
>> >>>  - How to “isolate” the processor to the master node (or one node)?
>> >>>
>> >>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>> >>> node ? How do I force processor to look in one of the slave node?
>> >>>
>> >>> - How can we have a workflow where the input side we want to receive
>> >>> requests (http) and then the rest of the pipeline need to run in parallel on
>> >>> all the nodes ?
>> >>>
>> >>> Thanks,
>> >>> -Chakro
>> >>>


Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Thanks Mark. I will look into it.

Couple of questions:


 - Going back to my earlier question: in a NiFi cluster with two slaves and an NCM, how do I make the two slaves accept and process the incoming flowfiles in a distributed fashion? Is site-to-site the only way to go?
   In our use case, we have an HTTP listener running on the primary node, and the PutFile processor should run on the two slaves in a distributed fashion.

 - This is more of a new (or existing) feature question: in a NiFi cluster setup, can we group the machines and point site-to-site at an individual group?
   For instance, with a 10 node cluster, can I group the nodes into 5 groups of two nodes each and run processors on a dedicated group (using site-to-site or other means)?

Thanks,
-Chakri

From: Mark Payne <ma...@hotmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Monday, January 11, 2016 at 5:24 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

This line in the logs is particularly interesting (on primary node):

2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data

This indicates that all of the site-to-site data will go to the host i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node listed, this means
that the NCM responded, indicating that this is the only node in the cluster that is currently connected and has site-to-site enabled. Can you double-check the nifi.properties
file on the Primary Node and verify that the "nifi.remote.input.socket.port" property is specified, and that the "nifi.remote.input.secure" property is set to "false"?
Of note is that if the "nifi.remote.input.secure" property is set to true, but keystore and truststore are not specified, then site-to-site will be disabled (there would be a warning
in the log in this case).
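
For reference, a minimal nifi.properties fragment with site-to-site enabled on a node would look something like the following (10880 is just the port used elsewhere in this thread; any free port works):

    # Site-to-Site settings - needed on every node that should receive data
    nifi.remote.input.socket.port=10880
    nifi.remote.input.secure=false
    # If nifi.remote.input.secure=true, a keystore and truststore must also be
    # configured; otherwise site-to-site is disabled and a warning is logged.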

If you can verify that both of those properties are set properly on both nodes, then we can delve in further, but probably best to start by double-checking the easy things :)

Thanks
-Mark


On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:

Bryan – Here are the logs :
I have 5 sec flow file.

On primary node (No data coming in)

2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads
2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51 milliseconds at a rate of 391 bytes/sec
2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis
2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds
2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records
2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run
2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis


On Secondary node (Data coming in)

2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds
2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560
2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds
2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records
2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis
2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074
2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
From: Bryan Bende <bb...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Sunday, January 10, 2016 at 2:43 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running ncm, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds, and Input Port going to a LogAttribute, and I see it alternating between node1 and node2 logs every 10 seconds.
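
Roughly, the layout for that test looks like this (the names here are just placeholders; the key point is that only GenerateFlowFile is scheduled on the primary node, while the Input Port exists on every node):

    GenerateFlowFile (On Primary Node only, every 10 sec)
        --> Remote Process Group (pointing at the cluster's own URL)

    Input Port (present on all nodes)
        --> LogAttribute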

Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan


On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com>> wrote:
Chakri,

Would love to hear what you've learned and how that differed from the
docs themselves.  Site-to-site has proven difficult to setup so we're
clearly not there yet in having the right operator/admin experience.

Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
<Ch...@lifelock.com>> wrote:
> I was able to get site-to-site to work.
> I tried to follow your instructions to distribute the data across the
> nodes.
>
> GenerateFlowFile (On Primary) —> RPG
> RPG —> Input Port   —> Putfile (Time driven scheduling)
>
> However, data is only written to one slave (the secondary slave). The primary slave
> has no data.
>
> Image screenshot :
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <ch...@lifelock.com>>
> Date: Sunday, January 10, 2016 at 11:26 AM
>
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to setup site-to-site.
> I have two slaves and one NCM.
>
> My properties as follows :
>
> On both Slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try drop remote process group (with http://<NCM IP>:8080/nifi), I see
> error as follows for two nodes.
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
>
> Do you have insight into why it's trying to connect to 8080 on the slaves? When does
> port 10880 come into the picture? I remember setting up site-to-site a few
> months back and it succeeded.
>
> Thanks,
> -Chakri
>
>
>
> From: Bryan Bende <bb...@gmail.com>>
> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes. You could think of it as if a separate NiFi instance was
> sending directly to a two node cluster, it would be evenly distributing the
> data across the two nodes. In this case it just so happens to all be with in
> the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors
> like HDFS. You can perform the listing on primary node, and then distribute
> the results so the fetching takes place on all nodes.
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <Ch...@lifelock.com>> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for an input port? As
>> the port is open and listening on two nodes, does it copy the same files to both
>> nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bb...@gmail.com>>
>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data with in the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> Remote Process Group. The Remote Process Group should be connected to the
>> Input Port from the previous step.
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG which then re-distributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan
>>
>>
>> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark – I have setup a two node cluster and tried the following .
>>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>>> processor (RoundRobin)   —> PutFile
>>>
>>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> >> run on primary node only).
>>> From your above comment, It should put file on two nodes. It put files on
>>> primary node only. Any thoughts ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>>
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Correct - when NiFi instances are clustered, they do not transfer data
>>> between the nodes. This is very different
>>> than you might expect from something like Storm or Spark, as the key
>>> goals and design are quite different.
>>> We have discussed providing the ability to allow the user to indicate
>>> that they want to have the framework
>>> do load balancing for specific connections in the background, but it's
>>> still in more of a discussion phase.
>>>
>>> Site-to-Site is simply the capability that we have developed to transfer
>>> data between one instance of
>>> NiFi and another instance of NiFi. So currently, if we want to do load
>>> balancing across the cluster, we would
>>> create a site-to-site connection (by dragging a Remote Process Group onto
>>> the graph) and give that
>>> site-to-site connection the URL of our cluster. That way, you can push
>>> data to your own cluster, effectively
>>> providing a load balancing capability.
>>>
>>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> then every node in the cluster will be listening
>>> for incoming HTTP connections. So you could then use a simple load
>>> balancer in front of NiFi to distribute the load
>>> across your cluster.
>>>
>>> Does this help? If you have any more questions we're happy to help!
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark - Thanks for the notes.
>>>
>>> >> The other option would be to have a ListenHTTP processor run on
>>> >> Primary Node only and then use Site-to-Site to distribute the data to other
>>> >> nodes.
>>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>>> collected data on primary node is not transfered to other nodes by default
>>> for processing despite all nodes are part of one cluster?
>>> If ListenHTTP processor is running  as a dafult (with out explicit
>>> setting to run on primary node), how does the data transferred to rest of
>>> the nodes? Does site-to-site come in play when I make one processor to run
>>> on primary node ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Hello Chakro,
>>>
>>> When you create a cluster of NiFi instances, each node in the cluster is
>>> acting independently and in exactly
>>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>>> same flow. However, they will be
>>> pulling in different data and therefore operating on different data.
>>>
>>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> processed on the node that pulled the data
>>> in. NiFi does not currently shuffle data around between nodes in the
>>> cluster (you can use site-to-site to do
>>> this if you want to, but it won't happen automatically). If you set the
>>> number of Concurrent Tasks to 5, then
>>> you will have up to 5 threads running for that processor on each node.
>>>
>>> The only exception to this is the Primary Node. You can schedule a
>>> Processor to run only on the Primary Node
>>> by right-clicking on the Processor, and going to the Configure menu. In
>>> the Scheduling tab, you can change
>>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> Processor will only be triggered to run on
>>> whichever node is elected the Primary Node (this can be changed in the
>>> Cluster management screen by clicking
>>> the appropriate icon in the top-right corner of the UI).
>>>
>>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> on primary node only).
>>>
>>> If you are attempting to have a single input running HTTP and then push
>>> that out across the entire cluster to
>>> process the data, you would have a few options. First, you could just use
>>> an HTTP Load Balancer in front of NiFi.
>>> The other option would be to have a ListenHTTP processor run on Primary
>>> Node only and then use Site-to-Site
>>> to distribute the data to other nodes.
>>>
>>> For more info on site-to-site, you can see the Site-to-Site section of
>>> the User Guide at
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>
>>> If you have any more questions, let us know!
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Nifi Team – I would like to understand the advantages of Nifi clustering
>>> setup.
>>>
>>> Questions :
>>>
>>>  - How does workflow work on multiple nodes ? Does it share the resources
>>> intra nodes ?
>>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
>>>
>>>  - How to “isolate” the processor to the master node (or one node)?
>>>
>>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>>> node ? How do I force processor to look in one of the slave node?
>>>
>>> - How can we have a workflow where the input side we want to receive
>>> requests (http) and then the rest of the pipeline need to run in parallel on
>>> all the nodes ?
>>>
>>> Thanks,
>>> -Chakro
>>>

Re: Nifi cluster features - Questions

Posted by Mark Payne <ma...@hotmail.com>.
Chakri,

This line in the logs is particularly interesting (on primary node):

> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data

This indicates that all of the site-to-site data will go to the host i-c894e249.dev.aws.lifelock.ad. Moreover, because that is the only node listed, this means
that the NCM responded, indicating that this is the only node in the cluster that is currently connected and has site-to-site enabled. Can you double-check the nifi.properties
file on the Primary Node and verify that the "nifi.remote.input.socket.port" property is specified, and that the "nifi.remote.input.secure" property is set to "false"?
Of note is that if the "nifi.remote.input.secure" property is set to true, but keystore and truststore are not specified, then site-to-site will be disabled (there would be a warning
in the log in this case).

If you can verify that both of those properties are set properly on both nodes, then we can delve in further, but probably best to start by double-checking the easy things :)

Thanks
-Mark


> On Jan 10, 2016, at 5:55 PM, Chakrader Dewaragatla <Ch...@lifelock.com> wrote:
> 
> Bryan – Here are the logs : 
> I have 5 sec flow file.
> 
> On primary node (No data coming in)
> 
> 2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis
> 2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
> 2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads
> 2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:
> Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
> 2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
> 2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis
> 2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
> 2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis
> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51 milliseconds at a rate of 391 bytes/sec
> 2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73
> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
> 2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis
> 2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis
> 2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301
> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds
> 2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records
> 2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec
> 2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis
> 2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run
> 2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run
> 2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run
> 2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis
> 2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false
> 2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis
> 
> 
> On Secondary node (Data coming in)
> 
> 2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds
> 2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
> 2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560
> 2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037
> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds
> 2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records
> 2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73
> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.
> 2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis
> 2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis
> 2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec
> 2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074
> 2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
> From: Bryan Bende <bbende@gmail.com <ma...@gmail.com>>
> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> Date: Sunday, January 10, 2016 at 2:43 PM
> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
> 
> Chakri,
> 
> Glad you got site-to-site working.
> 
> Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running ncm, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds, and Input Port going to a LogAttribute, and I see it alternating between node1 and node2 logs every 10 seconds.
> 
> Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node? 
> 
> -Bryan
> 
> 
> On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <joe.witt@gmail.com <ma...@gmail.com>> wrote:
> Chakri,
> 
> Would love to hear what you've learned and how that differed from the
> docs themselves.  Site-to-site has proven difficult to setup so we're
> clearly not there yet in having the right operator/admin experience.
> 
> Thanks
> Joe
> 
> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
> > I was able to get site-to-site to work.
> > I tried to follow your instructions to distribute the data across the
> > nodes.
> >
> > GenerateFlowFile (On Primary) —> RPG
> > RPG —> Input Port   —> Putfile (Time driven scheduling)
> >
> > However, data is only written to one slave (the secondary slave). The primary slave
> > has no data.
> >
> > Image screenshot :
> > http://tinyurl.com/jjvjtmq <http://tinyurl.com/jjvjtmq>
> >
> > From: Chakrader Dewaragatla <chakrader.dewaragatla@lifelock.com <ma...@lifelock.com>>
> > Date: Sunday, January 10, 2016 at 11:26 AM
> >
> > To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> > Subject: Re: Nifi cluster features - Questions
> >
> > Bryan – Thanks – I am trying to setup site-to-site.
> > I have two slaves and one NCM.
> >
> > My properties as follows :
> >
> > On both Slaves:
> >
> > nifi.remote.input.socket.port=10880
> > nifi.remote.input.secure=false
> >
> > On NCM:
> > nifi.remote.input.socket.port=10880
> > nifi.remote.input.secure=false
> >
> > When I try drop remote process group (with http://<NCM IP>:8080/nifi), I see
> > error as follows for two nodes.
> >
> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> > communication
> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> > communication
> >
> > Do you have insight into why it's trying to connect to 8080 on the slaves? When does
> > port 10880 come into the picture? I remember setting up site-to-site a few
> > months back and it succeeded.
> >
> > Thanks,
> > -Chakri
> >
> >
> >
> > From: Bryan Bende <bbende@gmail.com <ma...@gmail.com>>
> > Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> > Date: Saturday, January 9, 2016 at 11:22 AM
> > To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> > Subject: Re: Nifi cluster features - Questions
> >
> > The sending node (where the remote process group is) will distribute the
> > data evenly across the two nodes, so an individual file will only be sent to
> > one of the nodes. You could think of it as if a separate NiFi instance was
> > sending directly to a two node cluster, it would be evenly distributing the
> > data across the two nodes. In this case it just so happens to all be with in
> > the same cluster.
> >
> > The most common use case for this scenario is the List and Fetch processors
> > like HDFS. You can perform the listing on primary node, and then distribute
> > the results so the fetching takes place on all nodes.
> >
> > On Saturday, January 9, 2016, Chakrader Dewaragatla
> > <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
> >>
> >> Bryan – Thanks, how do the nodes distribute the load for an input port? As
> >> the port is open and listening on two nodes, does it copy the same files to both
> >> nodes?
> >> I need to try this setup to see the results, appreciate your help.
> >>
> >> Thanks,
> >> -Chakri
> >>
> >> From: Bryan Bende <bbende@gmail.com <ma...@gmail.com>>
> >> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> >> Date: Friday, January 8, 2016 at 3:44 PM
> >> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> >> Subject: Re: Nifi cluster features - Questions
> >>
> >> Hi Chakri,
> >>
> >> I believe the DistributeLoad processor is more for load balancing when
> >> sending to downstream systems. For example, if you had two HTTP endpoints,
> >> you could have the first relationship from DistributeLoad going to a
> >> PostHTTP that posts to endpoint #1, and the second relationship going to a
> >> second PostHTTP that goes to endpoint #2.
> >>
> >> If you want to distribute the data with in the cluster, then you need to
> >> use site-to-site. The way you do this is the following...
> >>
> >> - Add an Input Port connected to your PutFile.
> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
> >> Remote Process Group. The Remote Process Group should be connected to the
> >> Input Port from the previous step.
> >>
> >> So both nodes have an input port listening for data, but only the primary
> >> node produces a FlowFile and sends it to the RPG which then re-distributes
> >> it back to one of the Input Ports.
> >>
> >> In order for this to work you need to set nifi.remote.input.socket.port in
> >> nifi.properties to some available port, and you probably want
> >> nifi.remote.input.secure=false for testing.
> >>
> >> -Bryan
> >>
> >>
> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
> >> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
> >>>
> >>> Mark – I have setup a two node cluster and tried the following .
> >>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
> >>> processor (RoundRobin)   —> PutFile
> >>>
> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
> >>> >> run on primary node only).
> >>> From your above comment, It should put file on two nodes. It put files on
> >>> primary node only. Any thoughts ?
> >>>
> >>> Thanks,
> >>> -Chakri
> >>>
> >>> From: Mark Payne <markap14@hotmail.com <ma...@hotmail.com>>
> >>> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
> >>>
> >>> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> >>> Subject: Re: Nifi cluster features - Questions
> >>>
> >>> Chakri,
> >>>
> >>> Correct - when NiFi instances are clustered, they do not transfer data
> >>> between the nodes. This is very different
> >>> than you might expect from something like Storm or Spark, as the key
> >>> goals and design are quite different.
> >>> We have discussed providing the ability to allow the user to indicate
> >>> that they want to have the framework
> >>> do load balancing for specific connections in the background, but it's
> >>> still in more of a discussion phase.
> >>>
> >>> Site-to-Site is simply the capability that we have developed to transfer
> >>> data between one instance of
> >>> NiFi and another instance of NiFi. So currently, if we want to do load
> >>> balancing across the cluster, we would
> >>> create a site-to-site connection (by dragging a Remote Process Group onto
> >>> the graph) and give that
> >>> site-to-site connection the URL of our cluster. That way, you can push
> >>> data to your own cluster, effectively
> >>> providing a load balancing capability.
> >>>
> >>> If you were to just run ListenHTTP without setting it to Primary Node,
> >>> then every node in the cluster will be listening
> >>> for incoming HTTP connections. So you could then use a simple load
> >>> balancer in front of NiFi to distribute the load
> >>> across your cluster.
> >>>
> >>> Does this help? If you have any more questions we're happy to help!
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>>
> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
> >>> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
> >>>
> >>> Mark - Thanks for the notes.
> >>>
> >>> >> The other option would be to have a ListenHTTP processor run on
> >>> >> Primary Node only and then use Site-to-Site to distribute the data to other
> >>> >> nodes.
> >>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
> >>> collected data on primary node is not transfered to other nodes by default
> >>> for processing despite all nodes are part of one cluster?
> >>> If ListenHTTP processor is running  as a dafult (with out explicit
> >>> setting to run on primary node), how does the data transferred to rest of
> >>> the nodes? Does site-to-site come in play when I make one processor to run
> >>> on primary node ?
> >>>
> >>> Thanks,
> >>> -Chakri
> >>>
> >>> From: Mark Payne <markap14@hotmail.com <ma...@hotmail.com>>
> >>> Reply-To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
> >>> To: "users@nifi.apache.org <ma...@nifi.apache.org>" <users@nifi.apache.org <ma...@nifi.apache.org>>
> >>> Subject: Re: Nifi cluster features - Questions
> >>>
> >>> Hello Chakro,
> >>>
> >>> When you create a cluster of NiFi instances, each node in the cluster is
> >>> acting independently and in exactly
> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
> >>> same flow. However, they will be
> >>> pulling in different data and therefore operating on different data.
> >>>
> >>> So if you pull in 10 1-gig files from S3, each of those files will be
> >>> processed on the node that pulled the data
> >>> in. NiFi does not currently shuffle data around between nodes in the
> >>> cluster (you can use site-to-site to do
> >>> this if you want to, but it won't happen automatically). If you set the
> >>> number of Concurrent Tasks to 5, then
> >>> you will have up to 5 threads running for that processor on each node.
> >>>
> >>> The only exception to this is the Primary Node. You can schedule a
> >>> Processor to run only on the Primary Node
> >>> by right-clicking on the Processor, and going to the Configure menu. In
> >>> the Scheduling tab, you can change
> >>> the Scheduling Strategy to Primary Node Only. In this case, that
> >>> Processor will only be triggered to run on
> >>> whichever node is elected the Primary Node (this can be changed in the
> >>> Cluster management screen by clicking
> >>> the appropriate icon in the top-right corner of the UI).
> >>>
> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
> >>> on primary node only).
> >>>
> >>> If you are attempting to have a single input running HTTP and then push
> >>> that out across the entire cluster to
> >>> process the data, you would have a few options. First, you could just use
> >>> an HTTP Load Balancer in front of NiFi.
> >>> The other option would be to have a ListenHTTP processor run on Primary
> >>> Node only and then use Site-to-Site
> >>> to distribute the data to other nodes.
> >>>
> >>> For more info on site-to-site, you can see the Site-to-Site section of
> >>> the User Guide at
> >>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site <http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site>
> >>>
> >>> If you have any more questions, let us know!
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
> >>> <Chakrader.Dewaragatla@lifelock.com <ma...@lifelock.com>> wrote:
> >>>
> >>> Nifi Team – I would like to understand the advantages of Nifi clustering
> >>> setup.
> >>>
> >>> Questions :
> >>>
> >>>  - How does workflow work on multiple nodes ? Does it share the resources
> >>> intra nodes ?
> >>> Lets say I need to pull data 10 1Gig files from S3, how does work load
> >>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
> >>>
> >>>  - How to “isolate” the processor to the master node (or one node)?
> >>>
> >>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
> >>> node ? How do I force processor to look in one of the slave node?
> >>>
> >>> - How can we have a workflow where the input side we want to receive
> >>> requests (http) and then the rest of the pipeline need to run in parallel on
> >>> all the nodes ?
> >>>
> >>> Thanks,
> >>> -Chakro
> >>>


Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Bryan – Here are the logs :
I have 5 sec flow file.

On primary node (No data coming in)


2016-01-10 22:52:36,322 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:36,146 and sent at 2016-01-10 22:52:36,322; send took 0 millis

2016-01-10 22:52:36,476 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false

2016-01-10 22:52:39,450 INFO [pool-26-thread-16] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run with 1 threads

2016-01-10 22:52:39,459 INFO [Timer-Driven Process Thread-7] o.a.n.r.c.socket.EndpointConnectionPool New Weighted Distribution of Nodes:

Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data

2016-01-10 22:52:39,480 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false

2016-01-10 22:52:39,576 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:39,452 and sent at 2016-01-10 22:52:39,576; send took 1 millis

2016-01-10 22:52:39,662 INFO [Timer-Driven Process Thread-7] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=f6ff266d-e03f-4a8e-af5a-1455dd433ff4,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=1980, length=20],offset=0,name=275238507698589,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec

2016-01-10 22:52:41,327 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:41,147 and sent at 2016-01-10 22:52:41,327; send took 0 millis

2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=effbc026-98d2-4548-9069-f95d57c8bf4b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2000, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 51 milliseconds at a rate of 391 bytes/sec

2016-01-10 22:52:45,092 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd from 10.228.68.73

2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.

2016-01-10 22:52:45,094 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8ecc76f9-e978-4e9b-a8ed-41a47647d5bd (type=FLOW_REQUEST, length=331 bytes) in 61 millis

2016-01-10 22:52:46,391 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:46,148 and sent at 2016-01-10 22:52:46,391; send took 60 millis

2016-01-10 22:52:48,470 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 301

2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/295.prov in 111 milliseconds

2016-01-10 22:52:48,580 INFO [Provenance Repository Rollover Thread-2] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 8 records

2016-01-10 22:52:49,517 INFO [Timer-Driven Process Thread-10] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=nifi-input,target=http://10.228.68.73:8080/nifi] Successfully sent [StandardFlowFileRecord[uuid=505bef8e-15e6-4345-b909-cb3be21275bd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452456659209-1, container=default, section=1], offset=2020, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) to nifi://i-c894e249.dev.aws.lifelock.ad:10880 in 50 milliseconds at a rate of 392 bytes/sec

2016-01-10 22:52:51,395 INFO [Clustering Tasks Thread-3] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:51,150 and sent at 2016-01-10 22:52:51,395; send took 0 millis

2016-01-10 22:52:54,326 INFO [NiFi Web Server-22] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling StandardRootGroupPort[name=nifi-input,id=392bfcc3-dfc2-4497-8148-8128336856fa] to run

2016-01-10 22:52:54,353 INFO [NiFi Web Server-26] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] to run

2016-01-10 22:52:54,377 INFO [NiFi Web Server-25] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=6efbcd69-0b82-4ea2-a90d-01b39efaf3db] to run

2016-01-10 22:52:54,397 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:54,379 and sent at 2016-01-10 22:52:54,397; send took 0 millis

2016-01-10 22:52:54,488 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5dff8cbf // Another save pending = false

2016-01-10 22:52:56,399 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:56,151 and sent at 2016-01-10 22:52:56,399; send took 0 millis


On Secondary node (Data coming in)


2016-01-10 22:52:43,896 INFO [pool-18-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 88 milliseconds

2016-01-10 22:52:44,524 INFO [Timer-Driven Process Thread-3] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec

2016-01-10 22:52:44,534 INFO [Timer-Driven Process Thread-1] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=614a656d-965b-4915-95f7-ee59e049ea20,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1960, length=20],offset=0,name=275243509297560,size=20] at location /root/putt/275243509297560

2016-01-10 22:52:44,671 INFO [Provenance Maintenance Thread-3] o.a.n.p.PersistentProvenanceRepository Created new Provenance Event Writers for events starting with ID 17037

2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files (6 records) into single Provenance Log File ./provenance_repository/17031.prov in 56 milliseconds

2016-01-10 22:52:44,727 INFO [Provenance Repository Rollover Thread-1] o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance Event file containing 10 records

2016-01-10 22:52:45,034 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Received request e288a3eb-28fb-48cf-9f4b-bc36acb810bb from 10.228.68.73

2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.nifi.controller.StandardFlowService Received flow request message from manager.

2016-01-10 22:52:45,036 INFO [Process NCM Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request e288a3eb-28fb-48cf-9f4b-bc36acb810bb (type=FLOW_REQUEST, length=331 bytes) in 76 millis

2016-01-10 22:52:45,498 INFO [Clustering Tasks Thread-2] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:45,421 and sent at 2016-01-10 22:52:45,498; send took 0 millis

2016-01-10 22:52:49,518 INFO [Timer-Driven Process Thread-6] o.a.n.r.p.s.SocketFlowFileServerProtocol SocketFlowFileServerProtocol[CommsID=e3151c71-9c43-4179-a69d-bc1e1b94b573] Successfully received [StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20]] (20 bytes) from Peer[url=nifi://10.228.68.106:40611] in 51 milliseconds at a rate of 387 bytes/sec

2016-01-10 22:52:49,520 INFO [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.PutFile PutFile[id=2a2c47e1-a4cf-4c32-ba17-d195af3c2a1b] Produced copy of StandardFlowFileRecord[uuid=a6986405-1f15-4233-a06f-1b9ce50c0e24,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1452457702480-1, container=default, section=1], offset=1980, length=20],offset=0,name=275248510432074,size=20] at location /root/putt/275248510432074

2016-01-10 22:52:50,561 INFO [Clustering Tasks Thread-1] org.apache.nifi.cluster.heartbeat Heartbeat created at 2016-01-10 22:52:50,423 and sent at 2016-01-10 22:52:50,561; send took 59 millis
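
As an aside, the "New Weighted Distribution of Nodes" line on the sending side shows how the Remote Process Group is splitting data across the cluster; here a single node is assigned 100.0% of the data. A minimal Python sketch for pulling those weighting lines out of a node's nifi-app.log (the ./logs/nifi-app.log location is an assumption; adjust as needed):

import re
import sys

# Log location is an assumption; pass a different path as the first argument.
log_path = sys.argv[1] if len(sys.argv) > 1 else "./logs/nifi-app.log"

# Matches lines such as:
#   Node[i-c894e249.dev.aws.lifelock.ad:0] will receive 100.0% of data
pattern = re.compile(r"Node\[([^\]]+)\] will receive ([\d.]+)% of data")

with open(log_path) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            node, pct = match.groups()
            print("%s -> %s%%" % (node, pct))

If only one node ever shows up, and always at 100.0%, every flow file sent through the RPG is going to that node.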

From: Bryan Bende <bb...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Sunday, January 10, 2016 at 2:43 PM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way. I just did a similar test running ncm, node1, and node2 all on my local machine, with GenerateFlowFile running every 10 seconds, and Input Port going to a LogAttribute, and I see it alternating between node1 and node2 logs every 10 seconds.

Is there anything in your primary node logs (primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan


On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com>> wrote:
Chakri,

Would love to hear what you've learned and how that differed from the
docs themselves.  Site-to-site has proven difficult to setup so we're
clearly not there yet in having the right operator/admin experience.

Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
<Ch...@lifelock.com>> wrote:
> I was able to get site-to-site work.
> I tried to follow your instructions to send data distribute across the
> nodes.
>
> GenerateFlowFile (On Primary) —> RPG
> RPG —> Input Port   —> Putfile (Time driven scheduling)
>
> However, data is only written to one slave (Secondary slave). Primary slave
> has not data.
>
> Image screenshot :
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <ch...@lifelock.com>>
> Date: Sunday, January 10, 2016 at 11:26 AM
>
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to setup site-to-site.
> I have two slaves and one NCM.
>
> My properties as follows :
>
> On both Slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try drop remote process group (with http://<NCM IP>:8080/nifi), I see
> error as follows for two nodes.
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
>
> Do you have insight why its trying to connecting 8080 on slaves ? When do
> 10880 port come into the picture ? I remember try setting site to site few
> months back and succeeded.
>
> Thanks,
> -Chakri
>
>
>
> From: Bryan Bende <bb...@gmail.com>>
> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes. You could think of it as if a separate NiFi instance was
> sending directly to a two node cluster, it would be evenly distributing the
> data across the two nodes. In this case it just so happens to all be with in
> the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors
> like HDFS. You can perform the listing on primary node, and then distribute
> the results so the fetching takes place on all nodes.
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <Ch...@lifelock.com>> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for a input port. As
>> port is open and listening on two nodes,  does it copy same files on both
>> the nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bb...@gmail.com>>
>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data with in the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> Remote Process Group. The Remote Process Group should be connected to the
>> Input Port from the previous step.
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG which then re-distributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan
>>
>>
>> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark – I have setup a two node cluster and tried the following .
>>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>>> processor (RoundRobin)   —> PutFile
>>>
>>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> >> run on primary node only).
>>> From your above comment, It should put file on two nodes. It put files on
>>> primary node only. Any thoughts ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>>
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Correct - when NiFi instances are clustered, they do not transfer data
>>> between the nodes. This is very different
>>> than you might expect from something like Storm or Spark, as the key
>>> goals and design are quite different.
>>> We have discussed providing the ability to allow the user to indicate
>>> that they want to have the framework
>>> do load balancing for specific connections in the background, but it's
>>> still in more of a discussion phase.
>>>
>>> Site-to-Site is simply the capability that we have developed to transfer
>>> data between one instance of
>>> NiFi and another instance of NiFi. So currently, if we want to do load
>>> balancing across the cluster, we would
>>> create a site-to-site connection (by dragging a Remote Process Group onto
>>> the graph) and give that
>>> site-to-site connection the URL of our cluster. That way, you can push
>>> data to your own cluster, effectively
>>> providing a load balancing capability.
>>>
>>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> then every node in the cluster will be listening
>>> for incoming HTTP connections. So you could then use a simple load
>>> balancer in front of NiFi to distribute the load
>>> across your cluster.
>>>
>>> Does this help? If you have any more questions we're happy to help!
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Mark - Thanks for the notes.
>>>
>>> >> The other option would be to have a ListenHTTP processor run on
>>> >> Primary Node only and then use Site-to-Site to distribute the data to other
>>> >> nodes.
>>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>>> collected data on primary node is not transfered to other nodes by default
>>> for processing despite all nodes are part of one cluster?
>>> If ListenHTTP processor is running  as a dafult (with out explicit
>>> setting to run on primary node), how does the data transferred to rest of
>>> the nodes? Does site-to-site come in play when I make one processor to run
>>> on primary node ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>>
>>> Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Hello Chakro,
>>>
>>> When you create a cluster of NiFi instances, each node in the cluster is
>>> acting independently and in exactly
>>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>>> same flow. However, they will be
>>> pulling in different data and therefore operating on different data.
>>>
>>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> processed on the node that pulled the data
>>> in. NiFi does not currently shuffle data around between nodes in the
>>> cluster (you can use site-to-site to do
>>> this if you want to, but it won't happen automatically). If you set the
>>> number of Concurrent Tasks to 5, then
>>> you will have up to 5 threads running for that processor on each node.
>>>
>>> The only exception to this is the Primary Node. You can schedule a
>>> Processor to run only on the Primary Node
>>> by right-clicking on the Processor, and going to the Configure menu. In
>>> the Scheduling tab, you can change
>>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> Processor will only be triggered to run on
>>> whichever node is elected the Primary Node (this can be changed in the
>>> Cluster management screen by clicking
>>> the appropriate icon in the top-right corner of the UI).
>>>
>>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> on primary node only).
>>>
>>> If you are attempting to have a single input running HTTP and then push
>>> that out across the entire cluster to
>>> process the data, you would have a few options. First, you could just use
>>> an HTTP Load Balancer in front of NiFi.
>>> The other option would be to have a ListenHTTP processor run on Primary
>>> Node only and then use Site-to-Site
>>> to distribute the data to other nodes.
>>>
>>> For more info on site-to-site, you can see the Site-to-Site section of
>>> the User Guide at
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>
>>> If you have any more questions, let us know!
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com>> wrote:
>>>
>>> Nifi Team – I would like to understand the advantages of Nifi clustering
>>> setup.
>>>
>>> Questions :
>>>
>>>  - How does workflow work on multiple nodes ? Does it share the resources
>>> intra nodes ?
>>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
>>>
>>>  - How to “isolate” the processor to the master node (or one node)?
>>>
>>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>>> node ? How do I force processor to look in one of the slave node?
>>>
>>> - How can we have a workflow where the input side we want to receive
>>> requests (http) and then the rest of the pipeline need to run in parallel on
>>> all the nodes ?
>>>
>>> Thanks,
>>> -Chakro
>>>

Re: Nifi cluster features - Questions

Posted by Bryan Bende <bb...@gmail.com>.
Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way.
I just did a similar test running ncm, node1, and node2 all on my local
machine, with GenerateFlowFile running every 10 seconds, and Input Port
going to a LogAttribute, and I see it alternating between node1 and node2
logs every 10 seconds.
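
For anyone reproducing this on a single machine: since all three instances share one host, each needs its own ports in conf/nifi.properties. A minimal sketch using the same properties discussed further down in this thread (the port numbers are arbitrary examples, and the NCM/cluster properties are omitted):

# node1/conf/nifi.properties
nifi.web.http.port=8081
nifi.remote.input.socket.port=10881
nifi.remote.input.secure=false

# node2/conf/nifi.properties
nifi.web.http.port=8082
nifi.remote.input.socket.port=10882
nifi.remote.input.secure=false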

Is there anything in your primary node logs
(primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan


On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <jo...@gmail.com> wrote:

> Chakri,
>
> Would love to hear what you've learned and how that differed from the
> docs themselves.  Site-to-site has proven difficult to setup so we're
> clearly not there yet in having the right operator/admin experience.
>
> Thanks
> Joe
>
> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
> <Ch...@lifelock.com> wrote:
> > I was able to get site-to-site work.
> > I tried to follow your instructions to send data distribute across the
> > nodes.
> >
> > GenerateFlowFile (On Primary) —> RPG
> > RPG —> Input Port   —> Putfile (Time driven scheduling)
> >
> > However, data is only written to one slave (Secondary slave). Primary
> slave
> > has not data.
> >
> > Image screenshot :
> > http://tinyurl.com/jjvjtmq
> >
> > From: Chakrader Dewaragatla <ch...@lifelock.com>
> > Date: Sunday, January 10, 2016 at 11:26 AM
> >
> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
> > Subject: Re: Nifi cluster features - Questions
> >
> > Bryan – Thanks – I am trying to setup site-to-site.
> > I have two slaves and one NCM.
> >
> > My properties as follows :
> >
> > On both Slaves:
> >
> > nifi.remote.input.socket.port=10880
> > nifi.remote.input.secure=false
> >
> > On NCM:
> > nifi.remote.input.socket.port=10880
> > nifi.remote.input.secure=false
> >
> > When I try drop remote process group (with http://<NCM IP>:8080/nifi),
> I see
> > error as follows for two nodes.
> >
> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> > communication
> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> > communication
> >
> > Do you have insight why its trying to connecting 8080 on slaves ? When do
> > 10880 port come into the picture ? I remember try setting site to site
> few
> > months back and succeeded.
> >
> > Thanks,
> > -Chakri
> >
> >
> >
> > From: Bryan Bende <bb...@gmail.com>
> > Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> > Date: Saturday, January 9, 2016 at 11:22 AM
> > To: "users@nifi.apache.org" <us...@nifi.apache.org>
> > Subject: Re: Nifi cluster features - Questions
> >
> > The sending node (where the remote process group is) will distribute the
> > data evenly across the two nodes, so an individual file will only be
> sent to
> > one of the nodes. You could think of it as if a separate NiFi instance
> was
> > sending directly to a two node cluster, it would be evenly distributing
> the
> > data across the two nodes. In this case it just so happens to all be
> with in
> > the same cluster.
> >
> > The most common use case for this scenario is the List and Fetch
> processors
> > like HDFS. You can perform the listing on primary node, and then
> distribute
> > the results so the fetching takes place on all nodes.
> >
> > On Saturday, January 9, 2016, Chakrader Dewaragatla
> > <Ch...@lifelock.com> wrote:
> >>
> >> Bryan – Thanks, how do the nodes distribute the load for a input port.
> As
> >> port is open and listening on two nodes,  does it copy same files on
> both
> >> the nodes?
> >> I need to try this setup to see the results, appreciate your help.
> >>
> >> Thanks,
> >> -Chakri
> >>
> >> From: Bryan Bende <bb...@gmail.com>
> >> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> >> Date: Friday, January 8, 2016 at 3:44 PM
> >> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> >> Subject: Re: Nifi cluster features - Questions
> >>
> >> Hi Chakri,
> >>
> >> I believe the DistributeLoad processor is more for load balancing when
> >> sending to downstream systems. For example, if you had two HTTP
> endpoints,
> >> you could have the first relationship from DistributeLoad going to a
> >> PostHTTP that posts to endpoint #1, and the second relationship going
> to a
> >> second PostHTTP that goes to endpoint #2.
> >>
> >> If you want to distribute the data with in the cluster, then you need to
> >> use site-to-site. The way you do this is the following...
> >>
> >> - Add an Input Port connected to your PutFile.
> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
> >> Remote Process Group. The Remote Process Group should be connected to
> the
> >> Input Port from the previous step.
> >>
> >> So both nodes have an input port listening for data, but only the
> primary
> >> node produces a FlowFile and sends it to the RPG which then
> re-distributes
> >> it back to one of the Input Ports.
> >>
> >> In order for this to work you need to set nifi.remote.input.socket.port
> in
> >> nifi.properties to some available port, and you probably want
> >> nifi.remote.input.secure=false for testing.
> >>
> >> -Bryan
> >>
> >>
> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
> >> <Ch...@lifelock.com> wrote:
> >>>
> >>> Mark – I have setup a two node cluster and tried the following .
> >>>  GenrateFlowfile processor (Run only on primary node) —>
> DistributionLoad
> >>> processor (RoundRobin)   —> PutFile
> >>>
> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
> >>> >> run on primary node only).
> >>> From your above comment, It should put file on two nodes. It put files
> on
> >>> primary node only. Any thoughts ?
> >>>
> >>> Thanks,
> >>> -Chakri
> >>>
> >>> From: Mark Payne <ma...@hotmail.com>
> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
> >>>
> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> >>> Subject: Re: Nifi cluster features - Questions
> >>>
> >>> Chakri,
> >>>
> >>> Correct - when NiFi instances are clustered, they do not transfer data
> >>> between the nodes. This is very different
> >>> than you might expect from something like Storm or Spark, as the key
> >>> goals and design are quite different.
> >>> We have discussed providing the ability to allow the user to indicate
> >>> that they want to have the framework
> >>> do load balancing for specific connections in the background, but it's
> >>> still in more of a discussion phase.
> >>>
> >>> Site-to-Site is simply the capability that we have developed to
> transfer
> >>> data between one instance of
> >>> NiFi and another instance of NiFi. So currently, if we want to do load
> >>> balancing across the cluster, we would
> >>> create a site-to-site connection (by dragging a Remote Process Group
> onto
> >>> the graph) and give that
> >>> site-to-site connection the URL of our cluster. That way, you can push
> >>> data to your own cluster, effectively
> >>> providing a load balancing capability.
> >>>
> >>> If you were to just run ListenHTTP without setting it to Primary Node,
> >>> then every node in the cluster will be listening
> >>> for incoming HTTP connections. So you could then use a simple load
> >>> balancer in front of NiFi to distribute the load
> >>> across your cluster.
> >>>
> >>> Does this help? If you have any more questions we're happy to help!
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>>
> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
> >>> <Ch...@lifelock.com> wrote:
> >>>
> >>> Mark - Thanks for the notes.
> >>>
> >>> >> The other option would be to have a ListenHTTP processor run on
> >>> >> Primary Node only and then use Site-to-Site to distribute the data
> to other
> >>> >> nodes.
> >>> Lets say I have 5 node cluster and ListenHTTP processor on Primary
> node,
> >>> collected data on primary node is not transfered to other nodes by
> default
> >>> for processing despite all nodes are part of one cluster?
> >>> If ListenHTTP processor is running  as a dafult (with out explicit
> >>> setting to run on primary node), how does the data transferred to rest
> of
> >>> the nodes? Does site-to-site come in play when I make one processor to
> run
> >>> on primary node ?
> >>>
> >>> Thanks,
> >>> -Chakri
> >>>
> >>> From: Mark Payne <ma...@hotmail.com>
> >>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
> >>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> >>> Subject: Re: Nifi cluster features - Questions
> >>>
> >>> Hello Chakro,
> >>>
> >>> When you create a cluster of NiFi instances, each node in the cluster
> is
> >>> acting independently and in exactly
> >>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly
> the
> >>> same flow. However, they will be
> >>> pulling in different data and therefore operating on different data.
> >>>
> >>> So if you pull in 10 1-gig files from S3, each of those files will be
> >>> processed on the node that pulled the data
> >>> in. NiFi does not currently shuffle data around between nodes in the
> >>> cluster (you can use site-to-site to do
> >>> this if you want to, but it won't happen automatically). If you set the
> >>> number of Concurrent Tasks to 5, then
> >>> you will have up to 5 threads running for that processor on each node.
> >>>
> >>> The only exception to this is the Primary Node. You can schedule a
> >>> Processor to run only on the Primary Node
> >>> by right-clicking on the Processor, and going to the Configure menu. In
> >>> the Scheduling tab, you can change
> >>> the Scheduling Strategy to Primary Node Only. In this case, that
> >>> Processor will only be triggered to run on
> >>> whichever node is elected the Primary Node (this can be changed in the
> >>> Cluster management screen by clicking
> >>> the appropriate icon in the top-right corner of the UI).
> >>>
> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to
> run
> >>> on primary node only).
> >>>
> >>> If you are attempting to have a single input running HTTP and then push
> >>> that out across the entire cluster to
> >>> process the data, you would have a few options. First, you could just
> use
> >>> an HTTP Load Balancer in front of NiFi.
> >>> The other option would be to have a ListenHTTP processor run on Primary
> >>> Node only and then use Site-to-Site
> >>> to distribute the data to other nodes.
> >>>
> >>> For more info on site-to-site, you can see the Site-to-Site section of
> >>> the User Guide at
> >>>
> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
> >>>
> >>> If you have any more questions, let us know!
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
> >>> <Ch...@lifelock.com> wrote:
> >>>
> >>> Nifi Team – I would like to understand the advantages of Nifi
> clustering
> >>> setup.
> >>>
> >>> Questions :
> >>>
> >>>  - How does workflow work on multiple nodes ? Does it share the
> resources
> >>> intra nodes ?
> >>> Lets say I need to pull data 10 1Gig files from S3, how does work load
> >>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per
> node ?
> >>>
> >>>  - How to “isolate” the processor to the master node (or one node)?
> >>>
> >>> - Getfile/Putfile processors on cluster setup, does it get/put on
> primary
> >>> node ? How do I force processor to look in one of the slave node?
> >>>
> >>> - How can we have a workflow where the input side we want to receive
> >>> requests (http) and then the rest of the pipeline need to run in
> parallel on
> >>> all the nodes ?
> >>>
> >>> Thanks,
> >>> -Chakro
> >>>

Re: Nifi cluster features - Questions

Posted by Joe Witt <jo...@gmail.com>.
Chakri,

Would love to hear what you've learned and how that differed from the
docs themselves.  Site-to-site has proven difficult to setup so we're
clearly not there yet in having the right operator/admin experience.

Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
<Ch...@lifelock.com> wrote:
> I was able to get site-to-site work.
> I tried to follow your instructions to send data distribute across the
> nodes.
>
> GenerateFlowFile (On Primary) —> RPG
> RPG —> Input Port   —> Putfile (Time driven scheduling)
>
> However, data is only written to one slave (Secondary slave). Primary slave
> has not data.
>
> Image screenshot :
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <ch...@lifelock.com>
> Date: Sunday, January 10, 2016 at 11:26 AM
>
> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to setup site-to-site.
> I have two slaves and one NCM.
>
> My properties as follows :
>
> On both Slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try drop remote process group (with http://<NCM IP>:8080/nifi), I see
> error as follows for two nodes.
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site
> communication
>
> Do you have insight why its trying to connecting 8080 on slaves ? When do
> 10880 port come into the picture ? I remember try setting site to site few
> months back and succeeded.
>
> Thanks,
> -Chakri
>
>
>
> From: Bryan Bende <bb...@gmail.com>
> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org" <us...@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes. You could think of it as if a separate NiFi instance was
> sending directly to a two node cluster, it would be evenly distributing the
> data across the two nodes. In this case it just so happens to all be with in
> the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors
> like HDFS. You can perform the listing on primary node, and then distribute
> the results so the fetching takes place on all nodes.
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <Ch...@lifelock.com> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for a input port. As
>> port is open and listening on two nodes,  does it copy same files on both
>> the nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bb...@gmail.com>
>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data with in the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>> Remote Process Group. The Remote Process Group should be connected to the
>> Input Port from the previous step.
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG which then re-distributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan
>>
>>
>> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> <Ch...@lifelock.com> wrote:
>>>
>>> Mark – I have setup a two node cluster and tried the following .
>>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>>> processor (RoundRobin)   —> PutFile
>>>
>>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> >> run on primary node only).
>>> From your above comment, It should put file on two nodes. It put files on
>>> primary node only. Any thoughts ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>
>>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>>
>>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Correct - when NiFi instances are clustered, they do not transfer data
>>> between the nodes. This is very different
>>> than you might expect from something like Storm or Spark, as the key
>>> goals and design are quite different.
>>> We have discussed providing the ability to allow the user to indicate
>>> that they want to have the framework
>>> do load balancing for specific connections in the background, but it's
>>> still in more of a discussion phase.
>>>
>>> Site-to-Site is simply the capability that we have developed to transfer
>>> data between one instance of
>>> NiFi and another instance of NiFi. So currently, if we want to do load
>>> balancing across the cluster, we would
>>> create a site-to-site connection (by dragging a Remote Process Group onto
>>> the graph) and give that
>>> site-to-site connection the URL of our cluster. That way, you can push
>>> data to your own cluster, effectively
>>> providing a load balancing capability.
>>>
>>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> then every node in the cluster will be listening
>>> for incoming HTTP connections. So you could then use a simple load
>>> balancer in front of NiFi to distribute the load
>>> across your cluster.
>>>
>>> Does this help? If you have any more questions we're happy to help!
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com> wrote:
>>>
>>> Mark - Thanks for the notes.
>>>
>>> >> The other option would be to have a ListenHTTP processor run on
>>> >> Primary Node only and then use Site-to-Site to distribute the data to other
>>> >> nodes.
>>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>>> collected data on primary node is not transfered to other nodes by default
>>> for processing despite all nodes are part of one cluster?
>>> If ListenHTTP processor is running  as a dafult (with out explicit
>>> setting to run on primary node), how does the data transferred to rest of
>>> the nodes? Does site-to-site come in play when I make one processor to run
>>> on primary node ?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <ma...@hotmail.com>
>>> Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> To: "users@nifi.apache.org" <us...@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Hello Chakro,
>>>
>>> When you create a cluster of NiFi instances, each node in the cluster is
>>> acting independently and in exactly
>>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>>> same flow. However, they will be
>>> pulling in different data and therefore operating on different data.
>>>
>>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> processed on the node that pulled the data
>>> in. NiFi does not currently shuffle data around between nodes in the
>>> cluster (you can use site-to-site to do
>>> this if you want to, but it won't happen automatically). If you set the
>>> number of Concurrent Tasks to 5, then
>>> you will have up to 5 threads running for that processor on each node.
>>>
>>> The only exception to this is the Primary Node. You can schedule a
>>> Processor to run only on the Primary Node
>>> by right-clicking on the Processor, and going to the Configure menu. In
>>> the Scheduling tab, you can change
>>> the Scheduling Strategy to Primary Node Only. In this case, that
>>> Processor will only be triggered to run on
>>> whichever node is elected the Primary Node (this can be changed in the
>>> Cluster management screen by clicking
>>> the appropriate icon in the top-right corner of the UI).
>>>
>>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> on primary node only).
>>>
>>> If you are attempting to have a single input running HTTP and then push
>>> that out across the entire cluster to
>>> process the data, you would have a few options. First, you could just use
>>> an HTTP Load Balancer in front of NiFi.
>>> The other option would be to have a ListenHTTP processor run on Primary
>>> Node only and then use Site-to-Site
>>> to distribute the data to other nodes.
>>>
>>> For more info on site-to-site, you can see the Site-to-Site section of
>>> the User Guide at
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>
>>> If you have any more questions, let us know!
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> <Ch...@lifelock.com> wrote:
>>>
>>> Nifi Team – I would like to understand the advantages of Nifi clustering
>>> setup.
>>>
>>> Questions :
>>>
>>>  - How does workflow work on multiple nodes ? Does it share the resources
>>> intra nodes ?
>>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node ?
>>>
>>>  - How to “isolate” the processor to the master node (or one node)?
>>>
>>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>>> node ? How do I force processor to look in one of the slave node?
>>>
>>> - How can we have a workflow where the input side we want to receive
>>> requests (http) and then the rest of the pipeline need to run in parallel on
>>> all the nodes ?
>>>
>>> Thanks,
>>> -Chakro
>>>

Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
I was able to get site-to-site to work.
I followed your instructions to distribute data across the nodes.

GenerateFlowFile (On Primary) —> RPG
RPG —> Input Port —> PutFile (Timer driven scheduling)

However, data is only written to one slave (the secondary slave). The primary slave has no data.

Image screenshot :
http://tinyurl.com/jjvjtmq

From: Chakrader Dewaragatla <ch...@lifelock.com>>
Date: Sunday, January 10, 2016 at 11:26 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

Bryan – Thanks – I am trying to setup site-to-site.
I have two slaves and one NCM.

My properties as follows :

On both Slaves:

nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false

On NCM:
nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false

When I try drop remote process group (with http://<NCM IP>:8080/nifi), I see error as follows for two nodes.

[<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site communication
[<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site communication

Do you have insight why its trying to connecting 8080 on slaves ? When do 10880 port come into the picture ? I remember try setting site to site few months back and succeeded.

Thanks,
-Chakri



From: Bryan Bende <bb...@gmail.com>>
Reply-To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Date: Saturday, January 9, 2016 at 11:22 AM
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Re: Nifi cluster features - Questions

The sending node (where the remote process group is) will distribute the data evenly across the two nodes, so an individual file will only be sent to one of the nodes. You could think of it as if a separate NiFi instance was sending directly to a two node cluster, it would be evenly distributing the data across the two nodes. In this case it just so happens to all be with in the same cluster.

The most common use case for this scenario is the List and Fetch processors like HDFS. You can perform the listing on primary node, and then distribute the results so the fetching takes place on all nodes.

On Saturday, January 9, 2016, Chakrader Dewaragatla <Ch...@lifelock.com>> wrote:
Bryan – Thanks, how do the nodes distribute the load for a input port. As port is open and listening on two nodes,  does it copy same files on both the nodes?
I need to try this setup to see the results, appreciate your help.

Thanks,
-Chakri

From: Bryan Bende <bbende@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, January 8, 2016 at 3:44 PM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Hi Chakri,

I believe the DistributeLoad processor is more for load balancing when sending to downstream systems. For example, if you had two HTTP endpoints,
you could have the first relationship from DistributeLoad going to a PostHTTP that posts to endpoint #1, and the second relationship going to a second PostHTTP that goes to endpoint #2.

If you want to distribute the data with in the cluster, then you need to use site-to-site. The way you do this is the following...

- Add an Input Port connected to your PutFile.
- Add GenerateFlowFile scheduled on primary node only, connected to a Remote Process Group. The Remote Process Group should be connected to the Input Port from the previous step.

So both nodes have an input port listening for data, but only the primary node produces a FlowFile and sends it to the RPG which then re-distributes it back to one of the Input Ports.

In order for this to work you need to set nifi.remote.input.socket.port in nifi.properties to some available port, and you probably want nifi.remote.input.secure=false for testing.

-Bryan


On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com> wrote:
Mark – I have setup a two node cluster and tried the following .
 GenrateFlowfile processor (Run only on primary node) —> DistributionLoad processor (RoundRobin)   —> PutFile

>> The GetFile/PutFile will run on all nodes (unless you schedule it to run on primary node only).
>From your above comment, It should put file on two nodes. It put files on primary node only. Any thoughts ?

Thanks,
-Chakri

From: Mark Payne <markap14@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, October 7, 2015 at 11:28 AM

To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Chakri,

Correct - when NiFi instances are clustered, they do not transfer data between the nodes. This is very different
than you might expect from something like Storm or Spark, as the key goals and design are quite different.
We have discussed providing the ability to allow the user to indicate that they want to have the framework
do load balancing for specific connections in the background, but it's still in more of a discussion phase.

Site-to-Site is simply the capability that we have developed to transfer data between one instance of
NiFi and another instance of NiFi. So currently, if we want to do load balancing across the cluster, we would
create a site-to-site connection (by dragging a Remote Process Group onto the graph) and give that
site-to-site connection the URL of our cluster. That way, you can push data to your own cluster, effectively
providing a load balancing capability.

If you were to just run ListenHTTP without setting it to Primary Node, then every node in the cluster will be listening
for incoming HTTP connections. So you could then use a simple load balancer in front of NiFi to distribute the load
across your cluster.

Does this help? If you have any more questions we're happy to help!

Thanks
-Mark


On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com> wrote:

Mark - Thanks for the notes.

>> The other option would be to have a ListenHTTP processor run on Primary Node only and then use Site-to-Site to distribute the data to other nodes.
Let's say I have a 5-node cluster and a ListenHTTP processor on the primary node; collected data on the primary node is not transferred to other nodes by default for processing, even though all nodes are part of one cluster?
If the ListenHTTP processor is running as a default (without an explicit setting to run on the primary node), how does the data get transferred to the rest of the nodes? Does site-to-site come into play when I make one processor run on the primary node?

Thanks,
-Chakri

From: Mark Payne <markap14@hotmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, October 7, 2015 at 7:00 AM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Nifi cluster features - Questions

Hello Chakro,

When you create a cluster of NiFi instances, each node in the cluster is acting independently and in exactly
the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the same flow. However, they will be
pulling in different data and therefore operating on different data.

So if you pull in 10 1-gig files from S3, each of those files will be processed on the node that pulled the data
in. NiFi does not currently shuffle data around between nodes in the cluster (you can use site-to-site to do
this if you want to, but it won't happen automatically). If you set the number of Concurrent Tasks to 5, then
you will have up to 5 threads running for that processor on each node.

The only exception to this is the Primary Node. You can schedule a Processor to run only on the Primary Node
by right-clicking on the Processor, and going to the Configure menu. In the Scheduling tab, you can change
the Scheduling Strategy to Primary Node Only. In this case, that Processor will only be triggered to run on
whichever node is elected the Primary Node (this can be changed in the Cluster management screen by clicking
the appropriate icon in the top-right corner of the UI).

The GetFile/PutFile will run on all nodes (unless you schedule it to run on primary node only).

If you are attempting to have a single input running HTTP and then push that out across the entire cluster to
process the data, you would have a few options. First, you could just use an HTTP Load Balancer in front of NiFi.
The other option would be to have a ListenHTTP processor run on Primary Node only and then use Site-to-Site
to distribute the data to other nodes.

For more info on site-to-site, you can see the Site-to-Site section of the User Guide at
http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site

If you have any more questions, let us know!

Thanks
-Mark

On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla <Chakrader.Dewaragatla@lifelock.com> wrote:

Nifi Team – I would like to understand the advantages of a NiFi clustering setup.

Questions :

 - How does a workflow work on multiple nodes? Does it share resources across the nodes?
Let's say I need to pull 10 1-gig files from S3; how does the workload get distributed? Setting concurrent tasks to 5, does it spawn 5 tasks per node?

 - How to “isolate” the processor to the master node (or one node)?

- GetFile/PutFile processors in a cluster setup: do they get/put on the primary node? How do I force a processor to look at one of the slave nodes?

- How can we have a workflow where, on the input side, we want to receive requests (HTTP) and then the rest of the pipeline needs to run in parallel on all the nodes?

Thanks,
-Chakro


Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Bryan – Thanks – I am trying to set up site-to-site.
I have two slaves and one NCM.

My properties are as follows:

On both Slaves:

nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false

On NCM:
nifi.remote.input.socket.port=10880
nifi.remote.input.secure=false

When I try to drop a remote process group (with http://<NCM IP>:8080/nifi), I see the following error for the two nodes.

[<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site communication
[<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site communication

Do you have insight into why it's trying to connect to 8080 on the slaves? When does port 10880 come into the picture? I remember trying to set up site-to-site a few months back and succeeding.
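
(For reference, a quick way to sanity-check connectivity from the machine hosting the RPG; the hostnames below are placeholders. The RPG first calls the URL you gave it — the cluster's HTTP API — to discover peers, and the actual transfer then uses nifi.remote.input.socket.port, so both ports need to be reachable, and the nodes need a restart after nifi.properties is edited.)

import socket

nodes = ["slave1.example.com", "slave2.example.com"]  # placeholders
for host in nodes:
    for port in (8080, 10880):  # web API port and nifi.remote.input.socket.port
        s = socket.socket()
        s.settimeout(3)
        reachable = s.connect_ex((host, port)) == 0
        s.close()
        print(host, port, "reachable" if reachable else "unreachable")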

Thanks,
-Chakri



Re: Nifi cluster features - Questions

Posted by Bryan Bende <bb...@gmail.com>.
The sending node (where the remote process group is) will distribute the
data evenly across the two nodes, so an individual file will only be sent
to one of the nodes. You could think of it as if a separate NiFi instance
were sending directly to a two-node cluster: it would evenly distribute
the data across the two nodes. In this case it just so happens to all be
within the same cluster.

The most common use case for this scenario is the List and Fetch processors,
like those for HDFS. You can perform the listing on the primary node, and then
distribute the results so the fetching takes place on all nodes.
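
(Purely to illustrate that behavior — this is not NiFi code — the effect is roughly a
round-robin split, so each listed item is fetched by exactly one node:)

listing = ["part-0000", "part-0001", "part-0002", "part-0003", "part-0004"]
nodes = ["node1", "node2"]
assignment = {n: [] for n in nodes}
for i, item in enumerate(listing):
    assignment[nodes[i % len(nodes)]].append(item)
print(assignment)
# {'node1': ['part-0000', 'part-0002', 'part-0004'], 'node2': ['part-0001', 'part-0003']}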

On Saturday, January 9, 2016, Chakrader Dewaragatla <
Chakrader.Dewaragatla@lifelock.com> wrote:

> Bryan – Thanks, how do the nodes distribute the load for a input port. As
> port is open and listening on two nodes,  does it copy same files on both
> the nodes?
> I need to try this setup to see the results, appreciate your help.
>
> Thanks,
> -Chakri
>
> From: Bryan Bende <bbende@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbende@gmail.com');>>
> Reply-To: "users@nifi.apache.org
> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>" <
> users@nifi.apache.org
> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>>
> Date: Friday, January 8, 2016 at 3:44 PM
> To: "users@nifi.apache.org
> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>" <
> users@nifi.apache.org
> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>>
> Subject: Re: Nifi cluster features - Questions
>
> Hi Chakri,
>
> I believe the DistributeLoad processor is more for load balancing when
> sending to downstream systems. For example, if you had two HTTP endpoints,
> you could have the first relationship from DistributeLoad going to a
> PostHTTP that posts to endpoint #1, and the second relationship going to a
> second PostHTTP that goes to endpoint #2.
>
> If you want to distribute the data with in the cluster, then you need to
> use site-to-site. The way you do this is the following...
>
> - Add an Input Port connected to your PutFile.
> - Add GenerateFlowFile scheduled on primary node only, connected to a
> Remote Process Group. The Remote Process Group should be connected to the
> Input Port from the previous step.
>
> So both nodes have an input port listening for data, but only the primary
> node produces a FlowFile and sends it to the RPG which then re-distributes
> it back to one of the Input Ports.
>
> In order for this to work you need to set nifi.remote.input.socket.port in
> nifi.properties to some available port, and you probably want
> nifi.remote.input.secure=false for testing.
>
> -Bryan
>
>
> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla <
> Chakrader.Dewaragatla@lifelock.com
> <javascript:_e(%7B%7D,'cvml','Chakrader.Dewaragatla@lifelock.com');>>
> wrote:
>
>> Mark – I have setup a two node cluster and tried the following .
>>  GenrateFlowfile processor (Run only on primary node) —> DistributionLoad
>> processor (RoundRobin)   —> PutFile
>>
>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>> run on primary node only).
>> From your above comment, It should put file on two nodes. It put files on
>> primary node only. Any thoughts ?
>>
>> Thanks,
>> -Chakri
>>
>> From: Mark Payne <markap14@hotmail.com
>> <javascript:_e(%7B%7D,'cvml','markap14@hotmail.com');>>
>> Reply-To: "users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>" <
>> users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>>
>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>
>> To: "users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>" <
>> users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Chakri,
>>
>> Correct - when NiFi instances are clustered, they do not transfer data
>> between the nodes. This is very different
>> than you might expect from something like Storm or Spark, as the key
>> goals and design are quite different.
>> We have discussed providing the ability to allow the user to indicate
>> that they want to have the framework
>> do load balancing for specific connections in the background, but it's
>> still in more of a discussion phase.
>>
>> Site-to-Site is simply the capability that we have developed to transfer
>> data between one instance of
>> NiFi and another instance of NiFi. So currently, if we want to do load
>> balancing across the cluster, we would
>> create a site-to-site connection (by dragging a Remote Process Group onto
>> the graph) and give that
>> site-to-site connection the URL of our cluster. That way, you can push
>> data to your own cluster, effectively
>> providing a load balancing capability.
>>
>> If you were to just run ListenHTTP without setting it to Primary Node,
>> then every node in the cluster will be listening
>> for incoming HTTP connections. So you could then use a simple load
>> balancer in front of NiFi to distribute the load
>> across your cluster.
>>
>> Does this help? If you have any more questions we're happy to help!
>>
>> Thanks
>> -Mark
>>
>>
>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com
>> <javascript:_e(%7B%7D,'cvml','Chakrader.Dewaragatla@lifelock.com');>>
>> wrote:
>>
>> Mark - Thanks for the notes.
>>
>> >> The other option would be to have a ListenHTTP processor run on
>> Primary Node only and then use Site-to-Site to distribute the data to other
>> nodes.
>> Lets say I have 5 node cluster and ListenHTTP processor on Primary node,
>> collected data on primary node is not transfered to other nodes by default
>> for processing despite all nodes are part of one cluster?
>> If ListenHTTP processor is running  as a dafult (with out explicit
>> setting to run on primary node), how does the data transferred to rest of
>> the nodes? Does site-to-site come in play when I make one processor to run
>> on primary node ?
>>
>> Thanks,
>> -Chakri
>>
>> From: Mark Payne <markap14@hotmail.com
>> <javascript:_e(%7B%7D,'cvml','markap14@hotmail.com');>>
>> Reply-To: "users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>" <
>> users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>>
>> Date: Wednesday, October 7, 2015 at 7:00 AM
>> To: "users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>" <
>> users@nifi.apache.org
>> <javascript:_e(%7B%7D,'cvml','users@nifi.apache.org');>>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hello Chakro,
>>
>> When you create a cluster of NiFi instances, each node in the cluster is
>> acting independently and in exactly
>> the same way. I.e., if you have 5 nodes, all 5 nodes will run exactly the
>> same flow. However, they will be
>> pulling in different data and therefore operating on different data.
>>
>> So if you pull in 10 1-gig files from S3, each of those files will be
>> processed on the node that pulled the data
>> in. NiFi does not currently shuffle data around between nodes in the
>> cluster (you can use site-to-site to do
>> this if you want to, but it won't happen automatically). If you set the
>> number of Concurrent Tasks to 5, then
>> you will have up to 5 threads running for that processor on each node.
>>
>> The only exception to this is the Primary Node. You can schedule a
>> Processor to run only on the Primary Node
>> by right-clicking on the Processor, and going to the Configure menu. In
>> the Scheduling tab, you can change
>> the Scheduling Strategy to Primary Node Only. In this case, that
>> Processor will only be triggered to run on
>> whichever node is elected the Primary Node (this can be changed in the
>> Cluster management screen by clicking
>> the appropriate icon in the top-right corner of the UI).
>>
>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>> on primary node only).
>>
>> If you are attempting to have a single input running HTTP and then push
>> that out across the entire cluster to
>> process the data, you would have a few options. First, you could just use
>> an HTTP Load Balancer in front of NiFi.
>> The other option would be to have a ListenHTTP processor run on Primary
>> Node only and then use Site-to-Site
>> to distribute the data to other nodes.
>>
>> For more info on site-to-site, you can see the Site-to-Site section of
>> the User Guide at
>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>
>> If you have any more questions, let us know!
>>
>> Thanks
>> -Mark
>>
>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla <
>> Chakrader.Dewaragatla@lifelock.com
>> <javascript:_e(%7B%7D,'cvml','Chakrader.Dewaragatla@lifelock.com');>>
>> wrote:
>>
>> Nifi Team – I would like to understand the advantages of Nifi clustering
>> setup.
>>
>> Questions :
>>
>>  - How does workflow work on multiple nodes ? Does it share the resources
>> intra nodes ?
>> Lets say I need to pull data 10 1Gig files from S3, how does work load
>> distribute  ? Setting concurrent tasks as 5. Does it spew 5 tasks per node
>> ?
>>
>>  - How to “isolate” the processor to the master node (or one node)?
>>
>> - Getfile/Putfile processors on cluster setup, does it get/put on primary
>> node ? How do I force processor to look in one of the slave node?
>>
>> - How can we have a workflow where the input side we want to receive
>> requests (http) and then the rest of the pipeline need to run in parallel
>> on all the nodes ?
>>
>> Thanks,
>> -Chakro
>>
>> ------------------------------
>> The information contained in this transmission may contain privileged and
>> confidential information. It is intended only for the use of the person(s)
>> named above. If you are not the intended recipient, you are hereby notified
>> that any review, dissemination, distribution or duplication of this
>> communication is strictly prohibited. If you are not the intended
>> recipient, please contact the sender by reply email and destroy all copies
>> of the original message.
>> ------------------------------
>>
>>
>> ------------------------------
>> The information contained in this transmission may contain privileged and
>> confidential information. It is intended only for the use of the person(s)
>> named above. If you are not the intended recipient, you are hereby notified
>> that any review, dissemination, distribution or duplication of this
>> communication is strictly prohibited. If you are not the intended
>> recipient, please contact the sender by reply email and destroy all copies
>> of the original message.
>> ------------------------------
>>
>>
>> ------------------------------
>> The information contained in this transmission may contain privileged and
>> confidential information. It is intended only for the use of the person(s)
>> named above. If you are not the intended recipient, you are hereby notified
>> that any review, dissemination, distribution or duplication of this
>> communication is strictly prohibited. If you are not the intended
>> recipient, please contact the sender by reply email and destroy all copies
>> of the original message.
>> ------------------------------
>>
>
> ------------------------------
> The information contained in this transmission may contain privileged and
> confidential information. It is intended only for the use of the person(s)
> named above. If you are not the intended recipient, you are hereby notified
> that any review, dissemination, distribution or duplication of this
> communication is strictly prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
> ------------------------------
>


-- 
Sent from Gmail Mobile

Re: Nifi cluster features - Questions

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Bryan – Thanks, how do the nodes distribute the load for an input port? As the port is open and listening on two nodes, does it copy the same files to both nodes?
I need to try this setup to see the results; appreciate your help.

Thanks,
-Chakri


Re: Nifi cluster features - Questions

Posted by Bryan Bende <bb...@gmail.com>.
Hi Chakri,

I believe the DistributeLoad processor is more for load balancing when
sending to downstream systems. For example, if you had two HTTP endpoints,
you could have the first relationship from DistributeLoad going to a
PostHTTP that posts to endpoint #1, and the second relationship going to a
second PostHTTP that goes to endpoint #2.
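
(A rough client-side analogy of that pattern, with placeholder URLs: alternate requests
between the two targets, which is what the two DistributeLoad relationships do inside
the flow.)

import itertools
import requests

# Placeholder endpoints standing in for the two downstream systems.
endpoints = itertools.cycle([
    "http://endpoint-1.example.com/ingest",
    "http://endpoint-2.example.com/ingest",
])
for payload in (b"record-1", b"record-2", b"record-3", b"record-4"):
    url = next(endpoints)
    print(url, requests.post(url, data=payload).status_code)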

If you want to distribute the data within the cluster, then you need to
use site-to-site. The way you do this is the following...

- Add an Input Port connected to your PutFile.
- Add GenerateFlowFile scheduled on primary node only, connected to a
Remote Process Group. The Remote Process Group should be connected to the
Input Port from the previous step.

So both nodes have an input port listening for data, but only the primary
node produces a FlowFile and sends it to the RPG which then re-distributes
it back to one of the Input Ports.

In order for this to work you need to set nifi.remote.input.socket.port in
nifi.properties to some available port, and you probably want
nifi.remote.input.secure=false for testing.

-Bryan

