Posted to dev@nifi.apache.org by "tommy.yardley@baesystems.com" <to...@baesystems.com> on 2015/09/03 14:14:13 UTC

Resetting counters whilst clustered disconnects nodes

Hi,

I have a three machine setup (1 NCM + 2 Nodes) running 0.2.0-incubating and observed the following:

1. Resetting counters can result in the NCM disconnecting a node

2. The node that is disconnected begins processing FlowFiles

Description:

My clustered NiFi is running a single pipeline containing 3 processors. While the pipeline is running, resetting counters causes any nodes that are not processing anything (i.e. are not contributing to the count) to disconnect. The node can then be reconnected via the UI. Looking at the stats, it appears the pipeline then began running on the disconnected node as well as on the single remaining connected node. This has been tested using custom processors as well as standard processors.

Steps to Replicate:

1. Create cluster with 2 nodes + 1 NCM (2 nodes for processing are needed or the problem won't appear)

2. Add GenerateFlowFile processor:

a. Scheduling: Change Scheduling strategy to 'On primary node'

b. Properties: Change File Size to '10B' (say)

3. Add HashAttribute processor:

a. Properties: Change Key to 'hash.value'

4. Add DetectDuplicate processor:

a. Properties: Under Distributed Cache Service add a 'DistributedMapCacheClientService'

i. For the Client Service, set the server name to 'localhost' under Properties

ii. Enable The Client Service

iii. Add a DistributedMapCacheServer under the Controller Services

iv. Enable the Cache Server

v. Exit NiFi Flow Settings

5. Connect all 3 processors on success

6. Auto-terminate all options for DetectDuplicate

7. Run all processors and wait for ~10 seconds or so

8. Open counters tab and refresh to make sure counters > 0

9. Reset one of the counters

Note: I'm specifically using the DetectDuplicate processor in this example because it contains a custom counter.

This should then disconnect the node that was not active (the node that was not selected to be the primary). Even though the GenerateFlowFile processor is scheduled to run on the primary node, the disconnected node begins to emit FlowFiles.

The following warning was pulled from the NCM's logs:

2015-09-02 10:40:16,750 WARN [NiFi Web Server-149] o.a.n.c.manager.impl.WebClusterManager One or more nodes failed to process URI 'http://localhost:8082/nifi-api/controller/counters/2207ea22-0d4a-389d-b746-82e568c6228d'. Requesting each node to disconnect from cluster.
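
For anyone checking their own logs for the same symptom, a minimal sketch of pulling the failed URI and counter id out of a warning like the one above might look as follows. The regular expression and the function name parse_failed_uri are assumptions inferred from this single sample line, not a documented NiFi log format.

```python
import re

# Sample line copied from the warning quoted above.
LOG_LINE = (
    "2015-09-02 10:40:16,750 WARN [NiFi Web Server-149] "
    "o.a.n.c.manager.impl.WebClusterManager One or more nodes failed to process URI "
    "'http://localhost:8082/nifi-api/controller/counters/"
    "2207ea22-0d4a-389d-b746-82e568c6228d'. Requesting each node to disconnect from cluster."
)

# Hypothetical pattern, inferred from the one sample line only.
FAILED_URI = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] (?P<logger>\S+) "
    r"One or more nodes failed to process URI '(?P<uri>[^']+)'"
)


def parse_failed_uri(line):
    """Return the fields of a 'failed to process URI' warning, or None."""
    match = FAILED_URI.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Treat the last path segment as the counter id when the URI targets a counter.
    path = fields["uri"].rstrip("/")
    fields["counter_id"] = path.rsplit("/", 1)[-1] if "/counters/" in path else None
    return fields


if __name__ == "__main__":
    print(parse_failed_uri(LOG_LINE))
```

Lines that do not match the pattern return None, so the function can be applied to every line of a log file and the non-matching ones filtered out.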

I'm interested in knowing if this is expected behaviour or if I should open a JIRA ticket (2 perhaps).

Thanks,
Tommy

Re: Resetting counters whilst clustered disconnects nodes

Posted by Matt Gilman <ma...@gmail.com>.

Tommy,

Thanks for the great write-up! I've replicated the issue of the node disconnecting using the steps you've provided, and I've created a JIRA for the issue [1]. As for the other concern, that is how it's currently designed to work. The 'run on primary node only' setting applies only while a node is part of a cluster. If a node is disconnected from a cluster and a processor is configured with that scheduling strategy, the processor will run as though it's timer driven.
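
To make the scheduling behaviour described above concrete, here is a toy model of the decision rule. It is only an illustration of the description in this message, not NiFi's actual scheduling code; the names NodeState, should_run, "PRIMARY_NODE_ONLY" and "TIMER_DRIVEN" are labels made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    connected: bool  # is the node currently part of the cluster?
    primary: bool    # has the cluster designated it the primary node?


def should_run(strategy, node):
    """Toy rule: does a processor with this strategy run on this node?"""
    if strategy == "PRIMARY_NODE_ONLY":
        if not node.connected:
            # A disconnected node behaves as though the processor were timer driven.
            return True
        return node.primary
    # Any other strategy (e.g. "TIMER_DRIVEN") runs on every node in this model.
    return True


if __name__ == "__main__":
    print(should_run("PRIMARY_NODE_ONLY", NodeState(connected=True, primary=True)))
    print(should_run("PRIMARY_NODE_ONLY", NodeState(connected=True, primary=False)))
    print(should_run("PRIMARY_NODE_ONLY", NodeState(connected=False, primary=False)))
```

Under this model the non-primary node stays idle while connected and starts emitting FlowFiles once it is disconnected, which matches the second observation in the original report.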

We should have the counters issue addressed for the upcoming 0.3.0 release.

Thanks!

Matt

[1] https://issues.apache.org/jira/browse/NIFI-926
