Posted to dev@nifi.apache.org by Corey Flowers <cf...@onyxpoint.com> on 2015/10/02 20:23:19 UTC

flow.tar.stale

I have a production cluster whose nodes keep dropping out of the
cluster. The flow.tar is stuck as flow.tar.stale, but I can't tell which
server is causing the timing issue. In Apache NiFi, which conf property
increases the time a node is allowed to respond? I see:

nifi.cluster.manager.flow.retrieval.delay
description: the delay before the cluster manager retrieves the latest flow
configuration.
But I thought this pulled the flow.xml out of memory to save to disk.

What I need is to increase the time before a node drops out of the cluster
because the flow.tar is stale.

Also, it would be great if the logs said which node had not responded
and recorded the time of each successful response. That would help a lot
in identifying the slow systems, or the ones that may be too busy to
respond fast enough.

Thanks!

-- 
Corey Flowers
Vice President, Onyx Point, Inc
(410) 541-6699
cflowers@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

Re: flow.tar.stale

Posted by Mark Payne <ma...@hotmail.com>.

Corey,

I think the properties you're looking for are:

nifi.cluster.manager.node.api.read.timeout - the amount of time to wait between each successful transfer of data before considering it an error. I.e., if we go this amount of time (30 secs by default) without receiving any data from the node, the request will time out.

nifi.cluster.manager.node.api.connection.timeout - the amount of time to wait for a connection to be established before timing out.
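
As a rough sketch, raising both timeouts in conf/nifi.properties might look like the fragment below. It assumes the file being edited is the one on the instance acting as cluster manager, and the 60 sec values are purely illustrative, not recommendations; pick values that suit how slowly your busiest node responds.

```properties
# conf/nifi.properties (assumed: on the cluster manager instance)
# Illustrative values only; the read timeout default is 30 secs.

# How long to wait without receiving any data from a node before timing out
nifi.cluster.manager.node.api.read.timeout=60 sec

# How long to wait for a connection to a node to be established
nifi.cluster.manager.node.api.connection.timeout=60 sec
```

A restart is most likely needed for the new values to be picked up.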

If you are seeing timeouts without any indication of which node is the problem, I certainly agree that is a problem. Can you provide the actual error message that you are seeing, so that it's easier to understand where in the code the timeout is actually occurring?

In the meantime, you should see timing info if you add the following line to your conf/logback.xml file:
<logger name="org.apache.nifi.cluster.manager.impl.HttpRequestReplicatorImpl" level="DEBUG" />

That will produce some pretty verbose logging, though, as it logs timing info for each request to each node, as well as the min, max, and average.

Thanks
-Mark
