Posted to users@nifi.apache.org by Arnaud G <gr...@gmail.com> on 2017/05/12 09:47:17 UTC

Queue incoherent state

Hi again!

I currently have another issue with an incoherent queue status.

After upgrading a cluster to 1.2, I have a couple of queues that display a
high number of flowfiles in the GUI.

As the queues were not emptying despite tuning, I tried to list the content
of the queue. The listing returns no flowfiles at all, which is not what I
expected, since the GUI displays a different value.

If I try to empty the queue, I receive this message: 0 FlowFiles (0 bytes)
out of 210'000 (92.71MB) were removed from the queue.

And of course I cannot delete the queue, as this action reports that the
queue is not empty.

So it seems that the queues are empty but that their current display doesn't
reflect it (it is very likely that some data was lost during the upgrade
procedure, as we had to reboot a few nodes to change the heap property).

What would be the best method to restore a proper state and be able to edit
the flow file again?

Thank you!

Arnaud

Re: Queue incoherent state

Posted by Arnaud G <gr...@gmail.com>.

Hi again,

Unfortunately I didn't save the logs when the node restarted, but I don't
remember anything in them that gave me a clue about the cause of the
blocked queue.

I only have a few log entries from the weekend, when the queues were in this
strange state:

2017-05-14 09:01:29,635 INFO [pool-12-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@7bdc0ad3 checkpointed with 85016 Records and 18 Swap Files in 5447 milliseconds (Stop-the-world time = 37 milliseconds, Clear Edit Logs time = 30 millis), max Transaction ID 307183065

2017-05-14 09:04:48,056 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@265c0752 checkpointed with 2 Records and 0 Swap Files in 22 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 7

2017-05-14 09:05:37,737 INFO [pool-12-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@7bdc0ad3 checkpointed with 85016 Records and 18 Swap Files in 4677 milliseconds (Stop-the-world time = 35 milliseconds, Clear Edit Logs time = 17 millis), max Transaction ID 307183065

2017-05-14 09:11:50,435 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 85016 records in 4057 milliseconds


These logs report 85K records that never showed up when I requested the
queue listing (the queue always appeared empty), while overall the queues
were reporting even more elements (over 100K).

As we can see, these records were stuck: 2 hours later they were still
there, while other records were flowing normally through the cluster over
the weekend.


2017-05-14 11:01:12,839 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 85016 records in 3408 milliseconds
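The stuck-record pattern in these entries (the same 85016-record count at 09:11 and again at 11:01 while other data kept moving) can be checked mechanically across a longer log. The Python sketch below is hypothetical and not part of NiFi: it assumes the one-entry-per-line layout shown above, and `checkpoint_counts` / `count_unchanged` are illustrative helper names. An unchanged count is only a hint, since an idle node would also report a constant count.

```python
import re
from typing import Iterable, List, Tuple

# Matches FlowFile repository checkpoint entries in the layout shown above, e.g.
# "2017-05-14 11:01:12,839 INFO [...] ... Successfully checkpointed FlowFile
#  Repository with 85016 records in 3408 milliseconds"
CHECKPOINT_RE = re.compile(
    r"^(?P<ts>\S+ \S+) INFO .*Successfully checkpointed FlowFile Repository "
    r"with (?P<records>\d+) records"
)


def checkpoint_counts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, record_count) for every repository checkpoint line."""
    counts = []
    for line in lines:
        m = CHECKPOINT_RE.match(line)
        if m:
            counts.append((m.group("ts"), int(m.group("records"))))
    return counts


def count_unchanged(counts: List[Tuple[str, int]], min_checkpoints: int = 2) -> bool:
    """True if at least `min_checkpoints` checkpoints all report the same count.

    Only a heuristic: an idle node would also produce a constant count.
    """
    if len(counts) < min_checkpoints:
        return False
    return len({records for _, records in counts}) == 1


# The two repository checkpoint entries quoted above.
log = [
    "2017-05-14 09:11:50,435 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository "
    "Successfully checkpointed FlowFile Repository with 85016 records in 4057 milliseconds",
    "2017-05-14 11:01:12,839 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository "
    "Successfully checkpointed FlowFile Repository with 85016 records in 3408 milliseconds",
]
counts = checkpoint_counts(log)
print(counts)  # [('2017-05-14 09:11:50,435', 85016), ('2017-05-14 11:01:12,839', 85016)]
print(count_unchanged(counts))  # True
```

Requiring at least two checkpoints keeps a single entry from being reported as "unchanged"; the count is only meaningful alongside evidence, as in this case, that other data was still flowing on the node.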


Regarding disk space, I don't think the node ran out of space at any point.
I even have a local backup of the flowfile directory that I made before
emptying it.


I hope it helps.


Arnaud

On Tue, May 16, 2017 at 3:51 PM, Mark Payne <ma...@hotmail.com> wrote:

Re: Queue incoherent state

Posted by Mark Payne <ma...@hotmail.com>.

Arnaud,

Did you have any WARN or ERROR messages in the logs? I'm particularly interested in anything
that mentions the word "Swap" or "swap" (i.e., regardless of case). Is it possible that the FlowFile Repository
could have run out of disk space?
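Both checks asked for here can be scripted. This is a minimal Python sketch, assuming the log layout seen in this thread ("<date> <time> <LEVEL> [<thread>] <logger> <message>"); `swap_warnings` and `free_space_ratio` are hypothetical helper names, and the WARN sample line is made up for illustration.

```python
import shutil
from typing import Iterable, List


def swap_warnings(lines: Iterable[str]) -> List[str]:
    """Return WARN/ERROR log lines that mention 'swap' in any letter case.

    Assumes the NiFi log layout seen in this thread:
    '<date> <time> <LEVEL> [<thread>] <logger> <message>'.
    """
    hits = []
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            continue  # too short to carry a level field
        if parts[2] in ("WARN", "ERROR") and "swap" in line.lower():
            hits.append(line.rstrip("\n"))
    return hits


def free_space_ratio(path: str) -> float:
    """Fraction of the file system containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    sample = [
        # Made-up WARN line for illustration only.
        "2017-05-14 09:00:00,000 WARN [example-thread] example.Logger hypothetical Swap problem",
        # Shortened INFO checkpoint line from this thread: mentions swap but is not WARN/ERROR.
        "2017-05-14 09:01:29,635 INFO [pool-12-thread-1] org.wali.MinimalLockingWriteAheadLog "
        "checkpointed with 85016 Records and 18 Swap Files",
    ]
    for hit in swap_warnings(sample):
        print(hit)
    print(f"{free_space_ratio('.'):.1%} free on the current file system")
```

On a real node you would pass the opened application log (by default `logs/nifi-app.log` under the NiFi installation) to `swap_warnings`, and call `free_space_ratio` on the FlowFile Repository directory configured in nifi.properties; both paths should be checked against the node's actual configuration.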

Thanks
-Mark

On May 16, 2017, at 3:34 AM, Arnaud G <gr...@gmail.com> wrote:

Re: Queue incoherent state

Posted by Matt Gilman <ma...@gmail.com>.

Thanks for following up with these details. I'm going to add these
observations to the corresponding JIRA [1].

Thanks and sorry again for the inconvenience.

Matt

[1] https://issues.apache.org/jira/browse/NIFI-3897

On Tue, May 16, 2017 at 3:34 AM, Arnaud G <gr...@gmail.com> wrote:

Re: Queue incoherent state

Posted by Arnaud G <gr...@gmail.com>.

Hi Matt,

Thanks for your reply!

I finally solved the problem by deleting all the content in the flowfile
directory, but here are my observations:

1) The problem was coming from one of the cluster nodes: when this node was
out of the cluster, the queues were reporting 0 flowfiles.
2) The first time I restarted this node, about 20'000 flowfiles reappeared
and were processed; every time I subsequently restarted it, about 20-30k
more flowfiles were processed (I was only monitoring one queue
specifically, but it happened for multiple other queues).
3) After 3-4 reboots of this node, the queue reported 90K elements and
remained in this state despite several more restarts.
4) The flowfile directory on this node contained 200 MB of data.
5) I tried to set up flowfile expiration, but it had no effect on the
queue status.
6) I tried to change the back pressure threshold, without any effect.
7) During the problem, the queue was operating normally on the cluster, and
flowfiles were flowing through it without any issue.

Arnaud

On Mon, May 15, 2017 at 10:39 PM, Matt Gilman <ma...@gmail.com>
wrote:

Re: Queue incoherent state

Posted by Matt Gilman <ma...@gmail.com>.

Sorry for the delayed response. Similar behavior has been reported by some
other users [1]. Does the connection have any back pressure threshold
configured? Can new flowfiles be enqueued? Do the expiration settings have
any effect?

Lastly, if you restart the cluster, does it still claim that the
connection has flowfiles enqueued?
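One way to read the back pressure question: if the phantom count were counted against the connection's thresholds, a queue reporting 210'000 FlowFiles would stay in back pressure even though a listing shows it empty. The Python model below is hypothetical (not a NiFi API); it assumes back pressure engages as soon as either the object-count or the data-size threshold is reached, and the 10,000-object / 1 GB values and the "0 disables a threshold" rule are assumptions of the sketch. It plugs in the figures from the original report.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Hypothetical view of what a connection reports (not a NiFi API type)."""
    object_count: int
    size_bytes: int


@dataclass
class BackPressure:
    """Hypothetical thresholds; in this sketch a value of 0 disables a threshold."""
    object_threshold: int = 10_000
    size_threshold_bytes: int = 1024 ** 3  # 1 GB

    def is_engaged(self, queue: QueueSnapshot) -> bool:
        # Back pressure applies as soon as EITHER enabled threshold is reached.
        by_count = self.object_threshold > 0 and queue.object_count >= self.object_threshold
        by_size = (self.size_threshold_bytes > 0
                   and queue.size_bytes >= self.size_threshold_bytes)
        return by_count or by_size


# Figures from the original report: the GUI count versus what a listing returned.
reported = QueueSnapshot(object_count=210_000, size_bytes=int(92.71 * 1024 ** 2))
listed = QueueSnapshot(object_count=0, size_bytes=0)

bp = BackPressure()
print(bp.is_engaged(reported))  # True: the reported count alone exceeds 10,000 objects
print(bp.is_engaged(listed))    # False
```

In this sketch's terms, comparing the model's answer for the reported and listed figures with what the upstream processor actually does is one way to tell which of the two numbers the node is acting on.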

Matt

[1] https://issues.apache.org/jira/browse/NIFI-3897

On Fri, May 12, 2017 at 5:47 AM, Arnaud G <gr...@gmail.com> wrote: