Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/04/04 18:29:00 UTC
[jira] [Commented] (NIFI-9835) When node offloads, can get stuck, logging errors about a negative queue size

    [ https://issues.apache.org/jira/browse/NIFI-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516997#comment-17516997 ] 

ASF subversion and git services commented on NIFI-9835:
-------------------------------------------------------

Commit 854c419635f37dffc2c56f74b41fab259e06bd6c in nifi's branch refs/heads/support/nifi-1.16 from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=854c419635 ]

NIFI-9835: Fixed a threading bug in which NioAsyncLoadBalanceClient calls LoadBalanceSession.isComplete() followed by LoadBalanceSession.isCanceled(). The two flags are not updated atomically, so the complete flag can change before the canceled flag. Changed to use a single LoadBalanceSessionState enum that represents the state. Also made the private StandardProcessSession.commit(boolean) method synchronized. When a processor is terminated (as is the case in Offload), we roll back sessions, and both commit() and rollback() need to be synchronized. Only the public commit() method was synchronized, and now that commitAsync() exists, it was possible to commit without any synchronization. This addresses that concern. Also fixed a typo in the docs for MergeRecord.
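
For readers unfamiliar with this kind of race, the following is a minimal sketch of the general pattern and is not the code from the commit. The class names TwoFlagSession, SingleStateSession and SessionStateSketch, the State values, and the order in which the two flags are written are all assumptions made for illustration; the commit message only names the LoadBalanceSessionState enum and does not list its values. The sketch forces the unlucky interleaving with a callback so the result is deterministic.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: hypothetical classes, not NiFi's LoadBalanceSession.
class SessionStateSketch {

    // Assumed set of states; the real enum's values are not given in the commit message.
    enum State { ACTIVE, COMPLETE, CANCELED }

    // Two independent flags: a reader can observe them between the two writes.
    static class TwoFlagSession {
        private volatile boolean complete;
        private volatile boolean canceled;

        // betweenWrites stands in for another thread running at the worst moment.
        void cancel(Runnable betweenWrites) {
            complete = true;        // first write (assumed order)
            betweenWrites.run();    // a reader here sees complete=true, canceled=false
            canceled = true;        // second write
        }

        boolean isComplete() { return complete; }
        boolean isCanceled() { return canceled; }
    }

    // One atomically updated value: a reader sees exactly one state at a time.
    static class SingleStateSession {
        private final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

        // Each transition succeeds only if the session is still ACTIVE.
        boolean complete() { return state.compareAndSet(State.ACTIVE, State.COMPLETE); }
        boolean cancel()   { return state.compareAndSet(State.ACTIVE, State.CANCELED); }

        State getState() { return state.get(); }
    }

    public static void main(String[] args) {
        TwoFlagSession twoFlags = new TwoFlagSession();
        boolean[] looksSuccessful = new boolean[1];
        // The reader checks isComplete() and then isCanceled(), as described above.
        twoFlags.cancel(() ->
                looksSuccessful[0] = twoFlags.isComplete() && !twoFlags.isCanceled());
        System.out.println("two flags, canceled session read as successful: " + looksSuccessful[0]);

        SingleStateSession single = new SingleStateSession();
        System.out.println("cancel applied: " + single.cancel());
        System.out.println("later complete applied: " + single.complete());
        System.out.println("state seen by reader: " + single.getState());
    }
}
```

With two flags, the reader in this sketch treats a canceled session as a successful completion. With a single state value, the first transition wins and a later conflicting transition is rejected, so a reader never sees a half-updated combination.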

This closes #5902

Signed-off-by: David Handermann <ex...@apache.org>


> When node offloads, can get stuck, logging errors about a negative queue size
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-9835
>                 URL: https://issues.apache.org/jira/browse/NIFI-9835
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.17.0, 1.16.1
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a node is offloaded, we can occasionally see that it does not complete the offload and logs errors about creating a negative queue size. For example:
> {code:java}
> 2022-03-24 19:43:53,919 ERROR [Load-Balanced Client Thread-1] o.a.n.c.queue.SwappablePriorityQueue Updated Size of Queue Unacknowledged from FlowFile Queue Size[ ActiveQueue=[0, 0 Bytes], Swap Queue=[0, 0 Bytes], Swap Files=[0], Unacknowledged=[-7, -5820 Bytes] ] to FlowFile Queue Size[ ActiveQueue=[0, 0 Bytes], Swap Queue=[0, 0 Bytes], Swap Files=[0], Unacknowledged=[-7, -5820 Bytes] ]
> java.lang.RuntimeException: Cannot create negative queue size
>     at org.apache.nifi.controller.queue.SwappablePriorityQueue.logIfNegative(SwappablePriorityQueue.java:1055)
>     at org.apache.nifi.controller.queue.SwappablePriorityQueue.incrementUnacknowledgedQueueSize(SwappablePriorityQueue.java:1045)
>     at org.apache.nifi.controller.queue.SwappablePriorityQueue.acknowledge(SwappablePriorityQueue.java:451)
>     at org.apache.nifi.controller.queue.clustered.partition.RemoteQueuePartition$2.onTransactionComplete(RemoteQueuePartition.java:223)
>     at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.communicate(NioAsyncLoadBalanceClient.java:281)
>     at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask.run(NioAsyncLoadBalanceClientTask.java:81)
>     at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)