Posted to user@ignite.apache.org by Mikael <mi...@telia.com> on 2019/11/20 22:23:14 UTC

Streaming exception

Hi!

When I get timeout exceptions on the data streamer stripe threads (like the
ones below) while streaming data, what is the best way around them? Should I
increase the thread pool size? My guess is that the hard drive is not very
fast and that both the WAL and the storage are on the same drive (it's a
persistent cache). I would like a setup that does not have to be tuned all
the time and that works without exceptions even when persistent storage is
slow. I do use:

<property name="writeThrottlingEnabled" value="true"/>

So the question is: which change would help most? More threads, a bigger
checkpointPageBufferSize (currently 128MB on a 2GB data region), or
something else? 11 seconds is a long time, so increasing the timeouts does
not sound like a good idea.

[2019-11-20T21:36:05,471][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=11s]
[2019-11-20T21:36:05,471][ERROR][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1574282154412]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1574282154412]
     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.7.6.jar:2.7.6]
[2019-11-20T21:36:05,810][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-1, blockedFor=11s]
[2019-11-20T21:36:05,810][ERROR][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-1, igniteInstanceName=null, finished=false, heartbeatTs=1574282154310]]]
org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-1, igniteInstanceName=null, finished=false, heartbeatTs=1574282154310]
     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663) ~[ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119) [ignite-core-2.7.6.jar:2.7.6]
     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-2.7.6.jar:2.7.6]

Mikael



Re: Streaming exception

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Increasing the checkpoint page buffer is very useful; this is the approach I
recommend taking. Also, we recommend using SSDs with Ignite rather than HDDs.
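
For illustration only, a larger checkpoint page buffer could be set in the
Spring XML configuration roughly as sketched below. The 2GB region size and
writeThrottlingEnabled come from the original question. The 512MB buffer is
an assumed example value, not a figure given in this thread, and the sketch
assumes the persistent cache lives in the default data region.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <!-- Already enabled in the original question. -->
            <property name="writeThrottlingEnabled" value="true"/>

            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="persistenceEnabled" value="true"/>

                    <!-- 2GB data region, as described in the question. -->
                    <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>

                    <!-- Assumed example value: 512MB instead of the current 128MB. -->
                    <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```

If the cache uses a named data region instead, the same
checkpointPageBufferSize property would go on that region's
DataRegionConfiguration bean. A suitable size depends on the write rate and
the disk, so it would need to be checked under the real streaming load.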

Regards,
-- 
Ilya Kasnacheev

