Posted to user@ignite.apache.org by KR Kumar <kr...@gmail.com> on 2019/10/22 14:57:07 UTC

Error while adding the node the baseline topology

Hi guys - I am running into the following issue when trying to add a node
to the baseline topology. It has been happening only since we upgraded from
2.3 to 2.75. Any pointers would be appreciated.

[2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl]
Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038
[2019-10-22 10:31:45,635][ERROR][tcp-disco-msg-worker-#2][G] Blocked
system-critical thread has been detected. This can lead to cluster-wide
undefined behaviour [threadName=data-streamer-stripe-30, blockedFor=95s]
[2019-10-22 10:31:45,635][WARN ][tcp-disco-msg-worker-#2][G] Thread
[name="data-streamer-stripe-30-#79", id=110, state=TIMED_WAITING,
blockCnt=0, waitCnt=36470]

[2019-10-22 10:31:45,637][ERROR][tcp-disco-msg-worker-#2][root] Critical
system error detected. Will be handled accordingly to configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=data-streamer-stripe-30, igniteInstanceName=null, finished=false,
heartbeatTs=1571754609956]]]
class org.apache.ignite.IgniteException: GridWorker
[name=data-streamer-stripe-30, igniteInstanceName=null, finished=false,
heartbeatTs=1571754609956]

Thanx and Regards,
KR Kumar

Re: Error while adding the node the baseline topology

Posted by Stanislav Lukyanov <st...@gmail.com>.
This message actually looks worrisome:
    [2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl]
    Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038

It means that Ignite's throttling algorithm has decided to put a thread to
sleep for 771 seconds.
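
The arithmetic behind that reading can be checked directly against the log
text. The sketch below is a minimal, hypothetical helper (the function and
pattern names are illustrative and not part of Ignite) and assumes the message
formats are exactly as quoted in this thread. It pulls the parking timeout and
the blockedFor value out of the wrapped log lines and converts milliseconds to
seconds.

```python
import re

# Patterns for two messages quoted in this thread. The field names are taken
# from the log lines; the pattern and function names are hypothetical.
PARK_RE = re.compile(
    r"Parking\s+thread=(?P<thread>\S+)\s+for\s+timeout\s+\(ms\)=(?P<ms>\d+)"
)
BLOCKED_RE = re.compile(
    r"threadName=(?P<thread>[^,\]]+),\s*blockedFor=(?P<sec>\d+)s"
)


def normalize(text):
    """Collapse the hard line wraps added by the mail client into single spaces."""
    return " ".join(text.split())


def parking_timeouts(text):
    """Return (thread, seconds) for every 'Parking thread' throttling warning."""
    return [
        (m["thread"], int(m["ms"]) / 1000.0)
        for m in PARK_RE.finditer(normalize(text))
    ]


def blocked_workers(text):
    """Return (thread, seconds) for every 'Blocked system-critical thread' error."""
    return [
        (m["thread"], int(m["sec"]))
        for m in BLOCKED_RE.finditer(normalize(text))
    ]


# Sample copied from the first message in this thread, line wraps included.
sample = """2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl]
Parking thread=data-streamer-stripe-3-#52 for timeout
(ms)=771038
[2019-10-22 10:31:45,635][ERROR][tcp-disco-msg-worker-#2][G] Blocked
system-critical thread has been detected. This can lead to cluster-wide
undefined behaviour [threadName=data-streamer-stripe-30, blockedFor=95s]"""

print(parking_timeouts(sample))
print(blocked_workers(sample))
```

Running the same helper over a full node log would show whether long parking
timeouts consistently precede the blocked-worker errors, which is the
correlation this thread is concerned with.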

Can you share your persistence configuration (DataStorageConfiguration or
PersistentStoreConfiguration)?

Thanks,
Stan

On Thu, Oct 31, 2019 at 2:39 AM Denis Magda <dm...@apache.org> wrote:

> Have you tried turning off the failure handling by following the previously
> shared documentation page? It looks like some timeouts need to be tuned.
>
> Denis

Re: Error while adding the node the baseline topology

Posted by Denis Magda <dm...@apache.org>.
Have you tried turning off the failure handling by following the previously
shared documentation page? It looks like some timeouts need to be tuned.

Denis


Re: Error while adding the node the baseline topology

Posted by "krkumar24061975@gmail.com" <kr...@gmail.com>.
Hi - The application is doing two things: one thread writes 2 KB events to
the Ignite cache as key-value pairs, and another thread executes Ignite SQL
queries through Ignite JDBC connections. The throughput is anywhere between
25K and 40K events per second on the cache side. We are using a data streamer
to write to the key-value cache. The cluster has 4 nodes with 198 GB of RAM
and 48 cores.

We got a similar error again and here is the error description:

[2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][G] Blocked
system-critical thread has been detected. This can lead to cluster-wide
undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=2032s]
[2019-10-25 10:16:45,399][WARN ][disco-event-worker-#142][G] Thread
[name="data-streamer-stripe-0-#49", id=80, state=WAITING, blockCnt=7,
waitCnt=5352642]

[2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][root] Critical
system error detected. Will be handled accordingly to configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=data-streamer-stripe-0, igniteInstanceName=null, finished=false,
heartbeatTs=1572010973019]]]

Thanx and Regards,
KR Kumar




Re: Error while adding the node the baseline topology

Posted by Denis Magda <dm...@apache.org>.
Hi,

What is the application doing while you are changing the topology? Is the
cluster under load?

Generally, we've added critical failure handlers in the latest versions of
Ignite, and the message you reported is printed by them:
https://apacheignite.readme.io/docs/critical-failures-handling

-
Denis

