Posted to user@ignite.apache.org by Philip Wu <pw...@enfusionsystems.com> on 2019/03/19 13:34:26 UTC

Ignite 2.7 Errors

Hi, we recently upgraded Ignite from 2.5 to 2.7 and got the following error.

Is this a configuration issue or a known bug in 2.7?


2019-03-18 15:44:23,383 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-9,
igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1552941767243]]]:
class org.apache.ignite.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-9, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1552941767243]



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite 2.7 Errors

Posted by Andrey Kuznetsov <st...@gmail.com>.
Philip, if you can tolerate such long JVM pauses, then there is no point in
investigating stack traces any further. Just increase the
systemWorkerBlockedTimeout parameter of IgniteConfiguration accordingly, as
described in
https://apacheignite.readme.io/docs/critical-failures-handling#section-critical-workers-health-check
and Ignite 2.7 won't report these failures.
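
To illustrate why a larger timeout suppresses the report, here is a minimal
standalone sketch of a heartbeat-based liveness check. It is a simplified
model, not Ignite's implementation: the class and method names are
hypothetical, and only the heartbeatTs and pause values are taken from the
logs in this thread. The two timeout values are arbitrary examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of a heartbeat-based liveness check (not Ignite code). */
public class HeartbeatCheckSketch {
    /** Maximum allowed silence of a worker before it is reported as blocked, in ms. */
    private final long blockedTimeoutMs;

    /** Last heartbeat timestamp per worker name, in epoch ms. */
    private final Map<String, Long> lastHeartbeat = new LinkedHashMap<>();

    public HeartbeatCheckSketch(long blockedTimeoutMs) {
        this.blockedTimeoutMs = blockedTimeoutMs;
    }

    /** Records that the given worker was alive at nowMs. */
    public void heartbeat(String worker, long nowMs) {
        lastHeartbeat.put(worker, nowMs);
    }

    /** Returns the workers whose last heartbeat is older than the timeout. */
    public List<String> blockedWorkers(long nowMs) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() > blockedTimeoutMs)
                blocked.add(e.getKey());
        }
        return blocked;
    }

    public static void main(String[] args) {
        long heartbeatTs = 1553137996031L; // heartbeatTs from the log in this thread
        long pauseMs = 928_937L;           // JVM pause reported by jvm-pause-detector-worker
        long afterPause = heartbeatTs + pauseMs;

        // Hypothetical timeouts: 10 s (shorter than the pause) and 20 min (longer).
        for (long timeoutMs : new long[] {10_000L, 1_200_000L}) {
            HeartbeatCheckSketch check = new HeartbeatCheckSketch(timeoutMs);
            check.heartbeat("grid-nio-worker-tcp-comm-1", heartbeatTs);
            System.out.println(timeoutMs + " ms -> blocked: " + check.blockedWorkers(afterPause));
        }
    }
}
```

With the shorter timeout the worker is reported as blocked after the pause;
with the longer one it is not. In Ignite itself the parameter would typically
be set with IgniteConfiguration.setSystemWorkerBlockedTimeout(long), with a
value in milliseconds larger than the longest pause you are willing to accept.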

On Thu, 21 Mar 2019 at 20:42, Philip Wu <pw...@enfusionsystems.com> wrote:

> Hello, Andrey -
>
> Regarding your 2nd question: in Ignite 2.5 we also had JVM pauses of 15+ minutes,
> but there was no IgniteException and everything worked fine.
>
> 2019-03-15 19:08:46,088 WARNING [ (jvm-pause-detector-worker)] Possible too
> long JVM pause: 1001113 milliseconds.
>
> 2019-03-15 19:08:46,280 INFO  [IgniteKernal%XXXGrid
> (grid-timeout-worker-#71%XXXGrid%)]
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>     ^-- Node [id=1dc0de55, name=EnfusionGrid, uptime=01:49:39.992]
>     ^-- H/N/C [hosts=1, nodes=1, CPUs=32]
>     ^-- CPU [cur=100%, avg=39.77%, GC=1042.83%]
>     ^-- PageMemory [pages=2300496]
>     ^-- Heap [used=295123MB, free=14.48%, comm=345088MB]
>     ^-- Non heap [used=442MB, free=-1%, comm=463MB]
>     ^-- Outbound messages queue [size=0]
>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>     ^-- System thread pool [active=0, idle=1, qSize=0]
> 2019-03-15 19:08:46,280 INFO  [IgniteKernal%XXXGrid
> (grid-timeout-worker-#71%XXXGrid%)] FreeList [name=XXXGrid, buckets=256,
> dataPages=1, reusePages=0]


-- 
Best regards,
  Andrey Kuznetsov.

Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Hello, Andrey -

Regarding your 2nd question: in Ignite 2.5 we also had JVM pauses of 15+ minutes,
but there was no IgniteException and everything worked fine.

2019-03-15 19:08:46,088 WARNING [ (jvm-pause-detector-worker)] Possible too
long JVM pause: 1001113 milliseconds.

2019-03-15 19:08:46,280 INFO  [IgniteKernal%XXXGrid
(grid-timeout-worker-#71%XXXGrid%)]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=1dc0de55, name=EnfusionGrid, uptime=01:49:39.992]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=32]
    ^-- CPU [cur=100%, avg=39.77%, GC=1042.83%]
    ^-- PageMemory [pages=2300496]
    ^-- Heap [used=295123MB, free=14.48%, comm=345088MB]
    ^-- Non heap [used=442MB, free=-1%, comm=463MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=1, qSize=0]
2019-03-15 19:08:46,280 INFO  [IgniteKernal%XXXGrid
(grid-timeout-worker-#71%XXXGrid%)] FreeList [name=XXXGrid, buckets=256,
dataPages=1, reusePages=0]






Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Hello, Andrey -

Regarding your 1st question: I do have a stack trace in a separate
*FailureProcessor*.log:

Thread [name="tcp-disco-msg-worker-#2%XXXGrid%", id=445, state=RUNNABLE, blockCnt=0, waitCnt=320611]
        at sun.management.ThreadImpl.dumpThreads0(Native Method)
        at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
        at o.a.i.i.util.IgniteUtils.dumpThreads(IgniteUtils.java:1364)
        at o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:128)
        - locked o.a.i.i.processors.failure.FailureProcessor@6d7b017b
        at o.a.i.i.processors.failure.FailureProcessor.process(FailureProcessor.java:104)
        at o.a.i.i.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1829)
        at o.a.i.i.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
        at o.a.i.i.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
        at o.a.i.i.util.worker.GridWorker.onIdle(GridWorker.java:297)
        at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
        at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker$$Lambda$211/2104222253.run(Unknown Source)
        at o.a.i.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
        at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at o.a.i.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
        at o.a.i.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)





Re: Ignite 2.7 Errors

Posted by Andrey Kuznetsov <st...@gmail.com>.
Sorry, my mistake. I meant the last message you provided, but it doesn't
contain a stack trace, only brief thread information.
Anyway, 15-minute-long JVM pauses are suspicious. Do you see the same
message on 2.5 or 2.6?

Best regards,
Andrey Kuznetsov.

On Thu, 21 Mar 2019 at 19:49, Philip Wu pwu@enfusionsystems.com wrote:

> Before that, there was:
>
> 2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%XXXGrid%)]
> Thread [name="grid-nio-worker-tcp-comm-1-#73%XXXGrid%", id=415,
> state=RUNNABLE, blockCnt=0, waitCnt=0]

Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Before that, there was:

2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%XXXGrid%)]
Thread [name="grid-nio-worker-tcp-comm-1-#73%XXXGrid%", id=415,
state=RUNNABLE, blockCnt=0, waitCnt=0]





Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Hello, Andrey - actually, this is the sequence of events in chronological order:

2019-03-20 22:28:44,999 WARNING [IgniteKernal%XXXGrid
(jvm-pause-detector-worker)] Possible too long JVM pause: 928937
milliseconds

2019-03-20 22:28:45,014 SEVERE [G (tcp-disco-msg-worker-#2%XXXGrid%)]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=grid-nio-worker-tcp-comm-1,
blockedFor=928s]

2019-03-20 22:28:45,021 WARN  [FailoverTransport (ActiveMQ Transport:
Transport ) failed , attempting to automatically reconnect:
java.io.EOFException

2019-03-20 22:28:45,021 ERROR
[ActiveMQDelegate=>MasterServiceGlobalConnection (ActiveMQ Transport: 
transport Interrupted
2019-03-20 22:28:45,023 WARN  [FailoverTransport (ActiveMQ Transport: 
Transport () failed , attempting to automatically reconnect:
java.io.EOFException

2019-03-20 22:28:45,028 WARNING [G (tcp-disco-msg-worker-#2%EnfusionGrid%)]
Thread [name="grid-nio-worker-tcp-comm-1-#73%EnfusionGrid%", id=415,
state=RUNNABLE, blockCnt=0, waitCnt=0]

2019-03-20 22:28:45,028 WARN  [FailoverTransport (ActiveMQ Transport: )]
Transport  failed , attempting to automatically reconnect:
java.io.EOFException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] Exception in thread "ActiveMQ InactivityMonitor WriteCheckTimer" java.lang.NullPointerException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at org.apache.activemq.transport.AbstractInactivityMonitor.writeCheck(AbstractInactivityMonitor.java:219)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at org.apache.activemq.transport.AbstractInactivityMonitor$3.run(AbstractInactivityMonitor.java:153)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at java.util.TimerThread.mainLoop(Timer.java:555)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at java.util.TimerThread.run(Timer.java:505)



2019-03-20 22:28:45,044 ERROR
[ActiveMQDelegate=>MasterServiceGlobalConnection (ActiveMQ Transport: ]
transport Interrupted

2019-03-20 22:28:45,044 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]]]: class
org.apache.ignite.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
        at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
        at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)



2019-03-20 22:28:45,052 WARNING [FailureProcessor
(tcp-disco-msg-worker-#2%XXXGrid%)] No deadlocked threads detected.
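
The numbers in this sequence can be cross-checked against each other. The
sketch below computes the time between the worker's last heartbeat
(heartbeatTs=1553137996031, epoch milliseconds) and the "Blocked
system-critical thread" message at 22:28:45,014. The log does not state its
time zone, so the UTC-5 offset is an assumption, chosen because it makes the
result agree with the reported blockedFor value. The class and method names
are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class BlockedForCheck {
    /** Whole seconds between a worker's last heartbeat (epoch ms) and a local log timestamp. */
    public static long blockedForSeconds(long heartbeatTsMs, LocalDateTime logTime, ZoneOffset offset) {
        Instant heartbeat = Instant.ofEpochMilli(heartbeatTsMs);
        Instant detected = logTime.toInstant(offset);
        return Duration.between(heartbeat, detected).getSeconds();
    }

    public static void main(String[] args) {
        // Values copied from the log in this message.
        long heartbeatTs = 1553137996031L;
        LocalDateTime detectedAt = LocalDateTime.of(2019, 3, 20, 22, 28, 45, 14_000_000);

        // ASSUMPTION: the log timestamps are in UTC-5; the log itself does not say so.
        ZoneOffset assumedOffset = ZoneOffset.ofHours(-5);

        System.out.println("blockedFor=" + blockedForSeconds(heartbeatTs, detectedAt, assumedOffset) + "s");
    }
}
```

Under that assumption the result is 928 seconds, which matches both
blockedFor=928s and the 928937 ms JVM pause logged just before it. This is
consistent with the worker being stalled by the pause rather than by a
deadlock.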





Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Hello, Andrey -

I see this:

2019-03-20 22:28:45,052 WARNING [FailureProcessor
(tcp-disco-msg-worker-#2%XXXGrid%)] No deadlocked threads detected.

Is that what you mean?

---

Also, prior to the crash, I see this; not sure if it is related.

2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)] Exception in thread "ActiveMQ InactivityMonitor WriteCheckTimer" java.lang.NullPointerException
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at org.apache.activemq.transport.AbstractInactivityMonitor.writeCheck(AbstractInactivityMonitor.java:219)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at org.apache.activemq.transport.AbstractInactivityMonitor$3.run(AbstractInactivityMonitor.java:153)
2019-03-20 22:28:45,029 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:33)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at java.util.TimerThread.mainLoop(Timer.java:555)
2019-03-20 22:28:45,030 ERROR [stderr (ActiveMQ InactivityMonitor WriteCheckTimer)]     at java.util.TimerThread.run(Timer.java:505)






Re: Ignite 2.7 Errors

Posted by Andrey Kuznetsov <st...@gmail.com>.
Hi, Philip!

There should be a stack trace of the blocked worker itself in the log, at
WARN level, before the message you cite but after "Blocked system-critical
thread has been detected." Could you please share that trace? It could help
us understand the cause of the failure.

Best regards,
Andrey Kuznetsov.

On Thu, 21 Mar 2019 at 18:34, Ilya Kasnacheev ilya.kasnacheev@gmail.com wrote:

> Hello!
>
> With NoOp handler this should be a purely cosmetic message.
>
> Regards,
> --
> Ilya Kasnacheev
>

Re: Ignite 2.7 Errors

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

With NoOp handler this should be a purely cosmetic message.
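
One way to read "purely cosmetic" is sketched below as a simplified
standalone model. It assumes, as the log output suggests but this thread
does not confirm, that the SEVERE message is logged before the handler is
consulted and that failure types listed in ignoredFailureTypes never reach
the handler. All class, method and enum names here are hypothetical and are
not Ignite's source code.

```java
import java.util.EnumSet;
import java.util.Set;

/** Simplified, hypothetical model of failure dispatch. Not Ignite source code. */
public class FailureDispatchSketch {
    /** Two illustrative failure types; the first name is taken from the log. */
    public enum FailureType { SYSTEM_WORKER_BLOCKED, CRITICAL_ERROR }

    /** Returns true when the handler took an action that invalidates the node. */
    public interface Handler {
        boolean handle(FailureType type);
    }

    private final Handler handler;
    private final Set<FailureType> ignoredFailureTypes;
    private int loggedMessages;

    public FailureDispatchSketch(Handler handler, Set<FailureType> ignoredFailureTypes) {
        this.handler = handler;
        this.ignoredFailureTypes = ignoredFailureTypes;
    }

    /** ASSUMPTION: the message is logged first, then ignored types skip the handler. */
    public boolean process(FailureType type) {
        loggedMessages++;
        System.out.println("Critical system error detected [type=" + type + "]");
        return !ignoredFailureTypes.contains(type) && handler.handle(type);
    }

    /** Number of messages logged so far. */
    public int loggedMessages() {
        return loggedMessages;
    }

    public static void main(String[] args) {
        Set<FailureType> ignored = EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED);

        // A no-op handler never acts; a stopping handler always acts.
        FailureDispatchSketch noOp = new FailureDispatchSketch(type -> false, ignored);
        FailureDispatchSketch stopping = new FailureDispatchSketch(type -> true, ignored);

        System.out.println(noOp.process(FailureType.SYSTEM_WORKER_BLOCKED));
        System.out.println(stopping.process(FailureType.SYSTEM_WORKER_BLOCKED));
        System.out.println(stopping.process(FailureType.CRITICAL_ERROR));
    }
}
```

In this model the message is printed in every case, but only a non-ignored
failure type handed to an acting handler changes the node's state.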

Regards,
-- 
Ilya Kasnacheev


On Thu, 21 Mar 2019 at 18:15, Philip Wu <pw...@enfusionsystems.com> wrote:

> Thanks, Ilya!
>
> Actually, it happened in the PROD system again last night ... even with
> NoOpFailureHandler.
>
> I am rolling back to Ignite 2.5 or 2.6 for now. Thanks!

Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Thanks, Ilya!

Actually, it happened in the PROD system again last night ... even with
NoOpFailureHandler.

I am rolling back to Ignite 2.5 or 2.6 for now. Thanks!

2019-03-20 22:28:45,044 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]]]: class
org.apache.ignite.IgniteException: GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=XXXGrid,
finished=false, heartbeatTs=1553137996031]
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
        at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
        at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)





Re: Ignite 2.7 Errors

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

We are tweaking this mechanism, so you may wish to re-enable it in 2.8.

Regards,
-- 
Ilya Kasnacheev


On Wed, 20 Mar 2019 at 18:30, Philip Wu <pw...@enfusionsystems.com> wrote:

> Thank you, Ilya!
>
> We ended up using
>
> cfg.setFailureHandler(new NoOpFailureHandler());
>
> It silenced the errors, there are no more stack dumps, etc., and it seems
> to work as in 2.5 and 2.6, with no other changes.
>
> I am still curious whether I can take that line out in the future, once 2.7
> is more stable or in 2.8.
>

Re: Ignite 2.7 Errors

Posted by Philip Wu <pw...@enfusionsystems.com>.
Thank you, Ilya!

We ended up using

cfg.setFailureHandler(new NoOpFailureHandler());

It silenced the errors, there are no more stack dumps, etc., and it seems
to work as in 2.5 and 2.6, with no other changes.

I am still curious whether I can take that line out in the future, once 2.7
is more stable or in 2.8.




Re: Ignite 2.7 Errors

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

This is a new feature, and one that often misfires; see
https://apacheignite.readme.io/docs/critical-failures-handling

Besides what is described there, increasing failureDetectionTimeout often
helps.

Regards,
-- 
Ilya Kasnacheev


On Tue, 19 Mar 2019 at 16:34, Philip Wu <pw...@enfusionsystems.com> wrote:

> Hi, we recently upgraded Ignite from 2.5 to 2.7 and got the following error.
>
> Is this a configuration issue or a known bug in 2.7?
>
>
> 2019-03-18 15:44:23,383 SEVERE [ (tcp-disco-msg-worker-#2%XXXGrid%)]
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-9,
> igniteInstanceName=XXXGrid, finished=false, heartbeatTs=1552941767243]]]:
> class org.apache.ignite.IgniteException: GridWorker
> [name=grid-nio-worker-tcp-comm-9, igniteInstanceName=XXXGrid,
> finished=false, heartbeatTs=1552941767243]