
Posted to user@ignite.apache.org by "luongbd.hust" <lu...@gmail.com> on 2019/03/25 03:24:52 UTC

Triggering Rebalancing Programmatically get error while requesting

logs.rar <http://apache-ignite-users.70518.x6.nabble.com/file/t2354/logs.rar> 
Hi all,

I am trying to implement a lifecycle that sets up the baseline topology
automatically.
I registered the event listener and wrote the code as described here:
https://apacheignite.readme.io/docs/baseline-topology
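The baseline-topology page linked above describes listening for node join/leave discovery events and calling `setBaselineTopology(...)` only once the topology has settled. The following is a minimal stdlib-only sketch of that debounce idea. It is not Ignite code. The class `BaselineDebouncer` is hypothetical, and its `applyBaseline` callback stands in for a call such as `ignite.cluster().setBaselineTopology(topVer)`. The sketch also assumes that the delayed check should run on its own executor thread rather than on the thread that delivered the event.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/**
 * Hypothetical model (not Ignite code): apply a new baseline only after the
 * topology version has stayed unchanged for a fixed delay.
 */
public class BaselineDebouncer implements AutoCloseable {
    private final AtomicLong latestTopVer = new AtomicLong(-1);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "baseline-debouncer");
                t.setDaemon(true);
                return t;
            });
    private final long delayMs;
    // Stand-in for something like ignite.cluster().setBaselineTopology(topVer).
    private final LongConsumer applyBaseline;

    public BaselineDebouncer(long delayMs, LongConsumer applyBaseline) {
        this.delayMs = delayMs;
        this.applyBaseline = applyBaseline;
    }

    /** Call from a node join/leave listener; returns immediately. */
    public void onTopologyChanged(long topVer) {
        latestTopVer.set(topVer);
        scheduler.schedule(() -> {
            // A newer join/leave arrived during the delay, so this check is stale.
            if (latestTopVer.get() == topVer)
                applyBaseline.accept(topVer);
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    /** Lets already scheduled checks run, then stops the scheduler thread. */
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        List<Long> applied = new CopyOnWriteArrayList<>();
        try (BaselineDebouncer debouncer = new BaselineDebouncer(200, applied::add)) {
            debouncer.onTopologyChanged(2); // second node joins
            debouncer.onTopologyChanged(3); // third node joins within the delay
        }
        System.out.println("Baselines applied: " + applied); // only version 3
    }
}
```

When two joins arrive within the delay, only the last topology version is applied. The listener itself stays non-blocking.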

*My test case is as follows:*
- Continuously send write requests to the cache.
- Start the nodes listed in the IP finder.
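For the test case above, it helps to measure how long each write takes while nodes are being started. That turns "the cluster does not serve requests" into something measurable. The following is a minimal sketch. It assumes a hypothetical `KvStore` interface standing in for the cache under test (in the real test, for example, an `IgniteCache.put` call). Here it is backed by a `ConcurrentHashMap` so that the example runs without Ignite.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class WriteLoad {
    /** Hypothetical stand-in for the cache under test. */
    interface KvStore { void put(int key, String value); }

    private final KvStore store;
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final AtomicLong writes = new AtomicLong();
    private final AtomicLong maxLatencyNanos = new AtomicLong();

    WriteLoad(KvStore store) { this.store = store; }

    /** Writes continuously until stop() is called, tracking the slowest single write. */
    void run() {
        int key = 0;
        while (running.get()) {
            long start = System.nanoTime();
            store.put(key, "value-" + key);
            long took = System.nanoTime() - start;
            maxLatencyNanos.accumulateAndGet(took, Math::max);
            writes.incrementAndGet();
            key = (key + 1) % 10_000;
        }
    }

    void stop() { running.set(false); }
    long writes() { return writes.get(); }
    long maxLatencyMillis() { return maxLatencyNanos.get() / 1_000_000; }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        WriteLoad load = new WriteLoad(map::put);
        Thread writer = new Thread(load::run, "writer");
        writer.start();
        Thread.sleep(200); // in the real test, start the next node here
        load.stop();
        writer.join();
        System.out.println("writes=" + load.writes() + ", maxLatencyMs=" + load.maxLatencyMillis());
    }
}
```

If a write blocks indefinitely, `run()` never returns and that write's latency is never recorded. In a real test, it therefore helps to also sample `writes()` from another thread. A counter that stops growing is the simplest sign of a stall.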

When the number of nodes increases from 2 to 3, the following error appears
in the console:

[09:39:03,020][SEVERE][tcp-disco-msg-worker-#2%TravelInventoryTesting%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=grid-timeout-worker,
blockedFor=36s]
[09:39:03,020][SEVERE][tcp-disco-msg-worker-#2%TravelInventoryTesting%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=grid-timeout-worker,
igniteInstanceName=TravelInventoryTesting, finished=false,
heartbeatTs=1553481506244]]]
class org.apache.ignite.IgniteException: GridWorker
[name=grid-timeout-worker, igniteInstanceName=TravelInventoryTesting,
finished=false, heartbeatTs=1553481506244]
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
	at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
	at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
	at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
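The two figures in the first log message are consistent with each other. `heartbeatTs` reads as the worker's last heartbeat in epoch milliseconds, and `blockedFor` as the time elapsed since then, in whole seconds. The sketch below reproduces the 36 s figure with `java.time`. It assumes that the log timestamp `09:39:03,020` is local time at UTC+7, which is an assumption about the node's clock. `BlockedForCheck` is a hypothetical helper.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class BlockedForCheck {
    /** Gap between a worker's last heartbeat (epoch ms) and a local log timestamp. */
    static Duration blockedFor(long heartbeatTsMillis, LocalDateTime logTime, ZoneOffset offset) {
        return Duration.between(Instant.ofEpochMilli(heartbeatTsMillis), logTime.toInstant(offset));
    }

    public static void main(String[] args) {
        long heartbeatTs = 1553481506244L; // from the log message
        // 09:39:03,020 on 2019-03-25; the UTC+7 offset is an ASSUMPTION about the node's clock.
        LocalDateTime logTime = LocalDateTime.of(2019, 3, 25, 9, 39, 3, 20_000_000);
        Duration d = blockedFor(heartbeatTs, logTime, ZoneOffset.ofHours(7));

        System.out.println(Instant.ofEpochMilli(heartbeatTs)); // 2019-03-25T02:38:26.244Z
        System.out.println(d.toMillis() + " ms");              // 36776 ms
        System.out.println(d.getSeconds() + " s");             // 36 s, matching blockedFor=36s
    }
}
```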

I have attached the logs of the nodes.

*Thanks and best regards*





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Triggering Rebalancing Programmatically get error while requesting

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Can you please re-run this case with "Critical Failures Handling" disabled,
let it hang for some time, and then share the logs from that run?

In this case the node is reacting to a timeout, not to an error, so maybe
there is no error in the first place. I can see a wait on the partition
release future, but to understand its implications I need to see more logs.

Regards,
-- 
Ilya Kasnacheev
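The point that the node is "reacting to a timeout and not an error" can be illustrated with a simplified model of the check. A worker records a heartbeat timestamp, and a watchdog compares the heartbeat's age with a timeout. A failure handler then decides what to do, skipping ignored failure types. Note that the handler in the original log is configured with `ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]`. This is a stdlib-only illustration. `WatchdogModel` and its members are hypothetical and do not reproduce Ignite's internal `WorkersRegistry` or `GridWorker` code. The 10 s timeout is also an assumption.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified, hypothetical model of a blocked-worker watchdog; not Ignite's implementation. */
public class WatchdogModel {
    enum FailureType { SYSTEM_WORKER_BLOCKED, CRITICAL_ERROR }

    /** A worker that publishes its last heartbeat time in milliseconds. */
    static class Worker {
        final String name;
        final AtomicLong heartbeatTs;

        Worker(String name, long nowMs) {
            this.name = name;
            this.heartbeatTs = new AtomicLong(nowMs);
        }

        void heartbeat(long nowMs) { heartbeatTs.set(nowMs); }
    }

    /** Acts on a failure unless its type is in the ignored set. */
    static class FailureHandler {
        final Set<FailureType> ignored;

        FailureHandler(Set<FailureType> ignored) { this.ignored = ignored; }

        /** Returns true if the handler would act on the failure (e.g. stop the node). */
        boolean onFailure(FailureType type, String details) {
            if (ignored.contains(type)) {
                System.out.println("Ignored " + type + ": " + details);
                return false;
            }
            System.out.println("Handling " + type + ": " + details);
            return true;
        }
    }

    /** A worker counts as blocked when its last heartbeat is older than the timeout. */
    static boolean isBlocked(Worker w, long nowMs, long timeoutMs) {
        return nowMs - w.heartbeatTs.get() > timeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 10_000; // illustrative timeout, an assumption
        Worker w = new Worker("grid-timeout-worker", 0);
        long now = 36_776;       // 36.776 s without a heartbeat
        // Same shape as the logged handler: SYSTEM_WORKER_BLOCKED is ignored.
        FailureHandler h = new FailureHandler(EnumSet.of(FailureType.SYSTEM_WORKER_BLOCKED));
        if (isBlocked(w, now, timeoutMs)) {
            boolean acted = h.onFailure(FailureType.SYSTEM_WORKER_BLOCKED,
                    "threadName=" + w.name + ", blockedFor=" + (now - w.heartbeatTs.get()) / 1000 + "s");
            System.out.println("handler acted: " + acted); // false
        }
    }
}
```

In this model, disabling failure handling amounts to a handler that never acts. The blocked worker is still detected and logged, which is consistent with the report in this thread that disabling it did not change the outcome. In real Ignite configuration, `org.apache.ignite.failure.NoOpFailureHandler`, set through `IgniteConfiguration.setFailureHandler(...)`, is one way to get a handler that takes no action. Check the failure-handling documentation for your version.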


In reply to luongbd.hust <lu...@gmail.com>, Mon, 25 Mar 2019 at 12:17.

Re: Triggering Rebalancing Programmatically get error while requesting

Posted by "luongbd.hust" <lu...@gmail.com>.

Thanks, Ilya Kasnacheev.
I tried what you suggested, but nothing changed.
The cluster still does not serve requests from clients.
As I understand it, "Critical Failures Handling" does not change the errors
that occur.
*Thanks and best regards*





Re: Triggering Rebalancing Programmatically get error while requesting

Posted by "luongbd.hust" <lu...@gmail.com>.

Thanks, Ilya.
I'm switching to another task for now.
I will try to come back to this issue soon.




Re: Triggering Rebalancing Programmatically get error while requesting

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Unfortunately, it is hard to say what is going on without thread dumps. Can
you collect those using `jstack` utility?

I suspect you have some kind of deadlock.

There are suspicious things in your logs, but it's not completely clear
what is happening here.

Regards,
-- 
Ilya Kasnacheev
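Running `jstack <pid>` prints every thread's stack and reports any Java-level deadlock it finds at the end. The JDK exposes the same deadlock check inside a running JVM through `java.lang.management.ThreadMXBean.findDeadlockedThreads()`. The sketch below deliberately creates a two-lock deadlock between two hypothetical daemon threads, `worker-A` and `worker-B`, and then detects it. In a real node, only the detection part would be useful. How you wire it in (for example, from a diagnostic hook) is up to you and is not an Ignite feature.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {
    /** Describes threads currently deadlocked on monitors or java.util.concurrent locks. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when there is no deadlock
        List<String> found = new ArrayList<>();
        if (ids == null)
            return found;
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null)
                found.add(info.getThreadName() + " waiting on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
        }
        return found;
    }

    /** Starts two daemon threads that take two monitors in opposite order. */
    static void createDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("worker-A", lockA, lockB, bothHoldFirstLock);
        startDaemon("worker-B", lockB, lockA, bothHoldFirstLock);
    }

    private static void startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds `second` and waits for `first`.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        createDeadlock();
        List<String> found = findDeadlocks();
        for (int i = 0; i < 50 && found.isEmpty(); i++) { // poll for up to about 5 s
            Thread.sleep(100);
            found = findDeadlocks();
        }
        found.forEach(System.out::println);
    }
}
```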


In reply to luongbd.hust <lu...@gmail.com>, Wed, 27 Mar 2019 at 10:43.

Re: Triggering Rebalancing Programmatically get error while requesting

Posted by "luongbd.hust" <lu...@gmail.com>.

Yes.
I have spent a lot of time trying to understand the cause of the error,
including my company's working time, so I don't want to waste it without
solving the problem. That is why I decided to ask the community for help.
With my current skills, it is difficult to understand an open-source
project like this; I only understand the product at the application level.
Sorry for the trouble.
I still hope someone can help me solve this problem. At the moment I have
no way to solve it.




Re: Triggering Rebalancing Programmatically get error while requesting

Posted by Yakov Zhdanov <yz...@apache.org>.

Ilya, have you had a chance to look into the thread dumps?

--Yakov


In reply to luongbd.hust <lu...@gmail.com>, Wed, 27 Mar 2019 at 06:18.

Re: Triggering Rebalancing Programmatically get error while requesting

Posted by "luongbd.hust" <lu...@gmail.com>.

Thank you for your enthusiasm.

I have attached logs covering a longer period after the error occurred.

logs.rar
<http://apache-ignite-users.70518.x6.nabble.com/file/t2354/logs.rar>  




Re: Triggering Rebalancing Programmatically get error while requesting

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Can you please collect thread dumps from all nodes (after waiting around a
minute once the error appears)?

Regards,
-- 
Ilya Kasnacheev
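The usual way to collect the thread dumps Ilya asks for is to run `jstack <pid>` against each node process. If attaching `jstack` is inconvenient, the JDK can also produce a full dump in-process with `ThreadMXBean.dumpAllThreads`. The following is a minimal sketch that assumes you can run code inside each node's JVM. `ThreadDumps` is a hypothetical helper. `ThreadInfo.toString()` prints at most eight stack frames, so the sketch formats the frames itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.time.Instant;

public class ThreadDumps {
    /** Formats a full dump of all live threads in this JVM, including lock information. */
    static String dump() {
        StringBuilder sb = new StringBuilder("Thread dump at ").append(Instant.now()).append('\n');
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null)
                sb.append(" on ").append(info.getLockName());
            if (info.getLockOwnerName() != null)
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            sb.append('\n');
            // All frames, unlike ThreadInfo.toString(), which stops after eight.
            for (StackTraceElement frame : info.getStackTrace())
                sb.append("\tat ").append(frame).append('\n');
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Several dumps a few seconds apart: a stuck thread shows the same stack each time.
        int count = 3;
        long intervalMs = 2_000; // tune for a real investigation
        for (int i = 0; i < count; i++) {
            System.out.println(dump());
            if (i < count - 1)
                Thread.sleep(intervalMs);
        }
    }
}
```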


In reply to luongbd.hust <lu...@gmail.com>, Tue, 26 Mar 2019 at 05:34.

Re: Triggering Rebalancing Programmatically get error while requesting

Posted by "luongbd.hust" <lu...@gmail.com>.

Hi Ilya,

I followed your instructions, but nothing has changed.
I have attached the logs and configuration from the test.

disable-fail-handling.rar
<http://apache-ignite-users.70518.x6.nabble.com/file/t2354/disable-fail-handling.rar>  




Re: Triggering Rebalancing Programmatically get error while requesting

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Have you tried disabling failure detection to see if the errors go away?

Regards,
-- 
Ilya Kasnacheev


In reply to luongbd.hust <lu...@gmail.com>, Mon, 25 Mar 2019 at 06:25.