
Posted to user@ignite.apache.org by ndipiazza3565 <ni...@lucidworks.com> on 2018/09/12 22:31:09 UTC

Re: Failed to wait for initial partition map exchange

I'm trying to build up a list of possible causes for this issue.

I'm only really interested in the issues that occur after successful
production deployments. That is, the environment has been up and running
successfully for some time, but later on our Ignite nodes will not start
and get stuck.

As of now, bad behavior from a single node in the Ignite cluster can cause
a deadlock:

* Anything that causes one of the Ignite nodes to become unresponsive:
  * OOM (out of memory)
  * high GC
  * high CPU
  * high disk usage
* Network issues?
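One way to catch the "unresponsive node" cases above before they turn into a hung exchange is to watch heap usage and GC time on each node. Below is a minimal sketch of a hypothetical helper (JvmHealthSnapshot is an illustrative name, not part of Ignite) that uses only the standard java.lang.management API; any alert thresholds you apply to its numbers are assumptions to tune for your own deployment.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical monitoring helper built on the standard JDK management API.
// It is not an Ignite API; run it inside (or adapt it for) each node's JVM.
public class JvmHealthSnapshot {

    /** Fraction of the maximum heap in use, or -1 if the JVM reports no maximum. */
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when undefined
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** Cumulative milliseconds spent in GC since JVM start, over all collectors. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /**
     * Share of wall-clock time spent in GC during a sampling window.
     * Values approaching 1.0 mean the JVM is doing little besides collecting garbage.
     */
    public static double gcTimeShare(long windowMillis) {
        long gcBefore = totalGcTimeMillis();
        long start = System.nanoTime();
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt, report what we have
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcDelta = totalGcTimeMillis() - gcBefore;
        return elapsedMillis > 0 ? (double) gcDelta / elapsedMillis : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedFraction() * 100);
        System.out.printf("GC time share over 1s: %.3f%n", gcTimeShare(1000));
    }
}
```

For example, a sustained GC share near 1.0, or a heap fraction pinned near its maximum on one node, is a good reason to take a thread dump of that node before restarting it.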

I'm trying to get a list of the causes for this issue so I can troubleshoot
further. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Failed to wait for initial partition map exchange

Posted by ndipiazza3565 <ni...@lucidworks.com>.

No. Persistence is disabled in my case. 

Re: Failed to wait for initial partition map exchange

Posted by eugene miretsky <eu...@gmail.com>.

Do you have persistence enabled?

In reply to ndipiazza3565 <nicholas.dipiazza@lucidworks.com>, Wed, Sep 12, 2018 at 6:31 PM.

Re: Failed to wait for initial partition map exchange

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Regarding PME (partition map exchange) problems:
OOM will cause this. High GC could cause this under some circumstances.
High CPU or disk usage should not cause this. Network unavailability (such
as a closed communication port) could also cause it.
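To rule out the closed-port case, a quick TCP connect test from one node to another can help. This is a minimal sketch using plain java.net.Socket; the peer address is a placeholder, and ports 47500 and 47100 are assumed here only because they are Ignite's default discovery (TcpDiscoverySpi) and communication (TcpCommunicationSpi) ports, so substitute whatever your configuration actually uses. A successful connect only shows that the port is open, not that the Ignite node behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity check; PortCheck is an illustrative name.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, unknown host, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder peer address; pass the real node address as the first argument.
        String peer = args.length > 0 ? args[0] : "127.0.0.1";
        // Assumed Ignite defaults: 47500 (discovery), 47100 (communication).
        int[] ports = {47500, 47100};
        for (int port : ports) {
            System.out.printf("%s:%d reachable=%b%n", peer, port, isReachable(peer, port, 2000));
        }
    }
}
```

Run it from each node against every other node; a port that is reachable in one direction but not the other points at a firewall or binding problem rather than at Ignite itself.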

But the prime cause is programming errors. These are either bugs on the
Apache Ignite side (triggered by some unusual circumstances, since all the
normal cases should already be covered by tests), or errors in your code.

For example, deadlocks. If you have deadlocks in code exposed to Apache
Ignite, or you manage to lock up Apache Ignite in other ways (listeners,
invokes and continuous queries are notorious for this, since there are
limitations on the operations you can use from within them), you can very
easily end up with a PME that never completes.
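The callback limitation is easiest to see outside Ignite. The following plain-Java sketch has no Ignite dependency and does not reflect Ignite's real thread pools: a single-threaded executor stands in, hypothetically, for an internal thread that runs user callbacks. A callback that waits for work queued on that same thread can never finish (a timeout is used here only to make the hang visible), while a callback that just hands the work to a separate pool returns immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: CallbackOffload and its executors are hypothetical stand-ins.
public class CallbackOffload {

    // Stand-in for an internal thread that invokes user callbacks.
    private final ExecutorService systemThread = Executors.newSingleThreadExecutor();
    // Separate pool for the work a callback wants to trigger.
    private final ExecutorService userPool = Executors.newFixedThreadPool(2);

    /** Anti-pattern: the callback blocks on work that needs the thread it is running on. */
    public boolean blockingCallbackCompletes(long timeoutMillis) throws Exception {
        Future<Boolean> callback = systemThread.submit(() -> {
            // Queued behind the callback itself, so it cannot start until the callback returns.
            Future<String> inner = systemThread.submit(() -> "done");
            try {
                inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                inner.cancel(true);
                return false; // without the timeout this would hang forever
            }
        });
        return callback.get();
    }

    /** Safer pattern: the callback only hands work off and returns immediately. */
    public CompletableFuture<String> offloadingCallback() {
        CompletableFuture<String> result = new CompletableFuture<>();
        systemThread.execute(() -> userPool.execute(() -> result.complete("done")));
        return result;
    }

    public void shutdown() {
        systemThread.shutdownNow();
        userPool.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        CallbackOffload demo = new CallbackOffload();
        System.out.println("blocking callback completed: " + demo.blockingCallbackCompletes(200));
        System.out.println("offloaded result: " + demo.offloadingCallback().get(1, TimeUnit.SECONDS));
        demo.shutdown();
    }
}
```

Running main prints that the blocking callback did not complete and that the offloaded result is "done". The same idea applies to real listeners: keep the callback body short and non-blocking, and move anything that waits on the cluster to your own threads.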

However, it's hard to say without reviewing logs and thread dumps.
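When collecting logs, a thread dump from every node taken while the exchange is stuck is usually the most useful artifact. The JDK's jstack tool does this from outside the process; the sketch below does the equivalent from inside the JVM with the standard ThreadMXBean API and also asks the JVM whether any threads are deadlocked on monitors or locks. ThreadDumpHelper is a hypothetical name, and the JVM's deadlock detector only sees lock cycles within a single JVM, not a hang that spans several nodes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper using only the JDK management API.
public class ThreadDumpHelper {

    /** Plain-text dump of all live threads with full stacks and lock details. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Names of threads in a monitor/lock deadlock cycle in this JVM (empty if none). */
    public static List<String> findDeadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null if the thread has since terminated
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + findDeadlockedThreadNames());
        System.out.print(dumpAllThreads());
    }
}
```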

Regards,
-- 
Ilya Kasnacheev


In reply to ndipiazza3565 <nicholas.dipiazza@lucidworks.com>, Thu, Sep 13, 2018 at 1:31.