Posted to user@ignite.apache.org by Ariel Tubaltsev <tu...@gmail.com> on 2018/09/28 21:02:58 UTC

Cluster is not responsive after node segmentation and reconciliation

Apache Ignite 2.4
A cluster of 3 in-memory nodes; the cache is REPLICATED and TRANSACTIONAL.

- All 3 nodes got segmented around the same time (Local node SEGMENTED)
- After reconciliation, all records are lost
- The cluster starts to accumulate transactions (Pending transaction deadlock
detection futures)
At this point, client requests are no longer served.

It can also get into a state where a node cannot join the grid:
ERROR GridServiceProcessor:482 - Error when executing service: null.

Also, clients may end up with a JVM OOM (log attached: oom.rtf
<http://apache-ignite-users.70518.x6.nabble.com/file/t1588/oom.rtf>).

Questions:
- Is this a known issue?
- Would persistence help here?
- Is there any remedy for the OOM?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Cluster is not responsive after node segmentation and reconciliation

Posted by Ariel Tubaltsev <tu...@gmail.com>.
The file I attached contains JVM logs; let me post them below.
About eviction: the cluster did not hold many records, fewer than 1000.
Do you still think eviction could be useful here?

Event: 0.728 Thread 0x00007fee4404b800 Implicit null exception at
0x00007fee353fef75 to 0x00007fee353ff335

Event: 0.728 Thread 0x00007fee4404b800 Implicit null exception at
0x00007fee353ca136 to 0x00007fee353ca4fd

Event: 0.993 Thread 0x00007fee4404b800 Exception   (0x0000000587b0aad0)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/interpreter

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x0000000583704f10)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/syst

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x00000005837062f8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,
line 210]

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x00000005837077c0)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,
line 210]

Event: 1.408 Thread 0x00007fee4404b800 Exception   (0x000000058370f998)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/sy

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b7ee60)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/shar

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b81e98)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/sh

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b84e20)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/

Event: 1.447 Thread 0x00007fee4404b800 Exception   (0x0000000583b8eaf8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/sh

Event: 1.448 Thread 0x00007fee4404b800 Exception   (0x0000000583ba5a88)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/v

Event: 1.457 Thread 0x00007fee4404b800 Exception   (0x0000000583c06f50)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/syst

Event: 1.457 Thread 0x00007fee4404b800 Exception   (0x0000000583c09668)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.

Event: 1.457 Thread 0x00007fee4404b800 Exception   (0x0000000583c0bd20)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionar

Event: 1.458 Thread 0x00007fee4404b800 Exception   (0x0000000583c1bba8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/sy

Event: 1.463 Thread 0x00007fee4404b800 Exception   (0x0000000583d12738)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDic

Event: 1.463 Thread 0x00007fee4404b800 Exception   (0x0000000583d23168)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,

Event: 1.898 Thread 0x00007fee4404b800 Exception  > (0x00000005880744b8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/prims/jni.cpp,
line 1615]




Re: Cluster is not responsive after node segmentation and reconciliation

Posted by Ariel Tubaltsev <tu...@gmail.com>.
ok, thanks




Re: Cluster is not responsive after node segmentation and reconciliation

Posted by "Maxim.Pudov" <pu...@gmail.com>.
You definitely need to increase heap size to prevent OOM errors.
./ignite.sh -J-Xmx4g
If it doesn't help, provide full logs from all nodes.
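
One way to confirm that a heap setting such as -Xmx4g actually took effect is
to print the limit the JVM itself reports. This is only a minimal sketch: the
class name HeapCheck is hypothetical and not part of Ignite, and it relies
solely on the standard Runtime.maxMemory() call.

```java
// Hypothetical helper (not part of Ignite): reports the heap limit of the running JVM.
public class HeapCheck {
    /** Returns the maximum heap size, in MiB, that this JVM will attempt to use. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with the same option you pass to the node, e.g. java -Xmx4g HeapCheck
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

The reported value is usually somewhat below the -Xmx figure, because some
collectors exclude part of the heap from it. The same call can also be logged
from client code, since the clients in this thread hit OOM as well.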




Re: Cluster is not responsive after node segmentation and reconciliation

Posted by Ariel Tubaltsev <tu...@gmail.com>.
Understood. The logs are quite long, so let me post the interesting parts
here.
This excerpt covers the communication errors and the out-of-memory condition:
ignite-cluster-oom.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1588/ignite-cluster-oom.log>





Re: Cluster is not responsive after node segmentation and reconciliation

Posted by Ariel Tubaltsev <tu...@gmail.com>.
Another interesting piece is a sudden ClassNotFoundException; the same code
obviously works in a steady state.
ignite-cluster-pairs.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1588/ignite-cluster-pairs.log>




Re: Cluster is not responsive after node segmentation and reconciliation

Posted by "Maxim.Pudov" <pu...@gmail.com>.
Without the Ignite logs it's hard to say what's going on. They are located in
$IGNITE_HOME/work/log




Re: Cluster is not responsive after node segmentation and reconciliation

Posted by Ariel Tubaltsev <tu...@gmail.com>.
You are right, this is a JVM log. Let me post some of its messages below in
case they are of any help.
The cluster had fewer than 1000 records at the time of the event. Do you still
think eviction can help there?

Event: 0.993 Thread 0x00007fee4404b800 Exception   (0x0000000587b0aad0)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/interpreter

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x0000000583704f10)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/syst

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x00000005837062f8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,
line 210]

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x00000005837077c0)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,
line 210]

Event: 1.408 Thread 0x00007fee4404b800 Exception   (0x000000058370f998)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/sy

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b7ee60)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/shar

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b81e98)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/sh

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b84e20)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/





Re: Cluster is not responsive after node segmentation and reconciliation

Posted by Ariel Tubaltsev <tu...@gmail.com>.
Yeah, this is a JVM log. Let me paste it below in case it is of any help.
The cluster had fewer than 1000 records at the time of the event.
Do you think eviction can still help?

Event: 0.728 Thread 0x00007fee4404b800 Implicit null exception at
0x00007fee353fef75 to 0x00007fee353ff335

Event: 0.728 Thread 0x00007fee4404b800 Implicit null exception at
0x00007fee353ca136 to 0x00007fee353ca4fd

Event: 0.993 Thread 0x00007fee4404b800 Exception   (0x0000000587b0aad0)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/interpreter

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x0000000583704f10)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/syst

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x00000005837062f8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,
line 210]

Event: 1.406 Thread 0x00007fee4404b800 Exception   (0x00000005837077c0)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,
line 210]

Event: 1.408 Thread 0x00007fee4404b800 Exception   (0x000000058370f998)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/sy

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b7ee60)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/shar

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b81e98)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/sh

Event: 1.446 Thread 0x00007fee4404b800 Exception   (0x0000000583b84e20)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/

Event: 1.447 Thread 0x00007fee4404b800 Exception   (0x0000000583b8eaf8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/sh

Event: 1.448 Thread 0x00007fee4404b800 Exception   (0x0000000583ba5a88)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/v

Event: 1.457 Thread 0x00007fee4404b800 Exception   (0x0000000583c06f50)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/syst

Event: 1.457 Thread 0x00007fee4404b800 Exception   (0x0000000583c09668)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.

Event: 1.457 Thread 0x00007fee4404b800 Exception   (0x0000000583c0bd20)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionar

Event: 1.458 Thread 0x00007fee4404b800 Exception   (0x0000000583c1bba8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/sy

Event: 1.463 Thread 0x00007fee4404b800 Exception   (0x0000000583d12738)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDic

Event: 1.463 Thread 0x00007fee4404b800 Exception   (0x0000000583d23168)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/classfile/systemDictionary.cpp,

Event: 1.898 Thread 0x00007fee4404b800 Exception  > (0x00000005880744b8)
thrown at
[/builddir/build/BUILD/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/openjdk/hotspot/src/share/vm/prims/jni.cpp,
line 1615]




Re: Cluster is not responsive after node segmentation and reconciliation

Posted by "Maxim.Pudov" <pu...@gmail.com>.
Please check the file you attached; it doesn't contain any logs.
Persistence will help to save data and prevent running out of memory.
Another option is to set an eviction policy:
https://apacheignite.readme.io/docs/evictions
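
For illustration, the two options could be configured roughly as follows. This
is a sketch under assumptions, not a tested fix for the problem in this thread:
the class name, the region names, the cache name "records" and the 256 MB limit
are all made up, and it assumes the Ignite 2.4 data storage API on the
classpath. Note that page eviction applies only to regions without
persistence, so the sketch defines two separate regions.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataPageEvictionMode;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical example class; all names and sizes below are illustrative assumptions.
public class StorageOptionsSketch {
    public static void main(String[] args) {
        // Option 1: a region backed by native persistence, so data survives node restarts.
        DataRegionConfiguration persistent = new DataRegionConfiguration();
        persistent.setName("persistentRegion");
        persistent.setPersistenceEnabled(true);

        // Option 2: a purely in-memory region that evicts pages once it fills up.
        DataRegionConfiguration evicting = new DataRegionConfiguration();
        evicting.setName("evictingRegion");
        evicting.setMaxSize(256L * 1024 * 1024);
        evicting.setPageEvictionMode(DataPageEvictionMode.RANDOM_2_LRU);

        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setDataRegionConfigurations(persistent, evicting);

        // Same cache modes as in the original report, placed in the persistent region.
        CacheConfiguration<Integer, String> cache = new CacheConfiguration<>("records");
        cache.setCacheMode(CacheMode.REPLICATED);
        cache.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        cache.setDataRegionName("persistentRegion");

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);
        cfg.setCacheConfiguration(cache);

        Ignite ignite = Ignition.start(cfg);
        // A cluster with persistence enabled starts inactive and must be activated.
        ignite.cluster().active(true);
    }
}
```

Both settings govern off-heap data regions on server nodes; they do not by
themselves limit the Java heap of a client JVM.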



