Posted to user@ignite.apache.org by mahesh76private <ma...@gmail.com> on 2019/01/16 02:54:23 UTC

Baselined node rejoining crashes other baseline nodes - Duplicate Key Error

I have two nodes hosting 3 partitioned tables. Indexes are also built on
these tables.

For 24 hours the caches work fine. The tables are definitely distributed
across both nodes.

Node 2 reboots due to some issue, drops out of the baseline, then comes back
and rejoins the baseline. The other baseline nodes crash, and in the logs we
see a "Duplicate key" error:

[10:38:35,437][INFO]tcp-disco-srvr-#2[TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.1.7, rmtPort=45102]
[10:38:35,437][INFO]tcp-disco-srvr-#2[TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.1.7, rmtPort=45102]
[10:38:35,437][INFO]tcp-disco-sock-reader-#12[TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.1.7:45102, rmtPort=45102]
[10:38:35,451][INFO]tcp-disco-sock-reader-#12[TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.1.7:45102, rmtPort=45102]
[10:38:35,457][SEVERE]tcp-disco-msg-worker-#3[TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.IllegalStateException: Duplicate key
    at org.apache.ignite.cache.QueryEntity.checkIndexes(QueryEntity.java:223)
    at org.apache.ignite.cache.QueryEntity.makePatch(QueryEntity.java:174)


Logs and configurations are attached here:
https://issues.apache.org/jira/browse/IGNITE-8728

Please offer any suggestions.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Baselined node rejoining crashes other baseline nodes - Duplicate Key Error

Posted by Stanislav Lukyanov <st...@gmail.com>.

Hi,

I’ve reproduced this and have a fix; I expect it will be available in 2.8.
In the meantime, I can only suggest that you avoid creating indexes without an explicit name.

Stan
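
One way to follow the "always name your indexes" advice is to generate the DDL through a small helper, so that no statement ever goes out without a name. The sketch below is only an illustration: the class, the method names and the IDX_<TABLE>_<COLUMN> naming scheme are hypothetical and not part of Ignite, and it assumes the SQL form CREATE INDEX IF NOT EXISTS indexName ON tableName (columnName). It only builds the statement text and does not connect to a cluster.

```java
import java.util.Locale;

final class ExplicitIndexName {
    /** Hypothetical naming scheme: IDX_<TABLE>_<COLUMN>, upper-cased. */
    static String indexName(String table, String column) {
        return ("idx_" + table + "_" + column).toUpperCase(Locale.ROOT);
    }

    /** Builds CREATE INDEX DDL that always carries an explicit index name. */
    static String createIndexSql(String table, String column) {
        return "CREATE INDEX IF NOT EXISTS " + indexName(table, column)
            + " ON " + table + " (" + column + ")";
    }

    public static void main(String[] args) {
        // Prints: CREATE INDEX IF NOT EXISTS IDX_PERSON_CITY ON Person (city)
        System.out.println(createIndexSql("Person", "city"));
    }
}
```

Because the name is derived from the table and column, running the same statement twice during development targets the same index name instead of creating a second, unnamed index on the same column.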

From: mahesh76private
Sent: 16 January 2019, 12:39
To: user@ignite.apache.org
Subject: RE: Baselined node rejoining crashes other baseline nodes - Duplicate Key Error


RE: Baselined node rejoining crashes other baseline nodes - Duplicate Key Error

Posted by mahesh76private <ma...@gmail.com>.

Stan, thanks for the visibility.

-1-
Over the last year, we moved through several versions of Ignite: 2.4, 2.5
and now 2.7. I have always kept the work folder intact.
-2-
During development, we may have tried to create an index a second time, or
several times, on a column that already had an index. Could that confuse
Ignite, especially in a multi-node scenario? Was something out of sync? Was
a check missing?
-3-
Over time, we dropped and recreated the tables and their indexes several
times. Was something stale left behind in the work folder? We always used 2
or more nodes.
-4-
Over time, we saw issues with index creation as well. My colleague posted
about another strange behaviour with index creation. See the issue here:
http://apache-ignite-users.70518.x6.nabble.com/Failing-to-create-index-on-Ignite-table-column-td26252.html#a26258
In summary, if we don't give the indexes names, Ignite throws exceptions.

Something seems to be wrong with Ignite's index handling in a multi-node
environment.

Regarding your point 2 (Jira): absolutely, it makes sense not to crash the
node on this exception. We have about 100 GB of data (tables) in Ignite, and
the only workaround right now seems to be:

Boot node 1, keeping its work folder.
Boot node 2 after removing its work folder.

Although this works, it gives the cluster a downtime of about 1-2 hours,
which is not acceptable for our customers.


RE: Baselined node rejoining crashes other baseline nodes - Duplicate Key Error

Posted by Stanislav Lukyanov <st...@gmail.com>.

Hi,

I left a comment in the issue.
In short, the problem is that one of your nodes somehow ended up with a duplicate index,
even though that shouldn’t happen. We need to figure out how.

Can you tell me what you do with the cluster while it is running?
I’m particularly interested in any actions related to cache/table/index creation and deletion.

Stan
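
To make the "duplicate index" idea concrete, here is a toy model of a name-keyed duplicate check. It is not Ignite's QueryEntity code: IndexDef and duplicateNames are hypothetical stand-ins, and the assumption that the failing check keys index definitions by name comes only from the stack trace and the advice in this thread. Unlike the node, the sketch reports the colliding names instead of throwing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateIndexCheck {
    /** Hypothetical stand-in for an index definition: a name plus the indexed column. */
    static final class IndexDef {
        final String name;
        final String column;

        IndexDef(String name, String column) {
            this.name = name;
            this.column = column;
        }
    }

    /** Returns every index name that is registered more than once. */
    static List<String> duplicateNames(List<IndexDef> indexes) {
        Map<String, IndexDef> byName = new HashMap<>();
        List<String> duplicates = new ArrayList<>();
        for (IndexDef idx : indexes) {
            // put() returns the previous mapping, so non-null means the name was already taken.
            if (byName.put(idx.name, idx) != null)
                duplicates.add(idx.name);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<IndexDef> indexes = new ArrayList<>();
        indexes.add(new IndexDef("IDX_PERSON_CITY", "city"));
        indexes.add(new IndexDef("IDX_PERSON_AGE", "age"));
        indexes.add(new IndexDef("IDX_PERSON_CITY", "city")); // same name registered twice
        System.out.println(duplicateNames(indexes)); // prints [IDX_PERSON_CITY]
    }
}
```

A check of this shape could be run over a list of index definitions before restarting a node, to see which names collide without stopping anything.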
