Posted to user@ignite.apache.org by percent620 <pe...@163.com> on 2016/09/18 06:05:11 UTC

ignite memory issues:Urgent in production

Hello, I have an urgent Ignite issue in our production environment.

I have deployed a standalone Ignite cluster with 7 server nodes. Each node has
40 GB of memory, and the reported total is 270 GB.

[10:05:11] Topology snapshot [ver=2356, servers=7, clients=0, CPUs=1488,
heap=270GB]

....

All our connections to Ignite run in "client" mode. When we have 60 clients
connected (4 GB each), Ignite reports the following:

*[10:05:11] Topology snapshot [ver=2356, servers=7, clients=60, CPUs=1488,
heap=510GB]*

Sometimes all the Ignite nodes shut down quickly, and the error message is:
[13:08:12,549][SEVERE][exchange-worker-#136%null%][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=2582, minorTopVer=0], nodeId=8837eae8, evt=NODE_FAILED]
java.lang.NullPointerException
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:734)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:473)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1440)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
	at java.lang.Thread.run(Thread.java:745)
[13:08:12] Ignite node stopped OK [uptime=25:07:08:283]



I have two questions:
1) Can you please tell me what is causing this error?
2)
*[10:05:11] Topology snapshot [ver=2356, servers=7, clients=60, CPUs=1488,
heap=510GB]*
The total client memory is 240 GB (60 client nodes * 4 GB). Is this the root cause?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/ignite-memory-issues-Urgent-in-production-tp7817.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: ignite memory issues:Urgent in production

Posted by Taras Ledkov <tl...@gridgain.com>.
Let's separate the issues.

1. Client nodes are counted in the total grid heap that is displayed.
I don't think this is the cause of the server failures.

2. Did the server nodes shut down, or did they crash / fail?
If a node failed, please attach the log of the failed server. Please
attach the full log (from the server node start) if the log is not huge.
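
On point 1, a minimal sketch of the arithmetic, in plain Java with no Ignite dependency. It parses the "Topology snapshot" line quoted in the original post and subtracts the client share from the displayed heap. The class and method names (TopologyHeap, parse, serverHeapGb) are hypothetical, and it assumes every client has the same 4 GB heap mentioned in the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopologyHeap {
    // Matches the counters of a "Topology snapshot" line as quoted in this thread.
    // \s+ before "heap" tolerates the line wrap added by the mail archive.
    private static final Pattern SNAPSHOT = Pattern.compile(
        "servers=(\\d+), clients=(\\d+), CPUs=\\d+,\\s+heap=(\\d+)GB");

    /** Returns {servers, clients, totalHeapGb} parsed from a snapshot line. */
    public static int[] parse(String line) {
        Matcher m = SNAPSHOT.matcher(line);
        if (!m.find())
            throw new IllegalArgumentException("Not a topology snapshot line: " + line);
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** Heap left for servers, ASSUMING all clients have the same heap size. */
    public static int serverHeapGb(int[] snapshot, int clientHeapGb) {
        return snapshot[2] - snapshot[1] * clientHeapGb;
    }

    public static void main(String[] args) {
        String line = "Topology snapshot [ver=2356, servers=7, clients=60, "
            + "CPUs=1488, heap=510GB]";
        int[] s = parse(line);
        // 510 GB displayed - 60 clients * 4 GB = 270 GB on the servers.
        System.out.println("servers heap = " + serverHeapGb(s, 4) + "GB");
    }
}
```

With the numbers from the post, the 510 GB shown with clients connected is the same 270 GB of server heap plus 240 GB of client heap, so the larger figure by itself does not indicate extra load on the servers.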

On 20.09.2016 12:52, percent620 wrote:
> Can anyone help me to fix this issue as this issue happens in our production
> env?

-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com


Re: ignite memory issues:Urgent in production

Posted by percent620 <pe...@163.com>.
Can anyone help me to fix this issue as this issue happens in our production
env?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/ignite-memory-issues-Urgent-in-production-tp7817p7842.html

Re: ignite memory issues:Urgent in production

Posted by percent620 <pe...@163.com>.
*Here are logs from another server. I found that several Ignite servers shut
down automatically.*
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=f59d7d01-b01d-46b2-b679-17b73313ae98, addrs=[yyyyy, 127.0.0.1],
sockAddrs=[yyyyy/yyyyy:0, /127.0.0.1:0], discPort=0, order=2486,
intOrder=1265, lastExchangeTime=1474169373925, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=true]
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1996)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
	... 30 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each ComputeTask and cache
Transaction has a timeout set in order to prevent parties from waiting
forever in case of network issues
[nodeId=f59d7d01-b01d-46b2-b679-17b73313ae98, addrs=[yyyyy/yyyyy:47100,
/127.0.0.1:47100]]
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2499)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2140)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2034)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
	... 32 more
	Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address: yyyyy/yyyyy:47100
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
		... 35 more
	Caused by: java.net.ConnectException: Connection refused
		at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
		at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
		at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:117)
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2363)
		... 35 more
	Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address: /127.0.0.1:47100
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
		... 35 more
	Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID
is not as expected [expected=f59d7d01-b01d-46b2-b679-17b73313ae98,
rcvd=a4df12c5-fe9e-4b3f-b652-0ec02111dc7b]
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2614)
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2371)
		... 35 more
[11:29:44,714][SEVERE][marshaller-cache-#228%null%][CacheContinuousQueryHandler]
Failed to send event notification to node:
19ca5b90-ae92-41fa-ae54-d1427e41185d
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to
firewall issues) [node=TcpDiscoveryNode
[id=19ca5b90-ae92-41fa-ae54-d1427e41185d, addrs=[yyyyy, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, yyyyy/yyyyy:0], discPort=0, order=2481,
intOrder=1260, lastExchangeTime=1474169373764, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=true], topic=T4
[topic=TOPIC_CACHE, id1=1fd3a002-42a8-3e13-a1aa-bf164b7f2d64,
id2=19ca5b90-ae92-41fa-ae54-d1427e41185d, id3=1], msg=GridContinuousMessage
[type=MSG_EVT_NOTIFICATION, routineId=bf2fb8b0-db98-4f6e-8fa4-514d00dcf5e7,
data=null, futId=null], policy=2]
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1309)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1540)
	at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1337)
	at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1308)
	at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1290)
	at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:945)
	at
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:888)
	at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
	at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
	at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
	at
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:347)
	at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2583)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2252)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1652)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1490)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:2950)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:130)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
	at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:266)
	at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:748)
	at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:353)
	at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:277)
	at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:88)
	at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:231)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
	at
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
	at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=19ca5b90-ae92-41fa-ae54-d1427e41185d, addrs=[yyyyy, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, yyyyy/yyyyy:0], discPort=0, order=2481,
intOrder=1260, lastExchangeTime=1474169373764, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=true]
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1996)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1936)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)
	... 30 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect
to node (is node still alive?). Make sure that each ComputeTask and cache
Transaction has a timeout set in order to prevent parties from waiting
forever in case of network issues
[nodeId=19ca5b90-ae92-41fa-ae54-d1427e41185d, addrs=[yyyyy/yyyyy:47100,
/127.0.0.1:47100]]
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2499)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2140)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2034)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1970)
	... 32 more
	Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address: yyyyy/yyyyy:47100
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
		... 35 more
	Caused by: java.net.ConnectException: Connection refused
		at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
		at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
		at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:117)
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2363)
		... 35 more
	Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to
connect to address: /127.0.0.1:47100
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2504)
		... 35 more
	Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID
is not as expected [expected=19ca5b90-ae92-41fa-ae54-d1427e41185d,
rcvd=a4df12c5-fe9e-4b3f-b652-0ec02111dc7b]
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2614)
		at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2371)
		... 35 more
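
For reading a trace this long, a small helper that lists only the "Caused by:" entries can make the chain easier to follow. The sketch below is plain Java and is not part of Ignite. The class and method names (RootCauses, causedBy) are hypothetical, and it assumes the wrapping seen in this archive, where a message may continue on the next line and frames start with "at".

```java
import java.util.ArrayList;
import java.util.List;

public class RootCauses {
    /**
     * Collects the "Caused by:" entries of a stack trace, re-joining message
     * lines that were wrapped, and ending each entry at its first frame.
     * Limitation: a wrapped message line that itself starts with "at " would
     * be mistaken for a frame.
     */
    public static List<String> causedBy(String trace) {
        List<String> causes = new ArrayList<>();
        StringBuilder current = null;
        for (String raw : trace.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("Caused by:")) {
                if (current != null) causes.add(current.toString());
                current = new StringBuilder(line.substring("Caused by:".length()).trim());
            } else if (current != null) {
                boolean endsEntry = line.isEmpty() || line.equals("at")
                    || line.startsWith("at ") || line.startsWith("...")
                    || line.startsWith("Suppressed:");
                if (endsEntry) {
                    causes.add(current.toString());
                    current = null;
                } else {
                    current.append(' ').append(line); // wrapped message text
                }
            }
        }
        if (current != null) causes.add(current.toString());
        return causes;
    }

    public static void main(String[] args) {
        // Two entries copied from the log in this message.
        String trace = String.join("\n",
            "\tCaused by: java.net.ConnectException: Connection refused",
            "\t\tat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)",
            "\tCaused by: class org.apache.ignite.IgniteCheckedException: Remote node ID",
            "is not as expected [expected=19ca5b90-ae92-41fa-ae54-d1427e41185d,",
            "rcvd=a4df12c5-fe9e-4b3f-b652-0ec02111dc7b]",
            "\t\tat");
        for (String cause : causedBy(trace))
            System.out.println(cause);
    }
}
```

Run against the log above, this would surface the "Connection refused" and "Remote node ID is not as expected" entries for the client nodes without the surrounding frames.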




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/ignite-memory-issues-Urgent-in-production-tp7817p7818.html