Posted to user@ignite.apache.org by ignite_user2016 <ri...@gmail.com> on 2017/07/06 18:19:54 UTC

frequent disconnection in ignite cluster

Hello Igniters,

We are seeing frequent disconnections between Ignite instances. We have an
IP-based cluster with the following configuration -

Ignite version - 1.7.0

<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                    <list>
                        <value>HOST_IP1:47500..47509</value>
                        <value>HOST_IP2:47500..47509</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>
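The discovery warning in the log below points at the 'Failure Detection' tuning section. A minimal, hedged sketch of that knob (the 30000 ms value is illustrative, not a recommendation) is to raise failureDetectionTimeout so that brief network hiccups or GC pauses do not evict nodes:

```xml
<!-- Illustrative only: raise the failure detection timeout (default is
     10 s) so short network hiccups or GC pauses do not drop nodes from
     the topology. This goes on the top-level IgniteConfiguration bean,
     as a sibling of the discoverySpi property above. -->
<property name="failureDetectionTimeout" value="30000"/>
```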

See the error log - 

[09:53:46,139][WARN ][tcp-disco-msg-worker-#2%WebGrid%][TcpDiscoverySpi]
Local node has detected failed nodes and started cluster-wide procedure. To
speed up failure detection please see 'Failure Detection' section under
javadoc for 'TcpDiscoverySpi'

[09:54:56,060][WARN
][exchange-worker-#54%WebGrid%][GridCachePartitionExchangeManager] Failed to
wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=22132, minorTopVer=0], node=d3719fe1-84cf-4fe5-91dd-2d10abb1b3d2].
Dumping pending objects that might be the cause:
[09:54:56,060][WARN
][exchange-worker-#54%WebGrid%][GridCachePartitionExchangeManager] Ready
affinity version: AffinityTopologyVersion [topVer=22131, minorTopVer=0]
[09:54:56,062][WARN
][exchange-worker-#54%WebGrid%][GridCachePartitionExchangeManager] Last
exchange future: GridDhtPartitionsExchangeFuture [dummy=false,
forcePreload=false, reassign=false, discoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=d2ffb86c-5305-4cb3-96a0-874be73d610a,
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, host_ip2],
sockAddrs=[host2/host_ip2:47501, 0:0:0:0:0:0:0:1%lo:47501,
/127.0.0.1:47501], discPort=47501, order=22131, intOrder=11068,
lastExchangeTime=1499352867440, loc=false, ver=1.7.0#20160801-sha1:383273e3,
isClient=false], topVer=22132, nodeId8=d3719fe1, msg=Node left:
TcpDiscoveryNode [id=d2ffb86c-5305-4cb3-96a0-874be73d610a,
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, host_ip2],
sockAddrs=[host2/host_ip2:47501, 0:0:0:0:0:0:0:1%lo:47501,
/127.0.0.1:47501], discPort=47501, order=22131, intOrder=11068,
lastExchangeTime=1499352867440, loc=false, ver=1.7.0#20160801-sha1:383273e3,
isClient=false], type=NODE_LEFT, tstamp=1499352886042], crd=TcpDiscoveryNode
[id=64ce302c-9743-47bc-bf27-641015a37b81, addrs=[127.0.0.1, host_ip1],
sockAddrs=[/127.0.0.1:47500, host1/host_ip1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1498849915139, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=22132, minorTopVer=0], nodeId=d2ffb86c, evt=NODE_LEFT], added=true,
initFut=GridFutureAdapter [resFlag=2, res=true, startTime=1499352886042,
endTime=1499352886052, ignoreInterrupts=false, state=DONE],init=true,
topSnapshot=null, lastVer=null, partReleaseFut=GridCompoundFuture [rdc=null,
initFlag=1, lsnrCalls=3, done=true, cancelled=false, err=null, futs=[true,
true, true]], affChangeMsg=null, skipPreload=false,
clientOnlyExchange=false, initTs=1499352886042, centralizedAff=true,
evtLatch=0, remaining=[64ce302c-9743-47bc-bf27-641015a37b81],
srvNodes=[TcpDiscoveryNode [id=64ce302c-9743-47b
c-bf27-641015a37b81, addrs=[127.0.0.1, host_ip1],
sockAddrs=[/127.0.0.1:47500, host1/host_ip1:47500], discPort=47500, order=1,
intOrder=1, lastExchangeTime=1498849915139, loc=false,
ver=1.7.0#20160801-sha1:383273e3, isClient=false], TcpDiscoveryNode
[id=d3719fe1-84cf-4fe5-91dd-2d10abb1b3d2, addrs=[127.0.0.1, host_ip2],
sockAddrs=[/127.0.0.1:47500, host2/host_ip2:47500], discPort=47500, order=4,
intOrder=3, lastExchangeTime=1499352895809, loc=true,
ver=1.7.0#20160801-sha1:383273e3, isClient=false]], super=GridFutureAdapter
[resFlag=0, res=nul
l, startTime=1499352886042, endTime=0, ignoreInterrupts=false, state=INIT]]

[10:08:37,232][WARN
][exchange-worker-#54%WebGrid%][GridCachePartitionExchangeManager] Failed to
wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=22134, minorTopVer=0], node=
d3719fe1-84cf-4fe5-91dd-2d10abb1b3d2]. Dumping pending objects that might be
the cause:
[10:08:47,287][WARN
][exchange-worker-#54%WebGrid%][GridCachePartitionExchangeManager] Failed to
wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=22134, minorTopVer=0], node=
d3719fe1-84cf-4fe5-91dd-2d10abb1b3d2]. Dumping pending objects that might be
the cause:

class org.apache.ignite.IgniteException: Failed to wait for affinity ready
future for topology version: AffinityTopologyVersion [topVer=22134,
minorTopVer=0]
        at
org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.awaitTopologyVersion(GridAffinityAssignmentCache.java:526)
        at
org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:434)
        at
org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.assignments(GridAffinityAssignmentCache.java:331)
        at
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.assignments(GridCacheAffinityManager.java:165)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.initPartitions0(GridDhtPartitionTopologyImpl.java:373)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.initPartitions(GridDhtPartitionTopologyImpl.java:340)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1057)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:86)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:324)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processMessage(GridDhtPartitionsExchangeFuture.java:1400)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$400(GridDhtPartitionsExchangeFuture.java:86)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:1369)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:1357)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
        at
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:226)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceive(GridDhtPartitionsExchangeFuture.java:1357)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1030)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1200(GridCachePartitionExchangeManager.java:112)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:316)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:314)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1807)
        at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1789)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:748)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:353)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:277)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:88)
        at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:231)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
        at
org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to wait
for topology update, cache (or node) is stopping.
        at
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.cancelFutures(GridCacheAffinityManager.java:92)
        at
org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:904)
        at
org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1914)
        at
org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1860)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2266)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2229)
        at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:323)
        at org.apache.ignite.Ignition.stop(Ignition.java:224)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$8.run(GridDiscoveryManager.java:1946)
        ... 1 more



Can anyone guide me on what tuning is required in the configuration?

I have also noticed that CPU usage and JVM memory usage gradually rise day by
day on the Ignite servers.

Thanks for all your help..

Rishi



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: frequent disconnection in ignite cluster

Posted by ignite_user2016 <ri...@gmail.com>.
Hello Val,

I have asked for these changes and will keep you informed.

PS - things move very slowly on our side, so any change will take 1-2 weeks.

Thanks..

On Fri, Jul 7, 2017 at 2:06 PM, vkulichenko [via Apache Ignite Users] <
ml+s70518n14505h9@n6.nabble.com> wrote:

> Rishi,
>
> Does the issue go away if you stop triggering monitoring every 5 minutes?
> Such a frequency is very low; it should not cause any issues, of course.
>
> -Val
>



-- 
Rishi Yagnik




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14506.html

Re: frequent disconnection in ignite cluster

Posted by vkulichenko <va...@gmail.com>.
Rishi,

Does the issue go away if you stop triggering monitoring every 5 minutes?
Such a frequency is very low; it should not cause any issues, of course.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14505.html

Re: frequent disconnection in ignite cluster

Posted by ignite_user2016 <ri...@gmail.com>.
Hello,

Thanks for sharing the threads.

We have 4 cores and 8 GB RAM; Ignite is mostly used as a 2nd-level cache.

We have set our JVM heap to 1 GB.

So, how should we monitor Ignite without impacting the system?

Thanks ...
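One low-impact option, sketched below with a placeholder process id, is to watch the server JVM from the outside with the standard JDK tools instead of repeatedly joining the cluster as a client (which is what the memory-leak threads linked below implicate):

```shell
# Illustrative checks that never join the Ignite topology.
# <ignite_pid> is a placeholder for the Ignite server JVM's process id.

# Heap occupancy and GC activity, sampled every 5 seconds:
jstat -gcutil <ignite_pid> 5000

# One-off thread dump when a node looks stuck:
jstack <ignite_pid> > /tmp/ignite-threads.txt
```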

On Thu, Jul 6, 2017 at 9:54 PM, tysli2016 [via Apache Ignite Users] <
ml+s70518n14442h56@n6.nabble.com> wrote:

> Hi Rishi,
>
> It seems it's not a good idea to connect to Ignite repeatedly; I observed a
> similar memory issue.
> Would you mind sharing your server configuration (cores, memory)?
>
> http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-
> 6-0-suspected-memory-leak-from-DynamicCacheDescriptor-td9443i20.html
>
> http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-
> node-cluster-with-visor-running-repeatedly-Ignite-1-9-tt12409.html
>
> Tom
>



-- 
Rishi Yagnik




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14490.html

Re: frequent disconnection in ignite cluster

Posted by tysli2016 <To...@cityline.com.hk>.
Hi Rishi,

It seems it's not a good idea to connect to Ignite repeatedly; I observed a
similar memory issue.
Would you mind sharing your server configuration (cores, memory)?

http://apache-ignite-users.70518.x6.nabble.com/Ignite-1-6-0-suspected-memory-leak-from-DynamicCacheDescriptor-td9443i20.html

http://apache-ignite-users.70518.x6.nabble.com/OOME-on-2-node-cluster-with-visor-running-repeatedly-Ignite-1-9-tt12409.html

Tom



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14442.html

Re: frequent disconnection in ignite cluster

Posted by ignite_user2016 <ri...@gmail.com>.
Thank you, Val, for all your help.

I will investigate further.

We are monitoring Ignite every 5 minutes with a shell script; could that cause memory to keep rising?

Take Care,
Rishi

> On Jul 6, 2017, at 4:50 PM, vkulichenko [via Apache Ignite Users] <ml...@n6.nabble.com> wrote:
> 
> Rishi, 
> 
> This is usually caused by either network or memory issues. Check that you don't have any network glitches and that there are no GC pauses, memory leaks, etc. on the nodes. 
> 
> -Val 
> 




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14441.html

Re: frequent disconnection in ignite cluster

Posted by vkulichenko <va...@gmail.com>.
Rishi,

This is usually caused by either network or memory issues. Check that you
don't have any network glitches and that there are no GC pauses, memory
leaks, etc. on the nodes.

-Val
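Val's GC-pause theory can be checked by enabling GC logging on each server node. A hedged sketch follows, assuming a HotSpot JVM of the Java 7/8 era (which matches Ignite 1.7 deployments) and that the node is launched via ignite.sh, which honors JVM_OPTS; the log path is a placeholder:

```shell
# Illustrative JVM options for HotSpot GC logging (Java 7/8 era flags).
# Export before launching the node with ignite.sh.
export JVM_OPTS="$JVM_OPTS \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/ignite/gc.log"
```

Pauses in gc.log longer than the cluster's failure detection timeout would explain nodes being dropped from the topology.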



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/frequet-disconnection-in-ignite-cluster-tp14411p14420.html