You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by akurbanov <an...@gmail.com> on 2018/07/04 13:37:20 UTC

Re: Deadlock on IgniteCache:put?

Hi,

Your code is waiting on future completion, and obviously it cannot be
completed because failed node did not respond. Could you please share logs
for all nodes in your cluster when put is blocked and share Ignite
configuration so we could find out whether root cause is connected with
network and node segmentation or there is any other failure?

Regards



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock on IgniteCache:put?

Posted by akurbanov <an...@gmail.com>.
Hi,

Thanks for the logs, but I meant if you could attach full Ignite nodes logs
for investigation as these parts are not enough to pinpoint the root cause.

From what I got so far: first of all, try to remove timeouts and set
property failureDetectionTimeout for both TcpDiscoverySpi and
TcpCommunicationSpi, you can do this by propagating this property directly
into IgniteConfiguration instance.

Could you also verify if it's the same node being kicked out from cluster
after 10 minutes, or failing node is different in each case?

From this short logs I can only see that there are nodes with ids:
N1. d0daf95e-49c0-4071-b6bc-cd7e279c0582
N2. a0830a00-1970-43b4-b143-6e1947f0059f
N3. 4313271e-8273-4f28-bcab-aac3a43e6722

N1 does not appear in rcvd= section, so I assume that there is problem on
this node. Most probable reasons are network and GC, so I need full logs for
nodes and GC logging enabled:

Add following to JVM opts:

-XX:+PrintAdaptiveSizePolicy -XX:+PrintGC
-XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC

Add -DIGNITE_QUIET=false to node startup cmd and collect logs.

Regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Deadlock on IgniteCache:put?

Posted by Gri <gr...@enonic.com>.
Finally managed to reproduce but it went slightly different this time.
The node was not removed from the cluster after the 10 minutes like I said.
It seemed to work again after 15 minutes.
But meanwhile all the requests I sent to my node1 that were using the cache
remained pending.

Logs
  Node1
  /2018-07-04 15:14:41,232 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:13:03.909, curTime=15:14:41.223,
fut=GridNearAtomicSingleUpdateFuture [reqState=Primary
[id=d0daf95e-49c0-4071-b6bc-cd7e279c0582, opRes=true, expCnt=2, rcvdCnt=1,
primaryRes=false, done=false,
waitFor=[a0830a00-1970-43b4-b143-6e1947f0059f],
rcvd=[4313271e-8273-4f28-bcab-aac3a43e6722]],
super=GridNearAtomicAbstractUpdateFuture [remapCnt=100,
topVer=AffinityTopologyVersion [topVer=5, minorTopVer=1], remapTopVer=null,
err=null, futId=507917, super=GridFutureAdapter [ignoreInterrupts=false,
state=INIT, res=null, hash=1016566303]]]]
  2018-07-04 15:15:41,236 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:14:38.102, curTime=15:15:41.227,
fut=GridDhtAtomicSingleUpdateFuture [allUpdated=true,
super=GridDhtAtomicAbstractUpdateFuture [futId=622594, resCnt=1,
addedReader=false, dhtRes={d0daf95e-49c0-4071-b6bc-cd7e279c0582=[res=false,
size=1, nearSize=0], a0830a00-1970-43b4-b143-6e1947f0059f=[res=true, size=1,
nearSize=0]}]]]
  2018-07-04 15:15:41,240 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:14:38.102, curTime=15:15:41.227,
fut=GridNearAtomicSingleUpdateFuture [reqState=Primary
[id=4313271e-8273-4f28-bcab-aac3a43e6722, opRes=true, expCnt=2, rcvdCnt=1,
primaryRes=true, done=false, waitFor=[d0daf95e-49c0-4071-b6bc-cd7e279c0582],
rcvd=[a0830a00-1970-43b4-b143-6e1947f0059f]],
super=GridNearAtomicAbstractUpdateFuture [remapCnt=100,
topVer=AffinityTopologyVersion [topVer=5, minorTopVer=1], remapTopVer=null,
err=null, futId=622593, super=GridFutureAdapter [ignoreInterrupts=false,
state=INIT, res=null, hash=2076320652]]]]
  2018-07-04 15:17:41,247 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:14:53.759, curTime=15:17:41.246,
fut=GridNearAtomicSingleUpdateFuture [reqState=Primary
[id=d0daf95e-49c0-4071-b6bc-cd7e279c0582, opRes=true, expCnt=2, rcvdCnt=1,
primaryRes=false, done=false,
waitFor=[a0830a00-1970-43b4-b143-6e1947f0059f],
rcvd=[4313271e-8273-4f28-bcab-aac3a43e6722]],
super=GridNearAtomicAbstractUpdateFuture [remapCnt=100,
topVer=AffinityTopologyVersion [topVer=5, minorTopVer=1], remapTopVer=null,
err=null, futId=507918, super=GridFutureAdapter [ignoreInterrupts=false,
state=INIT, res=null, hash=870007217]]]]
  2018-07-04 15:17:41,248 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:14:44.440, curTime=15:17:41.246,
fut=GridNearAtomicSingleUpdateFuture [reqState=Primary
[id=d0daf95e-49c0-4071-b6bc-cd7e279c0582, opRes=true, expCnt=2, rcvdCnt=1,
primaryRes=false, done=false,
waitFor=[a0830a00-1970-43b4-b143-6e1947f0059f],
rcvd=[4313271e-8273-4f28-bcab-aac3a43e6722]],
super=GridNearAtomicAbstractUpdateFuture [remapCnt=100,
topVer=AffinityTopologyVersion [topVer=5, minorTopVer=1], remapTopVer=null,
err=null, futId=409607, super=GridFutureAdapter [ignoreInterrupts=false,
state=INIT, res=null, hash=1258252543]]]]/

  Node2
  /2018-07-04 15:14:37,826 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:13:03.909, curTime=15:14:37.824,
fut=GridDhtAtomicSingleUpdateFuture [allUpdated=true,
super=GridDhtAtomicAbstractUpdateFuture [futId=475137, resCnt=1,
addedReader=false, dhtRes={a0830a00-1970-43b4-b143-6e1947f0059f=[res=false,
size=1, nearSize=0], 4313271e-8273-4f28-bcab-aac3a43e6722=[res=true, size=1,
nearSize=0]}]]]
  2018-07-04 15:17:37,834 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:16:23.876, curTime=15:17:37.830,
fut=GridDhtAtomicSingleUpdateFuture [allUpdated=true,
super=GridDhtAtomicAbstractUpdateFuture [futId=475138, resCnt=1,
addedReader=false, dhtRes={a0830a00-1970-43b4-b143-6e1947f0059f=[res=false,
size=1, nearSize=0], 4313271e-8273-4f28-bcab-aac3a43e6722=[res=true, size=1,
nearSize=0]}]]]
  2018-07-04 15:17:37,837 WARN  o.apache.ignite.internal.diagnostic - Found
long running cache future [startTime=15:16:23.876, curTime=15:17:37.830,
fut=GridDhtAtomicSingleUpdateFuture [allUpdated=true,
super=GridDhtAtomicAbstractUpdateFuture [futId=491521, resCnt=1,
addedReader=false, dhtRes={a0830a00-1970-43b4-b143-6e1947f0059f=[res=false,
size=1, nearSize=0], 4313271e-8273-4f28-bcab-aac3a43e6722=[res=true, size=1,
nearSize=0]}]]]/

  Node3
  Nothing


Configuration
/Using org.apache.ignite.configuration.IgniteConfiguration, and setting the
following values:
  setIgniteInstanceName <= <name>
  setConsistentId <= <name>
  setIgniteHome <= <path>
  setAddressResolver <= new BasicAddressResolver(<mappings>)
  setMarshaller <= new OptimizedMarshaller( true )
  setDataStorageConfiguration <= new DataStorageConfiguration()
    getDefaultDataRegionConfiguration().setMaxSize <= <heap max size>
  setConnectorConfiguration <= null
  setClientConnectorConfiguration <= null
  setLocalHost <= <localhost>
  setDiscoverySpi <= new TcpDiscoverySpi()
    setLocalPort <= 47500
    setLocalPortRange <= 0
    setLocalAddress <= <localhost>
    setReconnectCount <= 10
    setAckTimeout <= 2000L
    setJoinTimeout <= 0L
    setNetworkTimeout <= 5000L
    setSocketTimeout <= 2000L
    setStatisticsPrintFrequency <= 0
  setGridLogger <=  new Log4JWrapper(...)
  setMetricsLogFrequency <= 0
  setClassLoader <= new ContainerSweepClassLoader(<bundle>, <classloader>)/
  
    
  
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/