
Posted to user@ignite.apache.org by Ray <ra...@cisco.com> on 2018/10/30 06:21:35 UTC

Create index got stuck and freeze whole cluster.

I'm using a five-node Ignite 2.6 cluster.
When I try to create an index on a table with 10 million records using the SQL
"create index on table(a,b,c,d)", the whole cluster freezes and keeps printing the
following warning for 40 minutes.

2018-10-30T02:48:44,086][WARN
][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await
partitions release latch within timeout: ServerLatch [permits=4,
pendingAcks=[20aa5929-3f26-4923-87a3-27b4f6d4f744,
ec5be25e-6601-468c-9f0e-7ab7c8caa9e9, 45819b05-a338-4bc4-b104-f0c7567fd49d,
cbb80db7-b342-4b97-ba61-97d57c194a1a], super=CompletableLatch [id=exchange,
topVer=AffinityTopologyVersion [topVer=202, minorTopVer=1]]]
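
To see at a glance which nodes the latch is still waiting on, the warning above can be picked apart with a few regular expressions. This is only a rough sketch: the function name `parse_latch_warning` and the returned dictionary keys are made up for illustration, and the patterns are written against the exact text of this one warning, not against any documented log format.

```python
import re

# The warning text from the log above, joined back onto one line.
WARNING = (
    "Unable to await partitions release latch within timeout: ServerLatch [permits=4, "
    "pendingAcks=[20aa5929-3f26-4923-87a3-27b4f6d4f744, "
    "ec5be25e-6601-468c-9f0e-7ab7c8caa9e9, 45819b05-a338-4bc4-b104-f0c7567fd49d, "
    "cbb80db7-b342-4b97-ba61-97d57c194a1a], super=CompletableLatch [id=exchange, "
    "topVer=AffinityTopologyVersion [topVer=202, minorTopVer=1]]]"
)


def parse_latch_warning(text):
    """Extract permits, pending node IDs and topology version (hypothetical helper)."""
    flat = " ".join(text.split())  # undo mail/log line wrapping
    permits = re.search(r"permits=(\d+)", flat)
    acks = re.search(r"pendingAcks=\[([^\]]*)\]", flat)
    top = re.search(r"\[topVer=(\d+), minorTopVer=(\d+)\]", flat)
    if not (permits and acks and top):
        return None  # text does not look like the warning above
    node_ids = [s.strip() for s in acks.group(1).split(",") if s.strip()]
    return {
        "permits": int(permits.group(1)),
        "pending_acks": node_ids,
        "top_ver": (int(top.group(1)), int(top.group(2))),
    }


info = parse_latch_warning(WARNING)
print(info["permits"], len(info["pending_acks"]), info["top_ver"])
```

Here the permit count and the number of pending node IDs are both 4, which would be consistent with the coordinator still waiting on every other node of a five-node cluster. That reading is an assumption on my part, not something the log states.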

I noticed that one of the servers (log in server3.zip) is stuck in the checkpoint
process, and this server acts as the coordinator in PME.
In the log I see that only 856610 pages need to be flushed to disk, but the
checkpoint takes 32 minutes to finish,
while another node takes 7 minutes to finish writing 919060 pages to disk.
Also, the disk usage on the server with the slow checkpoint is not at 100%.
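
To put the two checkpoints side by side, the page counts and durations above can be turned into rough write rates. A small sketch follows; it assumes the default 4 KB page size (the actual page size of this cluster is not stated here), and `checkpoint_rate` is just an illustrative helper name.

```python
PAGE_SIZE_BYTES = 4096  # assumption: default page size; change if configured otherwise


def checkpoint_rate(pages, minutes, page_size=PAGE_SIZE_BYTES):
    """Return (pages per second, MiB per second) for one checkpoint."""
    seconds = minutes * 60
    pages_per_sec = pages / seconds
    mib_per_sec = pages_per_sec * page_size / (1024 * 1024)
    return pages_per_sec, mib_per_sec


slow = checkpoint_rate(856610, 32)  # the coordinator (server3)
fast = checkpoint_rate(919060, 7)   # the other node mentioned above

print("slow: %.0f pages/s, %.1f MiB/s" % slow)
print("fast: %.0f pages/s, %.1f MiB/s" % fast)
print("ratio: %.1fx" % (fast[0] / slow[0]))
```

Under that assumption the slow node writes roughly 446 pages/s (about 1.7 MiB/s) and the other node roughly 2188 pages/s (about 8.5 MiB/s), so the slow checkpoint is close to 5 times slower per page. Both rates look low for sequential disk writes, which fits the observation that the disk is not saturated.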

Here are the full log files for all 5 servers.
server1.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server1.zip>  
server2.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server2.zip>  
server3.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server3.zip>  
server4.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server4.zip>  
server5.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server5.zip>  




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

RE: Create index got stuck and freeze whole cluster.

Posted by Stanislav Lukyanov <st...@gmail.com>.

Hi,

The only thing I can say is that your troubles seem to have started well before the part of the logs you attached.

I see a bunch of “Found long running cache future” messages repeating, and then an exchange for
stopping the SQL_PUBLIC_USERLEVEL cache that never completes.
I would need logs going further into the past (at least minutes) to see what went wrong.
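
A quick way to check how far back the attached logs go relative to these messages is to count the marker lines and look at the first one in each file. The following is a minimal sketch, not a proper log parser: it only does a substring match on the message text quoted above, and `summarize_marker` and `summarize_file` are hypothetical helper names.

```python
MARKER = "Found long running cache future"  # message text quoted above


def summarize_marker(lines, marker=MARKER):
    """Return (count, first_line_number, first_line) for lines containing marker."""
    count = 0
    first_no = None
    first_line = None
    for no, line in enumerate(lines, start=1):
        if marker in line:
            count += 1
            if first_no is None:
                # Remember where the message shows up for the first time.
                first_no, first_line = no, line.rstrip("\n")
    return count, first_no, first_line


def summarize_file(path, marker=MARKER):
    """Same as summarize_marker, but reads an unzipped log file from disk."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return summarize_marker(f, marker)
```

If the first match sits near the top of a file, the problem most likely began before that log starts, and older logs are needed.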

Stan
