You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Alexey Goncharuk (JIRA)" <ji...@apache.org> on 2017/11/27 10:56:00 UTC

[jira] [Commented] (IGNITE-6113) Partition eviction prevents exchange from completion

    [ https://issues.apache.org/jira/browse/IGNITE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266650#comment-16266650 ] 

Alexey Goncharuk commented on IGNITE-6113:
------------------------------------------

The issue with the code is that we wait for partition rent in the assign() method. This blocks exchange thread from progress until the partition is evicted.

In my understanding, we already have all the machinery to fix this issue:
1) Move partition to MOVING state, but mark it as required to be evicted
2) In demander, do not send first demand message until we iterated over the full partition and attempted to clean it. After the iteration is done, even if the partition is not empty, we can safely start rebalancing. Also, during the clear, we can skip versions that arrived after the rebalancing has started (need to figure out if this can be implemented)
3) After iteration is finished, start rebalancing for this particular partition.

Also, note the suspicious code in the GridDhtPreloader#assing() method:
{code}
                if (histSupplier != null) {
                    if (part.state() != MOVING) {
                        if (log.isDebugEnabled())
                            log.debug("Skipping partition assignment (state is not MOVING): " + part);

                        continue; // For.
                    }
{code}
Most likely, we need the same logic here, even if we have a history supplier.

> Partition eviction prevents exchange from completion
> ----------------------------------------------------
>
>                 Key: IGNITE-6113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6113
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.1
>            Reporter: Vladislav Pyatkov
>
> I has waited for 3 hours for completion without any success.
> exchange-worker is blocked.
> {noformat}
> "exchange-worker-#92%DPL_GRID%grid554.ca.sbrf.ru%" #173 prio=5 os_prio=0 tid=0x00007f0835c2e000 nid=0xb907 runnable [0x00007e74ab1d0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007efee630a7c0> (a org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition$1)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.assign(GridDhtPreloader.java:340)
>         at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1801)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> {noformat}
> "sys-#124%DPL_GRID%grid554.ca.sbrf.ru%" #278 prio=5 os_prio=0 tid=0x00007e731c02d000 nid=0xbf4d runnable [0x00007e734e7f7000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>         - locked <0x00007f056161bf88> (a java.lang.Object)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.writeBuffer(FileWriteAheadLogManager.java:1829)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:1572)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:1421)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager$FileWriteHandle.access$800(FileWriteAheadLogManager.java:1331)
>         at org.gridgain.grid.cache.db.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:339)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.beforeReleaseWrite(PageMemoryImpl.java:1287)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1142)
>         at org.gridgain.grid.internal.processors.cache.database.pagemem.PageImpl.releaseWrite(PageImpl.java:167)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writeUnlock(PageHandler.java:193)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:242)
>         at org.apache.ignite.internal.processors.cache.database.tree.util.PageHandler.writePage(PageHandler.java:119)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:2886)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:2865)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree$Remove.access$6900(BPlusTree.java:2515)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1607)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.removeDown(BPlusTree.java:1574)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.doRemove(BPlusTree.java:1481)
>         at org.apache.ignite.internal.processors.cache.database.tree.BPlusTree.remove(BPlusTree.java:1451)
>         at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:307)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:637)
>         at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:517)
>         at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:664)
>         at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:1186)
>         at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:467)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1090)
>         at org.gridgain.grid.cache.db.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:993)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:357)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3621)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:599)
>         - locked <0x00007f054d45bad8> (a org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCacheEntry)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:956)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:793)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:856)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$9.call(GridDhtPreloader.java:843)
>         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6660)
>         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:925)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)