Posted to issues@ignite.apache.org by "Anton Vinogradov (Jira)" <ji...@apache.org> on 2022/10/07 08:14:00 UTC

[jira] [Updated] (IGNITE-17496) LWM may be after HWM (reserved) on primary after the node restart

     [ https://issues.apache.org/jira/browse/IGNITE-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Vinogradov updated IGNITE-17496:
--------------------------------------
    Parent Issue: IGNITE-17845  (was: IGNITE-15167)

> LWM may be after HWM (reserved) on primary after the node restart
> -----------------------------------------------------------------
>
>                 Key: IGNITE-17496
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17496
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Assignee: Anton Vinogradov
>            Priority: Major
>              Labels: iep-31
>             Fix For: 2.14
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.AssertionError: LWM after HWM: lwm=10010, hwm=10003, cntr=Counter [lwm=10010, missed=[10011 - 10012, 10021, 10031 - 10032, 10043 - 10044], maxApplied=10047, hwm=10004]
>     at org.apache.ignite.internal.processors.cache.PartitionUpdateCounterTrackingImpl.reserve(PartitionUpdateCounterTrackingImpl.java:265)
>     at org.apache.ignite.internal.processors.cache.PartitionUpdateCounterErrorWrapper.reserve(PartitionUpdateCounterErrorWrapper.java:58)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.getAndIncrementUpdateCounter(IgniteCacheOffheapManagerImpl.java:1620)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getAndIncrementUpdateCounter(GridCacheOffheapManager.java:2538)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.getAndIncrementUpdateCounter(GridDhtLocalPartition.java:942)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.calculatePartitionUpdateCounters(IgniteTxLocalAdapter.java:510)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1360)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:730)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1136)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:400)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:581)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:378)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:201)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:175)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:135)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:223)
>     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:221)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:243)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
>     at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
>     at java.lang.Thread.run(Thread.java:750) {code}
> It looks like the counter is initialized incorrectly here.
> For example, at startup the primary has the following counter:
> {code:java}
> [lwm=10006, missed=[10007 - 10008, 10017 - 10020, 10031 - 10033, 10039 - 10042, 10055], hwm=10059, reserved=10006]{code}
> but once updates start, we get an exception:
> {code:java}
> LWM after reserved: lwm=10016, reserved=10008, cntr=Counter [lwm=10016, missed=[10017 - 10020, 10031 - 10033, 10039 - 10042, 10055], hwm=10059, reserved=10009]{code}
> This happens because the first gap ({{10007 - 10008}}) was closed, which moved {{lwm}} from {{10006}} to {{10016}}.
> The main problem here is that we're trying to reuse already used counters, so {{reserved}} should be set to {{hwm}} at initialization.
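
The sequence described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions: the class `CounterSketch`, its nested `Counter`, and the methods `reserve`, `update`, `markApplied`, `restored` and `run` are hypothetical names invented for illustration and are not the actual `PartitionUpdateCounterTrackingImpl` code. The model assumes that `reserve` hands out the current `reserved` value and increments it, and that applying an update with start value `n` marks counter `n + 1` as applied.

```java
import java.util.TreeSet;

// Hypothetical, simplified model of a tracking update counter (not Ignite's real class).
class CounterSketch {
    static final class Counter {
        long lwm;       // every counter up to and including lwm has been applied
        long reserved;  // next start value handed out to a transaction
        final TreeSet<Long> appliedAboveLwm = new TreeSet<>(); // out-of-order applied counters

        Counter(long lwm, long reserved) {
            this.lwm = lwm;
            this.reserved = reserved;
        }

        // Marks an inclusive range as already applied out of order (state restored at startup).
        void markApplied(long from, long to) {
            for (long c = from; c <= to; c++)
                appliedAboveLwm.add(c);
        }

        // Hands out the next start value; fails if it is already below lwm.
        long reserve() {
            if (reserved < lwm)
                throw new IllegalStateException("LWM after reserved: lwm=" + lwm + ", reserved=" + reserved);
            return reserved++;
        }

        // Applies the update that was reserved with the given start value.
        void update(long start) {
            appliedAboveLwm.add(start + 1);
            // Closing a gap lets lwm jump over everything applied right after it.
            while (appliedAboveLwm.remove(lwm + 1))
                lwm++;
        }
    }

    // Rebuilds the startup state from the description: lwm=10006, hwm=10059,
    // missed=[10007 - 10008, 10017 - 10020, 10031 - 10033, 10039 - 10042, 10055].
    static Counter restored(long initialReserved) {
        Counter c = new Counter(10006, initialReserved);
        c.markApplied(10009, 10016);
        c.markApplied(10021, 10030);
        c.markApplied(10034, 10038);
        c.markApplied(10043, 10054);
        c.markApplied(10056, 10059);
        return c;
    }

    // Runs three reserve/apply cycles and reports the outcome.
    static String run(long initialReserved) {
        Counter c = restored(initialReserved);
        try {
            for (int i = 0; i < 3; i++)
                c.update(c.reserve());
            return "ok: lwm=" + c.lwm + ", reserved=" + c.reserved;
        }
        catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10006)); // reserved initialized to lwm
        System.out.println(run(10059)); // reserved initialized to hwm
    }
}
```

With `reserved` initialized to `lwm`, the first two reservations reuse 10006 and 10007, the resulting updates close the 10007 - 10008 gap, and the third reservation fails with the same `lwm=10016, reserved=10008` values as in the report. With `reserved` initialized to `hwm`, new updates land after 10059 and the existing gaps are left untouched. How those gaps are eventually closed is outside the scope of this sketch.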


