You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Maksim Timonin (Jira)" <ji...@apache.org> on 2022/11/21 12:33:00 UTC

[jira] [Assigned] (IGNITE-17908) AssertionError LWM after reserved on data insertion after the cluster restart

     [ https://issues.apache.org/jira/browse/IGNITE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maksim Timonin reassigned IGNITE-17908:
---------------------------------------

    Assignee: Maksim Timonin

> AssertionError LWM after reserved on data insertion after the cluster restart
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-17908
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17908
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Assignee: Maksim Timonin
>            Priority: Major
>              Labels: ise
>         Attachments: LwmAfterReservedTest.java
>
>
> After the cluster restart you may see the following assertion:
> {code}
> java.lang.AssertionError: LWM after reserved: lwm=2030, reserved=2010, cntr=Counter [lwm=2030, missed=[], hwm=2030, reserved=2011]
> 	at org.apache.ignite.internal.processors.cache.PartitionUpdateCounterTrackingImpl.reserve(PartitionUpdateCounterTrackingImpl.java:270)
> 	at org.apache.ignite.internal.processors.cache.PartitionUpdateCounterErrorWrapper.reserve(PartitionUpdateCounterErrorWrapper.java:58)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.getAndIncrementUpdateCounter(IgniteCacheOffheapManagerImpl.java:1594)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getAndIncrementUpdateCounter(GridCacheOffheapManager.java:2483)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.getAndIncrementUpdateCounter(GridDhtLocalPartition.java:942)
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.calculatePartitionUpdateCounters(IgniteTxLocalAdapter.java:510)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1356)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:726)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1132)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareAsyncLocal(GridNearTxLocal.java:4282)
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareColocatedTx(IgniteTxHandler.java:303)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.proceedPrepare(GridNearOptimisticTxPrepareFuture.java:565)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepareSingle(GridNearOptimisticTxPrepareFuture.java:392)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepare0(GridNearOptimisticTxPrepareFuture.java:335)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:205)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:129)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:3946)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:3994)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.optimisticPutFuture(GridNearTxLocal.java:3051)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync0(GridNearTxLocal.java:729)
> 	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync(GridNearTxLocal.java:484)
> 	at org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2511)
> 	at org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2509)
> 	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4284)
> 	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2509)
> 	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2487)
> 	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2466)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1332)
> 	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:867)
> 	at org.apache.ignite.util.BrokenRebalanceTest.testCountersOnCrachRecovery(BrokenRebalanceTest.java:191)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2506)
> {code}
> when clusted was under the load and has a counters gaps on restart.
> See  [^LwmAfterReservedTest.java] 
> Possible solution is to fix {{PartitionUpdateCounterTrackingImpl#reservedCntr}} calculation 
> from 
> {code}
> long max = Math.max(val, curLwm);
> {code}
> to something like
> {code}
> long max = Math.max(val, curLwm);
> max = Math.max(max, highestAppliedCounter());
> {code}
> Another possible fix is to get rid or {{PartitionUpdateCounterTrackingImpl#reservedCntr}} and always use {{highestAppliedCounter()}}.
> This may slowdown the system, but look like a correct fix. So, some optimisation may be required.
> Please make sure that counters are correct after the fix using idle_verify.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)