Posted to user@ignite.apache.org by John Smith <ja...@gmail.com> on 2020/07/03 16:03:11 UTC

Re: What does all partition owners have left the grid on the client side mean?

Hi Evgenii, did you have a chance to look at the latest logs?

On Thu, 25 Jun 2020 at 11:32, John Smith <ja...@gmail.com> wrote:

> Ok
>
> stdout.copy.zip
>
> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>
> On Thu, 25 Jun 2020 at 11:01, John Smith <ja...@gmail.com> wrote:
>
>> Because in between it's all the business logs. Let me make sure I didn't
>> filter anything relevant. So maybe in those 13 hours nothing happened?
>>
>>
>> On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <e....@gmail.com>
>> wrote:
>>
>>> This doesn't seem to be a full log. There is a gap of more than 13
>>> hours in the log:
>>> {"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New
>>> version is available at ignite.apache.org: 2.8.1"}
>>> {"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node
>>> left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238,
>>> addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73],
>>> sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
>>> xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632,
>>> lastExchangeTime=1592890182021, loc=false,
>>> ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}
>>>
>>> I don't see any exceptions in the log. When did the issue happen? Can
>>> you share the full log?
>>>
>>> Evgenii
>>>
>>> Thu, 25 Jun 2020 at 07:36, John Smith <ja...@gmail.com>:
>>>
>>>> Hi Evgenii, same folder shared stdout.copy
>>>>
>>>> Just in case:
>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>
>>>> On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <
>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>
>>>>> No, it's not. It's not clear when it happened and what was happening
>>>>> with the cluster and the client node itself at that moment.
>>>>>
>>>>> Evgenii
>>>>>
>>>>> Wed, 24 Jun 2020 at 18:16, John Smith <ja...@gmail.com>:
>>>>>
>>>>>> Ok I'll try... The stack trace isn't enough?
>>>>>>
>>>>>> On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <
>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>
>>>>>>> John, right, didn't notice them before. Can you share the full log
>>>>>>> for the client node with an issue?
>>>>>>>
>>>>>>> Evgenii
>>>>>>>
>>>>>>> Wed, 24 Jun 2020 at 12:29, John Smith <ja...@gmail.com>:
>>>>>>>
>>>>>>>> I thought I did! The link doesn't have them?
>>>>>>>>
>>>>>>>> On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <
>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Can you share full log files from server nodes?
>>>>>>>>>
>>>>>>>>> Wed, 24 Jun 2020 at 10:47, John Smith <ja...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> The logs for server are here:
>>>>>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>>>>>
>>>>>>>>>> The error from the client:
>>>>>>>>>>
>>>>>>>>>> javax.cache.CacheException: class
>>>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
>>>>>>>>>> at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
>>>>>>>>>> at
>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>> at
>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>> at
>>>>>>>>>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>> Caused by:
>>>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
>>>>>>>>>> at
>>>>>>>>>> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
>>>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14918)
>>>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14865)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
>>>>>>>>>> at
>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
>>>>>>>>>> at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
>>>>>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
>>>>>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
>>>>>>>>>> at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
>>>>>>>>>> at
>>>>>>>>>> com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
>>>>>>>>>> at
>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
>>>>>>>>>> ... 7 common frames omitted
>>>>>>>>>>
>>>>>>>>>> On Wed, 24 Jun 2020 at 13:28, John Smith <ja...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not sure about the wrong configuration... All the apps work; this
>>>>>>>>>>> seems to happen every few weeks. We don't have any particular heavy load.
>>>>>>>>>>>
>>>>>>>>>>> I just bounced the client application and the errors went away.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <
>>>>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> It means that there are no nodes in the cluster that hold
>>>>>>>>>>>> certain partitions. So, you probably have a wrong configuration, or some of
>>>>>>>>>>>> the nodes left the cluster and you don't have backups in the cluster for
>>>>>>>>>>>> these partitions. I believe more can be found in the logs.
>>>>>>>>>>>>
>>>>>>>>>>>> Evgenii
>>>>>>>>>>>>
>>>>>>>>>>>> Wed, 24 Jun 2020 at 09:52, John Smith <java.dev.mtl@gmail.com
>>>>>>>>>>>> >:
>>>>>>>>>>>>
>>>>>>>>>>>>> Also I'm assuming that the thin client wouldn't be susceptible
>>>>>>>>>>>>> to this error?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:38, John Smith <
>>>>>>>>>>>>> java.dev.mtl@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The cluster is showing active when running control.sh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But the client is showing "all partition owners have left
>>>>>>>>>>>>>> the grid"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The client node is marked as client=true so it's not a server
>>>>>>>>>>>>>> node.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this split brain as well?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: What does all partition owners have left the grid on the client side mean?

Posted by John Smith <ja...@gmail.com>.
Yeah I restarted the server nodes. But I guess the client didn't
reconnect.... Hummmmm....
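
If it really was a disconnect, I guess the client should catch it and wait on
the reconnect future instead of needing a bounce. A rough sketch of what I
have in mind, not what the app does today (cache1 is from the error, the rest
is a placeholder):

    import javax.cache.CacheException;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.IgniteClientDisconnectedException;

    public final class ReconnectAwareGet {
        // Retries the read once after waiting for the client node to rejoin the cluster.
        public static Integer getWithReconnect(IgniteCache<Long, Integer> cache, Long key) {
            try {
                return cache.get(key);
            } catch (CacheException e) {
                if (e.getCause() instanceof IgniteClientDisconnectedException) {
                    IgniteClientDisconnectedException cause =
                            (IgniteClientDisconnectedException) e.getCause();
                    cause.reconnectFuture().get(); // block until the client rejoins
                    return cache.get(key);         // the same cache proxy works after reconnect
                }
                throw e;
            }
        }
    }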

On Tue., Jul. 7, 2020, 5:52 p.m. Evgenii Zhuravlev, <
e.zhuravlev.wk@gmail.com> wrote:

> John,
>
> Unfortunately, I didn't find messages about lost partitions for this
> cache; there is a chance that it happened before. What Partition Loss
> policy do you have?
>
> The logs say that there is a problem with partition distribution:
>  "Local node affinity assignment distribution is not ideal [cache=cache1,
> expectedPrimary=512.00, actualPrimary=493, expectedBackups=512.00,
> actualBackups=171, warningThreshold=50.00%]"
> How do you restart nodes? Do you wait until the rebalance is completed?
>
> Evgenii
>
>
>
> Fri, 3 Jul 2020 at 09:03, John Smith <ja...@gmail.com>:
>
>> Hi Evgenii, did you have a chance to look at the latest logs?
>>
>> On Thu, 25 Jun 2020 at 11:32, John Smith <ja...@gmail.com> wrote:
>>
>>> Ok
>>>
>>> stdout.copy.zip
>>>
>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>
>>> On Thu, 25 Jun 2020 at 11:01, John Smith <ja...@gmail.com> wrote:
>>>
>>>> Because in between it's all the business logs. Let me make sure I
>>>> didn't filter anything relevant. So maybe in those 13 hours nothing
>>>> happened?
>>>>
>>>>
>>>> On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <
>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>
>>>>> This doesn't seem to be a full log. There is a gap of more than 13
>>>>> hours in the log:
>>>>> {"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New
>>>>> version is available at ignite.apache.org: 2.8.1"}
>>>>> {"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node
>>>>> left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238,
>>>>> addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73],
>>>>> sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
>>>>> xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632,
>>>>> lastExchangeTime=1592890182021, loc=false,
>>>>> ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}
>>>>>
>>>>> I don't see any exceptions in the log. When did the issue happen? Can
>>>>> you share the full log?
>>>>>
>>>>> Evgenii
>>>>>
>>>>> Thu, 25 Jun 2020 at 07:36, John Smith <ja...@gmail.com>:
>>>>>
>>>>>> Hi Evgenii, same folder shared stdout.copy
>>>>>>
>>>>>> Just in case:
>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>
>>>>>> On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <
>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>
>>>>>>> No, it's not. It's not clear when it happened and what was happening
>>>>>>> with the cluster and the client node itself at that moment.
>>>>>>>
>>>>>>> Evgenii
>>>>>>>
>>>>>>> Wed, 24 Jun 2020 at 18:16, John Smith <ja...@gmail.com>:
>>>>>>>
>>>>>>>> Ok I'll try... The stack trace isn't enough?
>>>>>>>>
>>>>>>>> On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <
>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> John, right, didn't notice them before. Can you share the full log
>>>>>>>>> for the client node with an issue?
>>>>>>>>>
>>>>>>>>> Evgenii
>>>>>>>>>
>>>>>>>>> Wed, 24 Jun 2020 at 12:29, John Smith <ja...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> I thought I did! The link doesn't have them?
>>>>>>>>>>
>>>>>>>>>> On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <
>>>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Can you share full log files from server nodes?
>>>>>>>>>>>
>>>>>>>>>>> Wed, 24 Jun 2020 at 10:47, John Smith <java.dev.mtl@gmail.com
>>>>>>>>>>> >:
>>>>>>>>>>>
>>>>>>>>>>>> The logs for server are here:
>>>>>>>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>>>>>>>
>>>>>>>>>>>> The error from the client:
>>>>>>>>>>>>
>>>>>>>>>>>> javax.cache.CacheException: class
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
>>>>>>>>>>>> at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
>>>>>>>>>>>> at
>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>>> at
>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>> Caused by:
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
>>>>>>>>>>>> at
>>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
>>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
>>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
>>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
>>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
>>>>>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14918)
>>>>>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14865)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
>>>>>>>>>>>> at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
>>>>>>>>>>>> at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
>>>>>>>>>>>> at
>>>>>>>>>>>> com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
>>>>>>>>>>>> at
>>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
>>>>>>>>>>>> ... 7 common frames omitted
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 24 Jun 2020 at 13:28, John Smith <
>>>>>>>>>>>> java.dev.mtl@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure about the wrong configuration... All the apps work; this
>>>>>>>>>>>>> seems to happen every few weeks. We don't have any particular heavy
>>>>>>>>>>>>> load.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just bounced the client application and the errors went away.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <
>>>>>>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It means that there are no nodes in the cluster that hold
>>>>>>>>>>>>>> certain partitions. So, you probably have a wrong configuration, or some of
>>>>>>>>>>>>>> the nodes left the cluster and you don't have backups in the cluster for
>>>>>>>>>>>>>> these partitions. I believe more can be found in the logs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Evgenii
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Wed, 24 Jun 2020 at 09:52, John Smith <
>>>>>>>>>>>>>> java.dev.mtl@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also I'm assuming that the thin client wouldn't be
>>>>>>>>>>>>>>> susceptible to this error?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:38, John Smith <
>>>>>>>>>>>>>>> java.dev.mtl@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The cluster is showing active when running control.sh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> But the client is showing "all partition owners have left
>>>>>>>>>>>>>>>> the grid"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The client node is marked as client=true so it's not a
>>>>>>>>>>>>>>>> server node.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is this split brain as well?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Re: What does all partition owners have left the grid on the client side mean?

Posted by Evgenii Zhuravlev <e....@gmail.com>.
John,

Unfortunately, I didn't find messages about lost partitions for this cache;
there is a chance that it happened before. What Partition Loss policy do
you have?
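
For reference, backups and the loss policy are both set on the cache
configuration, and after the owners come back the LOST state has to be reset
explicitly. Something like this (the cache name is taken from your error, the
other values are only an example):

    import java.util.Collections;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.PartitionLossPolicy;
    import org.apache.ignite.configuration.CacheConfiguration;

    public final class Cache1Config {
        public static void main(String[] args) {
            CacheConfiguration<Long, Integer> cfg = new CacheConfiguration<>("cache1");

            // One backup per partition, so losing a single server node does not lose data.
            cfg.setBackups(1);

            // Fail fast on reads/writes to lost partitions instead of returning stale data.
            cfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

            try (Ignite ignite = Ignition.start()) {
                ignite.getOrCreateCache(cfg);

                // Once all owners have rejoined, clear the LOST state for the cache.
                ignite.resetLostPartitions(Collections.singleton("cache1"));
            }
        }
    }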

The logs say that there is a problem with partition distribution:
 "Local node affinity assignment distribution is not ideal [cache=cache1,
expectedPrimary=512.00, actualPrimary=493, expectedBackups=512.00,
actualBackups=171, warningThreshold=50.00%]"
How do you restart nodes? Do you wait until the rebalance is completed?
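
If you restart servers one by one, each next node should only be stopped after
rebalancing has finished, otherwise the last copies of some partitions can
leave the grid. One rough way to watch this from code (rebalance events have
to be enabled explicitly, and this is only a local-node sketch):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.events.CacheRebalancingEvent;
    import org.apache.ignite.events.EventType;
    import org.apache.ignite.lang.IgnitePredicate;

    public final class RebalanceWatcher {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Rebalance events are disabled by default and must be enabled explicitly.
            cfg.setIncludeEventTypes(EventType.EVT_CACHE_REBALANCE_STOPPED);

            Ignite ignite = Ignition.start(cfg);

            // Print a line every time rebalancing of a cache finishes on this node.
            IgnitePredicate<CacheRebalancingEvent> onRebalanceDone = evt -> {
                System.out.println("Rebalance finished for cache: " + evt.cacheName());
                return true; // keep the listener registered
            };
            ignite.events().localListen(onRebalanceDone, EventType.EVT_CACHE_REBALANCE_STOPPED);
        }
    }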

Evgenii



Fri, 3 Jul 2020 at 09:03, John Smith <ja...@gmail.com>:

> Hi Evgenii, did you have a chance to look at the latest logs?
>
> On Thu, 25 Jun 2020 at 11:32, John Smith <ja...@gmail.com> wrote:
>
>> Ok
>>
>> stdout.copy.zip
>>
>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>
>> On Thu, 25 Jun 2020 at 11:01, John Smith <ja...@gmail.com> wrote:
>>
>>> Because in between it's all the business logs. Let me make sure I didn't
>>> filter anything relevant. So maybe in those 13 hours nothing happened?
>>>
>>>
>>> On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <
>>> e.zhuravlev.wk@gmail.com> wrote:
>>>
>>>> This doesn't seem to be a full log. There is a gap of more than 13
>>>> hours in the log:
>>>> {"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New
>>>> version is available at ignite.apache.org: 2.8.1"}
>>>> {"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node
>>>> left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238,
>>>> addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73],
>>>> sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
>>>> xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632,
>>>> lastExchangeTime=1592890182021, loc=false,
>>>> ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}
>>>>
>>>> I don't see any exceptions in the log. When did the issue happen? Can
>>>> you share the full log?
>>>>
>>>> Evgenii
>>>>
>>>> Thu, 25 Jun 2020 at 07:36, John Smith <ja...@gmail.com>:
>>>>
>>>>> Hi Evgenii, same folder shared stdout.copy
>>>>>
>>>>> Just in case:
>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>
>>>>> On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <
>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>
>>>>>> No, it's not. It's not clear when it happened and what was happening
>>>>>> with the cluster and the client node itself at that moment.
>>>>>>
>>>>>> Evgenii
>>>>>>
>>>>>> Wed, 24 Jun 2020 at 18:16, John Smith <ja...@gmail.com>:
>>>>>>
>>>>>>> Ok I'll try... The stack trace isn't enough?
>>>>>>>
>>>>>>> On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <
>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>
>>>>>>>> John, right, didn't notice them before. Can you share the full log
>>>>>>>> for the client node with an issue?
>>>>>>>>
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> Wed, 24 Jun 2020 at 12:29, John Smith <ja...@gmail.com>:
>>>>>>>>
>>>>>>>>> I thought I did! The link doesn't have them?
>>>>>>>>>
>>>>>>>>> On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <
>>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Can you share full log files from server nodes?
>>>>>>>>>>
>>>>>>>>>> Wed, 24 Jun 2020 at 10:47, John Smith <ja...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> The logs for server are here:
>>>>>>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>>>>>>
>>>>>>>>>>> The error from the client:
>>>>>>>>>>>
>>>>>>>>>>> javax.cache.CacheException: class
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
>>>>>>>>>>> at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
>>>>>>>>>>> at
>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>> at
>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>> at
>>>>>>>>>>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>>>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>> Caused by:
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
>>>>>>>>>>> at
>>>>>>>>>>> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
>>>>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
>>>>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14918)
>>>>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14865)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
>>>>>>>>>>> at
>>>>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
>>>>>>>>>>> at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
>>>>>>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
>>>>>>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
>>>>>>>>>>> at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
>>>>>>>>>>> at
>>>>>>>>>>> com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
>>>>>>>>>>> at
>>>>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
>>>>>>>>>>> ... 7 common frames omitted
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 24 Jun 2020 at 13:28, John Smith <ja...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Not sure about the wrong configuration... All the apps work; this
>>>>>>>>>>>> seems to happen every few weeks. We don't have any particular heavy
>>>>>>>>>>>> load.
>>>>>>>>>>>>
>>>>>>>>>>>> I just bounced the client application and the errors went away.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <
>>>>>>>>>>>> e.zhuravlev.wk@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It means that there are no nodes in the cluster that hold
>>>>>>>>>>>>> certain partitions. So, you probably have a wrong configuration, or some of
>>>>>>>>>>>>> the nodes left the cluster and you don't have backups in the cluster for
>>>>>>>>>>>>> these partitions. I believe more can be found in the logs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Evgenii
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wed, 24 Jun 2020 at 09:52, John Smith <
>>>>>>>>>>>>> java.dev.mtl@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I'm assuming that the thin client wouldn't be
>>>>>>>>>>>>>> susceptible to this error?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:38, John Smith <
>>>>>>>>>>>>>> java.dev.mtl@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The cluster is showing active when running control.sh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But the client is showing "all partition owners have left
>>>>>>>>>>>>>>> the grid"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The client node is marked as client=true so it's not a
>>>>>>>>>>>>>>> server node.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is this split brain as well?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>