Posted to user@ignite.apache.org by si...@barclays.com on 2021/07/20 21:42:58 UTC

CacheInvalidStateException | all partition owners have left the grid, partition data has been lost

We have a 3-node cluster with persistence enabled, and we sometimes see the error below.

Ignite Server log

2021-07-19 21:12:13,650 INFO [http-nio-9050-exec-1] o.a.c.h.Http11Processor [DirectJDKLog.java:175] Error parsing HTTP request header
Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in method name [0x160x030x030x010xb40x010x000x010xb00x030x030x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x000x01p0x000x010x000x020x000x030x000x040x000x050x000x060x000x070x000x080x000x090x000x0a0x000x0b0x000x0c0x000x0d0x000x0e0x000x0f0x000x100x000x110x000x120x000x130x000x140x000x150x000x160x000x170x000x180x000x190x000x1a0x000x1b0x00/0x0000x0010x0020x0030x0040x0050x0060x0070x0080x0090x00:0x00;0x00<0x00=0x00>0x00?0x00@0x00g0x00h0x00i0x00j0x00k0x00l0x00m0x00A0x00B0x00C0x00D0x00E0x00F0x000x840x000x850x000x860x000x870x000x880x000x890x000xba0x000xbb0x000xbc0x000xbd0x000xbe0x000xbf0x000xc00x000xc10x000xc20x000xc30x000xc40x000xc50x000x9c0x000x9d0x000x9e0x000x9f0x000xa00x000xa10x000xa20x000xa30x000xa40x000xa50x000xa60x000xa70xc00x010xc00x020xc00x030xc00x040xc00x050xc00x060xc00x070xc00x080xc00x090xc00x0a0xc00x0b0xc00x0c0xc00x0d0xc00x0e0xc00x0f0xc00x100xc00x110xc00x120xc00x130xc00x140xc00x150xc00x160xc00x170xc00x180xc00x190xc0#0xc0$0xc0%0xc0&0xc0'0xc0(0xc0)0xc0*0xc0+0xc0,0xc0-0xc0.0xc0/0xc010xc000xc020xc0s0xc0r0xc0t0xc0u0xc0v0xc0w0xc0x0xc0z0xc0y0xc0{0xc0|0xc0}0xc0~0xc00x7f0xc00x800xc00x810xc00x820xc00x830xc00x840xc00x850xc00x860xc00x870xc00x880xc00x890xc00x8a0xc00x8b0xc00x8c0xc00x8d0xc00x8e0xc00x8f0xc00x900xc00x910xc00x920xc00x930xc00x940xc00x950xc00x960xc00x970xc00x980xc00x990xc00x9a0xc00x9b0x000x960x000x970x000x980x000x990x000x9a0x000x9b0xcc0xa80xcc0xa90xcc0xaa0xcc0xab0xcc0xac0xcc0xad0xcc0xae0x020x000x010x000x160x000x0a0x000x0a0x000x080x000x170x000x190x000x180x000x160x000x0b0x000x040x030x000x01...]. HTTP method names must be tokens
        at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:418)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:260)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
2021-07-19 21:12:14,114 WARN [grid-nio-worker-tcp-rest-0-#38%blade-cache%] o.a.i.i.p.r.p.t.GridTcpRestProtocol [JavaLogger.java:295] Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Failed to parse incoming packet (invalid packet start) [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=441 cap=8192], super=AbstractNioClientWorker [idx=0, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-0, igniteInstanceName=blade-cache, finished=false, heartbeatTs=1626743534110, hashCode=507494530, interrupted=false, runner=grid-nio-worker-tcp-rest-0-#38%blade-cache%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, closeSocket=true, outboundMessagesQueueSizeMetric=null, super=GridNioSessionImpl [locAddr=/10.148.213.242:11211, rmtAddr=/10.148.213.242:54330, createTime=1626743534110, closeTime=0, bytesSent=0, bytesRcvd=441, bytesSent0=0, bytesRcvd0=441, sndSchedTime=1626743534110, lastSndTime=1626743534110, lastRcvTime=1626743534110, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [marsh=JdkMarshaller [clsFilter=o.a.i.marshaller.MarshallerUtils$1@dafac83], routerClient=false], directMode=false]], accepted=true, markedForClose=false]], b=16]]
2021-07-19 21:12:14,117 WARN [grid-nio-worker-tcp-rest-1-#39%blade-cache%] o.a.i.i.p.r.p.t.GridTcpRestProtocol [JavaLogger.java:295] Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Failed to parse incoming packet (invalid packet start) [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=441 cap=8192], super=AbstractNioClientWorker [idx=1, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-1, igniteInstanceName=blade-cache, finished=false, heartbeatTs=1626743534110, hashCode=1070416289, interrupted=false, runner=grid-nio-worker-tcp-rest-1-#39%blade-cache%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, closeSocket=true, outboundMessagesQueueSizeMetric=null, super=GridNioSessionImpl [locAddr=/10.148.213.242:11211, rmtAddr=/10.148.213.242:54332, createTime=1626743534110, closeTime=0, bytesSent=0, bytesRcvd=441, bytesSent0=0, bytesRcvd0=441, sndSchedTime=1626743534110, lastSndTime=1626743534110, lastRcvTime=1626743534110, readsPaused=false, filte


Client log

Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute the cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=USER_IGNITE_CACHE_PRIMARY, partition=79, key=UserKeyCacheObjectImpl [part=79, val=429E98A8-F594-4D1B-BE68-23BA6E027E25, hasValBytes=false]]

During this time, I verified the topology and saw that all nodes were up and running fine. I had to restart all the nodes to recover from this error.

Cache Configuration


public CacheConfiguration<String, User> getPrimaryUserCacheConfiguration() {
    final CacheConfiguration<String, User> cacheConfiguration = new CacheConfiguration<>(CacheIdentifiers.USER_IGNITE_CACHE_PRIMARY.toString());

    cacheConfiguration.setName(CacheIdentifiers.USER_IGNITE_CACHE_PRIMARY.toString());
    cacheConfiguration.setIndexedTypes(String.class, User.class);
    cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
    cacheConfiguration.setStoreKeepBinary(true);
    RendezvousAffinityFunction rendezvousAffinityFunction = new RendezvousAffinityFunction();
    rendezvousAffinityFunction.setPartitions(512);
    cacheConfiguration.setBackups(1);
    cacheConfiguration.setAffinity(rendezvousAffinityFunction);
    //cacheConfiguration.setOnheapCacheEnabled(true);
    return cacheConfiguration;
}

Could someone please help me understand why this error occurs and how to recover from it?

I have also configured a health check to detect this situation, but the check below did not flag it. Am I doing something wrong?


private Collection<Integer> getForLostPartitions(){
    Collection<Integer> lostPartitions = ignite.cache(CacheIdentifiers.USER_IGNITE_CACHE_PRIMARY.toString()).lostPartitions();
    return lostPartitions;
}


if(checkForLostPartitions()){
    Health.down().withDetail("Lost Partitions",getForLostPartitions().size()).withDetail("Partitions Lost ", getForLostPartitions().toArray()).build();
}
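
Two things stand out in the fragment as posted: the Health object is built but the fragment does not show it being returned, and getForLostPartitions() is called twice, so the count and the list may describe different moments. A minimal sketch of one way to structure such a check follows. The class name LostPartitionsCheck, the Supplier-based wiring, and the map keys are hypothetical, and the Ignite lookup is abstracted behind a supplier so that the sketch runs on a plain JDK. It is an illustration under those assumptions, not a confirmed fix for the problem above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical helper: reports DOWN when the supplied lookup returns any lost partitions. */
final class LostPartitionsCheck {
    private final Supplier<Collection<Integer>> lostPartitions;

    LostPartitionsCheck(Supplier<Collection<Integer>> lostPartitions) {
        this.lostPartitions = lostPartitions;
    }

    /** Runs the lookup once and returns a status map that the caller must propagate. */
    Map<String, Object> check() {
        // Single lookup, copied into a sorted set so that the count and the
        // partition list describe the same snapshot.
        Collection<Integer> lost = new TreeSet<>(lostPartitions.get());

        Map<String, Object> result = new LinkedHashMap<>();
        result.put("status", lost.isEmpty() ? "UP" : "DOWN");
        result.put("lostPartitionCount", lost.size());
        result.put("lostPartitions", lost);
        return result;
    }

    public static void main(String[] args) {
        // Assumption: in the real application the supplier would wrap the call
        // from the post, e.g. () -> ignite.cache(cacheName).lostPartitions().
        LostPartitionsCheck check = new LostPartitionsCheck(() -> Arrays.asList(79, 12));
        System.out.println(check.check());
    }
}
```

In a Spring Boot health indicator, the returned map would be translated into the Health result that the method returns, rather than being built and discarded.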

Any help would be appreciated.

Thanks






Re: CacheInvalidStateException | all partition owners have left the grid, partition data has been lost

Posted by Ilya Korol <ll...@gmail.com>.
Hi,

Which Ignite version are you using? Have you enabled any optional Ignite libraries? Am I right that you are trying to call the Ignite REST API? Do you know which endpoint call produces the error? Which client do you use to call that endpoint? Can you repeat the call with another tool such as curl and check whether the error is reproducible? Do you use some kind of SSL termination for the Ignite REST API?

It also looks like your cluster is facing network or discovery issues (because of the "all partition owners have left the grid, partition data has been lost" messages). Are you sure that your network is OK?

PS. I didn't find any Tomcat usage in the Ignite repo (at least in current master), so I'm struggling to imagine how this issue could appear in Ignite. Please give us more details about your Ignite environment.
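
On the SSL question: the "method name" that Tomcat logged starts with the bytes 0x16 0x03 0x03 0x01 0xb4. In TLS, 0x16 is the handshake record type, 0x03 0x03 is the TLS 1.2 record version, and the next two bytes are the record length. That suggests a TLS client is connecting to a plain-text port. A minimal sketch for checking this on captured bytes follows. The class and method names are hypothetical, and the check is only a heuristic on the first five bytes, not a full TLS parser.

```java
/** Hypothetical helper: inspects the first bytes of a connection for a TLS handshake record. */
final class TlsSniffer {
    private TlsSniffer() {}

    /** True if the bytes start with content type 0x16 (handshake) and major version 0x03. */
    static boolean looksLikeTlsHandshake(byte[] firstBytes) {
        return firstBytes != null
            && firstBytes.length >= 5
            && (firstBytes[0] & 0xFF) == 0x16
            && (firstBytes[1] & 0xFF) == 0x03;
    }

    /** Total record size: 5-byte header plus the big-endian length in bytes 3 and 4. */
    static int recordTotalLength(byte[] firstBytes) {
        int payload = ((firstBytes[3] & 0xFF) << 8) | (firstBytes[4] & 0xFF);
        return 5 + payload;
    }

    public static void main(String[] args) {
        // First five bytes of the "method name" from the Tomcat log above.
        byte[] fromLog = {0x16, 0x03, 0x03, 0x01, (byte) 0xb4};
        byte[] plainHttp = "GET / HTTP/1.1".getBytes(java.nio.charset.StandardCharsets.US_ASCII);

        System.out.println(looksLikeTlsHandshake(fromLog));   // prints "true"
        System.out.println(looksLikeTlsHandshake(plainHttp)); // prints "false"
        System.out.println(recordTotalLength(fromLog));       // prints "441"
    }
}
```

The computed size of 441 bytes matches the bytesRcvd=441 reported by GridTcpRestProtocol on port 11211 one second later, so the same TLS handshake may have been sent to both port 9050 and port 11211. Whether this is related to the lost partitions is not clear from the logs shown.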


