Posted to user@ignite.apache.org by Chris Berry <ch...@gmail.com> on 2018/03/30 15:26:43 UTC

Failed to process selector key

Hello,

We have a 36 Node Ignite Cluster in Production taking a relatively high
volume of Requests (6-10M Requests/hour).
In general, it behaves beautifully.
But we see occasional Spikes in response time (1-2 per hour, each affecting
1-6 Nodes).
Overall, a given Spike affects only a handful of Requests (8-10).
But still, they are annoying to my downstream Clients.

In many of these Spikes, we see the following ERROR:

```
[2018-03-30T05:30:09,690Z](grid-nio-worker-tcp-comm-4-#45)([]) ERROR -
TcpCommunicationSpi - Failed to process selector key
[ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
[super=AbstractNioClientWorker [idx=4, bytesRcvd=12231681761,
bytesSent=1859740, bytesRcvd0=793212, bytesSent0=238, select=true,
super=GridWorker [name=grid-nio-worker-tcp-comm-4, igniteInstanceName=null,
finished=false, hashCode=792369266, interrupted=false,
runner=grid-nio-worker-tcp-comm-4-#45]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/172.17.0.2:47111, rmtAddr=/10.120.0.216:48046,
createTime=1522387808681, closeTime=0, bytesSent=38, bytesRcvd=42,
bytesSent0=38, bytesRcvd0=42, sndSchedTime=1522387808681,
lastSndTime=1522387808681, lastRcvTime=1522387808681, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@792f8249, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]]
! java.io.IOException: Connection reset by peer
```

My questions:
Is this a symptom or a cause?
What exactly does this Exception mean?

My gut tells me that when I see these Spikes, Ignite may be doing something
under the covers, and that the `Failed to process selector key` ERROR is
indicative of that.
But I don't have enough information to isolate anything further.

The app sees a high enough volume that it is very difficult to tune the
logging to the point where we can find the needle in this haystack.
In other words, we cannot really turn on DEBUG logging and such - unless it
is _very_ targeted.

Can anyone give me any further information that might help me figure out
what is happening?
Is the `Failed to process selector key` ERROR just a symptom of a deeper
problem?

Is there some configuration parameter that I can tweak to help Ignite avoid
these issues?

Thanks much,
-- Chris

Re: Failed to process selector key

Posted by Chris Berry <ch...@gmail.com>.
Thank you for responding.

No. It is definitely not fixed.

For a given Spike, I have looked at all of the ERROR & WARN logging on the
Remote Host mentioned in the `Failed to process selector key` message:

```
[2018-03-30T12:26:26,051Z](grid-nio-worker-tcp-comm-2-#43)([]) ERROR -
TcpCommunicationSpi - Failed to process selector key
[ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
[super=AbstractNioClientWorker [idx=2, bytesRcvd=19287225257,
bytesSent=2896848, bytesRcvd0=824477, bytesSent0=118, select=true,
super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null,
finished=false, hashCode=674407495, interrupted=false,
runner=grid-nio-worker-tcp-comm-2-#43]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/172.17.0.2:47111, rmtAddr=/10.120.9.76:55768,
createTime=1522412785041, closeTime=0, bytesSent=38, bytesRcvd=42,
bytesSent0=38, bytesRcvd0=42, sndSchedTime=1522412785041,
lastSndTime=1522412785041, lastRcvTime=1522412785041, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@792f8249, directMode=true],
GridConnectionBytesVerifyFilter], accepted=true]]]
! java.io.IOException: Connection reset by peer
```

All ERRORs on that host are simply `ComputeTaskTimeoutCheckedException` and
`GridTaskWorker - Task has timed out`.

The only WARN that stands out is:

```
[2018-03-30T12:26:24,233Z](grid-nio-worker-tcp-comm-3-#44)([]) WARN -
TcpCommunicationSpi - Communication SPI session write timed out (consider
increasing 'socketWriteTimeout' configuration property)
[remoteAddr=/10.120.27.237:47111, writeTimeout=4000]
```

So, perhaps I simply need to increase my timeouts?

My current settings are these:
```
    communications:
       messageQueueLimit: <%= ENV['COMMUNICATIONS_MSSG_QUEUE_LIMIT'] || 1024 %>
       idleConnectionTimeout: <%= ENV['COMMUNICATIONS_IDLE_CONN_TIMEOUT'] || 30000 %>
       socketWriteTimeout: <%= ENV['SOCKET_WRITE_TIMEOUT'] || 4000 %>
```

So we allow only 4 seconds for the socketWriteTimeout.
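
For reference, here is what those same settings look like when configured
programmatically in Java. This is just a minimal sketch, not our actual
bootstrap code, and the 30-second socketWriteTimeout is a hypothetical value
I am considering, not a tested recommendation:

```
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class CommSpiSketch {
    public static void main(String[] args) {
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Mirrors the YAML defaults above.
        commSpi.setMessageQueueLimit(1024);
        commSpi.setIdleConnectionTimeout(30_000L);

        // Raised from our current 4s, per the WARN's suggestion.
        // 30s is a hypothetical value, not a tested recommendation.
        commSpi.setSocketWriteTimeout(30_000L);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(commSpi);

        Ignition.start(cfg);
    }
}
```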

I will give this a try.

Cheers, 
-- Chris

Re: Failed to process selector key

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

"Connection reset by peer" means that the remote node closed the connection.
The reason is most likely to be found in the remote node's logs.

Is the issue solved? If not, can you share the full logs?

On Fri, Mar 30, 2018 at 6:55 PM, Chris Berry <ch...@gmail.com> wrote:

> Sorry.
> I should add.
> This is a 36 Node Compute Grid.
> And our Spikes are always preceded by: `ComputeTaskTimeoutCheckedExceptions`
>
> These are clearly symptoms, though, and not the ultimate cause.
>
> Thanks,
> -- Chris
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Failed to process selector key

Posted by Chris Berry <ch...@gmail.com>.
Sorry.
I should add. 
This is a 36 Node Compute Grid.
And our Spikes are always preceded by: `ComputeTaskTimeoutCheckedExceptions` 

These are clearly symptoms, though, and not the ultimate cause.
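
For context, these timeouts come from the explicit task timeout we set when
submitting compute jobs. A minimal sketch of that pattern (the 4-second
timeout and the sleeping job are hypothetical stand-ins for our real tasks):

```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.compute.ComputeTaskTimeoutException;

public class TaskTimeoutSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            try {
                // A job that runs past the timeout fails the whole task with
                // ComputeTaskTimeoutException -- the public counterpart of the
                // ComputeTaskTimeoutCheckedException in our logs.
                ignite.compute().withTimeout(4_000).call(() -> {
                    Thread.sleep(10_000); // stand-in for a slow job
                    return "done";
                });
            } catch (ComputeTaskTimeoutException e) {
                System.err.println("Task has timed out: " + e.getMessage());
            }
        }
    }
}
```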

Thanks, 
-- Chris