Posted to user@accumulo.apache.org by Denis <de...@camfex.cz> on 2015/10/22 23:13:33 UTC

Tserver's strange state.

Hi

Sometimes my tablet servers go into a strange state: they have some
very old scans (see picture: http://i.imgur.com/2sOUM99.png), and while
in this state they cannot be decommissioned gracefully using "accumulo
stop". The number of tablets they host decreases to some fixed number
(say, from 6K tablets to 2K) instead of dropping to zero.
It is difficult to reproduce, but right now I have a live system with
2 tablet servers in this state.
Any suggestions on how to catch the bug?

Re: Tserver's strange state.

Posted by Denis <de...@camfex.cz>.
Both servers have errors like these in their logs:

========
2015-10-22 03:28:00,599 ERROR org.apache.accumulo.core.client.impl.Writer: error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.142.1:36148 remote=/10.2.130.1:9997]
2015-10-22 03:28:04,283 ERROR org.apache.accumulo.core.client.impl.Writer: error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.142.1:37047 remote=/10.2.130.1:9997]
2015-10-22 03:28:06,116 ERROR org.apache.accumulo.core.client.impl.Writer: error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.142.1:37167 remote=/10.2.130.1:9997]
========
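All three errors above are update timeouts (120000 ms) sent to the same destination, 10.2.130.1:9997. To see whether a single tablet server is the common target across larger logs, here is a minimal Python sketch that tallies these errors per destination host:port. It assumes each log entry sits on one line, as in the excerpt above. The regex is fitted only to these three entries and may not match other Accumulo versions or log4j layouts. The names `UPDATE_TIMEOUT_RE`, `count_update_timeouts`, and `SAMPLE` are hypothetical helpers, not part of Accumulo.

```python
import re
from collections import Counter

# Hypothetical pattern, fitted only to the Writer errors quoted in this thread
# (one entry per line). Other Accumulo versions or log layouts may differ.
UPDATE_TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"org\.apache\.accumulo\.core\.client\.impl\.Writer: "
    r"error sending update to (?P<target>[\d.]+:\d+): "
    r".*SocketTimeoutException: (?P<millis>\d+) millis timeout"
)


def count_update_timeouts(lines):
    """Return a Counter mapping destination tserver (host:port) to the
    number of timed-out "error sending update" entries found in `lines`."""
    counts = Counter()
    for line in lines:
        match = UPDATE_TIMEOUT_RE.match(line)
        if match:
            counts[match.group("target")] += 1
    return counts


# One entry copied from the excerpt above, plus a line that should be ignored.
SAMPLE = [
    "2015-10-22 03:28:00,599 ERROR org.apache.accumulo.core.client.impl.Writer: "
    "error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: "
    "java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel "
    "to be ready for read. ch : java.nio.channels.SocketChannel[connected "
    "local=/10.2.142.1:36148 remote=/10.2.130.1:9997]",
    "some unrelated log line",
]

if __name__ == "__main__":
    print(count_update_timeouts(SAMPLE))  # Counter({'10.2.130.1:9997': 1})
```

To scan a real log, pass an open file object, since iterating over a file yields its lines. A single dominant destination would suggest which tablet server to inspect first, for example with a thread dump taken while it still holds the old scans.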
