You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Daning Wang <da...@netseer.com> on 2012/04/05 19:30:40 UTC
Request timeout and host marked down
Hi all,
We are using Hector and ofter we see lots of timeout exception in the log,
I know that the hector can failover to other node, but I want to reduce the
number of timeouts.
any hector parameter I should change to reduce this error?
also, on the server side, any kind of tunning need to do for the timeout?
Thanks in advance.
12/04/04 15:13:20 ERROR
com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
12/04/04 15:13:25 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.28.78.123(10.28.78.123):9160
12/04/04 15:13:25 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:44 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.240.113.171(10.240.113.171):9160
12/04/04 15:13:44 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.28.78.123(10.28.78.123):9160
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.123.83.114(10.123.83.114):9160
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.6.115.239(10.6.115.239):9160
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:49 ERROR
com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
12/04/04 15:13:49 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.120.205.48(10.120.205.48):9160
12/04/04 15:13:49 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160};
IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
12/04/04 15:13:50 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.28.20.200(10.28.20.200):9160
12/04/04 15:13:50 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
<ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160};
IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
12/04/04 15:13:51 ERROR
com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
Re: Request timeout and host marked down
Posted by Daning Wang <da...@netseer.com>.
Thanks Aaron, will seek help from hector team.
On Tue, Apr 10, 2012 at 3:41 AM, aaron morton <aa...@thelastpickle.com>wrote:
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 31 more
>
> This looks like a client side timeout to me.
>
> AFAIK it will use this
>
> http://rantav.github.com/hector//source/content/API/core/1.0-1/me/prettyprint/cassandra/service/CassandraHost.html#getCassandraThriftSocketTimeout()
>
> if it's > 0 otherwise the value of the CASSANDRA_THRIFT_SOCKET_TIMEOUT JVM
> param
>
> otherwise 0 i think.
>
> Hector is one of the many things I am not an expert on. Try the hector
> user list if you are still having problems.
>
>
>
> [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats
> Pool Name Active Pending Completed Blocked
> All time blocked
> ReadStage 3 3 414129625
> 0 0
>
> Looks fine.
>
> Hope that helps.
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10/04/2012, at 8:08 AM, Daning Wang wrote:
>
> Thanks Aaron! Here is the exception, is that the timeout between nodes?
> any parameter I can change to reduce timeout?
>
> me.prettyprint.hector.api.exceptions.HectorTransportException:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
> at
> me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130)
> at
> me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100)
> at
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
> at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
> at
> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
> at
> me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99)
> at
> com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267)
> at
> com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55)
> at
> com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85)
> at
> com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71)
> at
> com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149)
> at
> com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55)
> at
> com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28)
> at
> com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62)
> at
> com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144)
> at
> com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427)
> at
> com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698)
> at
> org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682)
> at
> me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106)
> ... 21 more
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 31 more
>
>
> *and there is the tpstats*
>
> [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats
> Pool Name Active Pending Completed Blocked
> All time blocked
> ReadStage 3 3 414129625
> 0 0
> RequestResponseStage 0 0 300591600
> 0 0
> MutationStage 0 0 96585276
> 0 0
> ReadRepairStage 0 0 94185465
> 0 0
> ReplicateOnWriteStage 0 0 0
> 0 0
> GossipStage 0 0 2684813
> 0 0
> AntiEntropyStage 0 0 5436
> 0 0
> MigrationStage 0 0 22
> 0 0
> MemtablePostFlusher 0 0 3553
> 0 0
> StreamStage 0 0 167
> 0 0
> FlushWriter 0 0 3582
> 0 23
> MiscStage 0 0 1163
> 0 0
> AntiEntropySessions 0 0 399
> 0 0
> InternalResponseStage 0 0 0
> 0 0
> HintedHandoff 0 0 2746
> 0 0
>
> Message type Dropped
> RANGE_SLICE 0
> READ_REPAIR 17931
> BINARY 0
> READ 5185149
> MUTATION 232317
> REQUEST_RESPONSE 1317
>
>
>
>
>
>
> On Sun, Apr 8, 2012 at 2:15 PM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> You need to see if the timeout is from the client to the server, or
>> between the server nodes.
>>
>> If it's server side a TimedOutException will be thrown from thrift. Take
>> a look at the nodetool tpstats on the servers, you will probably see lots
>> of "Pending" tasks. Basically the cluster is overloaded. Consider:
>>
>> * check the IO, CPU, GC state on the servers.
>> * ensuring the data and requests are evenly spread around the cluster.
>> * reducing the number of columns read in a select.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 6/04/2012, at 5:30 AM, Daning Wang wrote:
>>
>> > Hi all,
>> >
>> > We are using Hector and ofter we see lots of timeout exception in the
>> log, I know that the hector can failover to other node, but I want to
>> reduce the number of timeouts.
>> >
>> > any hector parameter I should change to reduce this error?
>> >
>> > also, on the server side, any kind of tunning need to do for the
>> timeout?
>> >
>> >
>> > Thanks in advance.
>> >
>> >
>> > 12/04/04 15:13:20 ERROR
>> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
>> > 12/04/04 15:13:25 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.28.78.123(10.28.78.123):9160
>> > 12/04/04 15:13:25 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160};
>> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
>> > 12/04/04 15:13:44 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.240.113.171(10.240.113.171):9160
>> > 12/04/04 15:13:44 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160};
>> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
>> > 12/04/04 15:13:46 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.28.78.123(10.28.78.123):9160
>> > 12/04/04 15:13:46 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160};
>> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
>> > 12/04/04 15:13:46 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.123.83.114(10.123.83.114):9160
>> > 12/04/04 15:13:46 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160};
>> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
>> > 12/04/04 15:13:46 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.6.115.239(10.6.115.239):9160
>> > 12/04/04 15:13:46 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160};
>> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
>> > 12/04/04 15:13:49 ERROR
>> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
>> > 12/04/04 15:13:49 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.120.205.48(10.120.205.48):9160
>> > 12/04/04 15:13:49 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160};
>> IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
>> > 12/04/04 15:13:50 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
>> TRIGGERED for host 10.28.20.200(10.28.20.200):9160
>> > 12/04/04 15:13:50 ERROR
>> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
>> shutdown:
>> <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160};
>> IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
>> > 12/04/04 15:13:51 ERROR
>> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
>>
>>
>
>
Re: Request timeout and host marked down
Posted by aaron morton <aa...@thelastpickle.com>.
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 31 more
This looks like a client side timeout to me.
AFAIK it will use this
http://rantav.github.com/hector//source/content/API/core/1.0-1/me/prettyprint/cassandra/service/CassandraHost.html#getCassandraThriftSocketTimeout()
if it's > 0 otherwise the value of the CASSANDRA_THRIFT_SOCKET_TIMEOUT JVM param
otherwise 0 i think.
Hector is one of the many things I am not an expert on. Try the hector user list if you are still having problems.
>
> [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats
> Pool Name Active Pending Completed Blocked All time blocked
> ReadStage 3 3 414129625 0 0
Looks fine.
Hope that helps.
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 10/04/2012, at 8:08 AM, Daning Wang wrote:
> Thanks Aaron! Here is the exception, is that the timeout between nodes? any parameter I can change to reduce timeout?
>
> me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
> at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130)
> at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100)
> at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
> at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
> at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
> at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99)
> at com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267)
> at com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55)
> at com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85)
> at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71)
> at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149)
> at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55)
> at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28)
> at com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62)
> at com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144)
> at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427)
> at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698)
> at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682)
> at me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106)
> ... 21 more
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 31 more
>
>
> and there is the tpstats
>
> [cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats
> Pool Name Active Pending Completed Blocked All time blocked
> ReadStage 3 3 414129625 0 0
> RequestResponseStage 0 0 300591600 0 0
> MutationStage 0 0 96585276 0 0
> ReadRepairStage 0 0 94185465 0 0
> ReplicateOnWriteStage 0 0 0 0 0
> GossipStage 0 0 2684813 0 0
> AntiEntropyStage 0 0 5436 0 0
> MigrationStage 0 0 22 0 0
> MemtablePostFlusher 0 0 3553 0 0
> StreamStage 0 0 167 0 0
> FlushWriter 0 0 3582 0 23
> MiscStage 0 0 1163 0 0
> AntiEntropySessions 0 0 399 0 0
> InternalResponseStage 0 0 0 0 0
> HintedHandoff 0 0 2746 0 0
>
> Message type Dropped
> RANGE_SLICE 0
> READ_REPAIR 17931
> BINARY 0
> READ 5185149
> MUTATION 232317
> REQUEST_RESPONSE 1317
>
>
>
>
>
>
> On Sun, Apr 8, 2012 at 2:15 PM, aaron morton <aa...@thelastpickle.com> wrote:
> You need to see if the timeout is from the client to the server, or between the server nodes.
>
> If it's server side a TimedOutException will be thrown from thrift. Take a look at the nodetool tpstats on the servers, you will probably see lots of "Pending" tasks. Basically the cluster is overloaded. Consider:
>
> * check the IO, CPU, GC state on the servers.
> * ensuring the data and requests are evenly spread around the cluster.
> * reducing the number of columns read in a select.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/04/2012, at 5:30 AM, Daning Wang wrote:
>
> > Hi all,
> >
> > We are using Hector and ofter we see lots of timeout exception in the log, I know that the hector can failover to other node, but I want to reduce the number of timeouts.
> >
> > any hector parameter I should change to reduce this error?
> >
> > also, on the server side, any kind of tunning need to do for the timeout?
> >
> >
> > Thanks in advance.
> >
> >
> > 12/04/04 15:13:20 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> > 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> > 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.240.113.171(10.240.113.171):9160
> > 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> > 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.123.83.114(10.123.83.114):9160
> > 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.6.115.239(10.6.115.239):9160
> > 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:49 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> > 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.120.205.48(10.120.205.48):9160
> > 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160}; IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
> > 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.20.200(10.28.20.200):9160
> > 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160}; IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
> > 12/04/04 15:13:51 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
>
>
Re: Request timeout and host marked down
Posted by Daning Wang <da...@netseer.com>.
Thanks Aaron! Here is the exception, is that the timeout between nodes? any
parameter I can change to reduce timeout?
me.prettyprint.hector.api.exceptions.HectorTransportException:
org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
at
me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:130)
at
me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:100)
at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:99)
at
com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267)
at
com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55)
at
com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85)
at
com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71)
at
com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149)
at
com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55)
at
com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28)
at
com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62)
at
com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144)
at
com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427)
at
com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:423)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1698)
at
org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1682)
at
me.prettyprint.cassandra.model.CqlQuery$1.execute(CqlQuery.java:106)
... 21 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 31 more
*and there is the tpstats*
[cassy@s2.dsat4 ~]$ ~/bin/nodetool -h localhost tpstats
Pool Name Active Pending Completed Blocked All
time blocked
ReadStage 3 3 414129625
0 0
RequestResponseStage 0 0 300591600
0 0
MutationStage 0 0 96585276
0 0
ReadRepairStage 0 0 94185465
0 0
ReplicateOnWriteStage 0 0 0
0 0
GossipStage 0 0 2684813
0 0
AntiEntropyStage 0 0 5436
0 0
MigrationStage 0 0 22
0 0
MemtablePostFlusher 0 0 3553
0 0
StreamStage 0 0 167
0 0
FlushWriter 0 0 3582
0 23
MiscStage 0 0 1163
0 0
AntiEntropySessions 0 0 399
0 0
InternalResponseStage 0 0 0
0 0
HintedHandoff 0 0 2746
0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 17931
BINARY 0
READ 5185149
MUTATION 232317
REQUEST_RESPONSE 1317
On Sun, Apr 8, 2012 at 2:15 PM, aaron morton <aa...@thelastpickle.com>wrote:
> You need to see if the timeout is from the client to the server, or
> between the server nodes.
>
> If it's server side a TimedOutException will be thrown from thrift. Take a
> look at the nodetool tpstats on the servers, you will probably see lots of
> "Pending" tasks. Basically the cluster is overloaded. Consider:
>
> * check the IO, CPU, GC state on the servers.
> * ensuring the data and requests are evenly spread around the cluster.
> * reducing the number of columns read in a select.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/04/2012, at 5:30 AM, Daning Wang wrote:
>
> > Hi all,
> >
> > We are using Hector and ofter we see lots of timeout exception in the
> log, I know that the hector can failover to other node, but I want to
> reduce the number of timeouts.
> >
> > any hector parameter I should change to reduce this error?
> >
> > also, on the server side, any kind of tunning need to do for the timeout?
> >
> >
> > Thanks in advance.
> >
> >
> > 12/04/04 15:13:20 ERROR
> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> > 12/04/04 15:13:25 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> > 12/04/04 15:13:25 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160};
> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:44 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.240.113.171(10.240.113.171):9160
> > 12/04/04 15:13:44 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160};
> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:46 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> > 12/04/04 15:13:46 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160};
> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:46 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.123.83.114(10.123.83.114):9160
> > 12/04/04 15:13:46 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160};
> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:46 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.6.115.239(10.6.115.239):9160
> > 12/04/04 15:13:46 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160};
> IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> > 12/04/04 15:13:49 ERROR
> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> > 12/04/04 15:13:49 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.120.205.48(10.120.205.48):9160
> > 12/04/04 15:13:49 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160};
> IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
> > 12/04/04 15:13:50 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
> TRIGGERED for host 10.28.20.200(10.28.20.200):9160
> > 12/04/04 15:13:50 ERROR
> me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
> shutdown:
> <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160};
> IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
> > 12/04/04 15:13:51 ERROR
> com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
>
>
Re: Request timeout and host marked down
Posted by aaron morton <aa...@thelastpickle.com>.
You need to see if the timeout is from the client to the server, or between the server nodes.
If it's server side a TimedOutException will be thrown from thrift. Take a look at the nodetool tpstats on the servers, you will probably see lots of "Pending" tasks. Basically the cluster is overloaded. Consider:
* check the IO, CPU, GC state on the servers.
* ensuring the data and requests are evenly spread around the cluster.
* reducing the number of columns read in a select.
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 6/04/2012, at 5:30 AM, Daning Wang wrote:
> Hi all,
>
> We are using Hector and ofter we see lots of timeout exception in the log, I know that the hector can failover to other node, but I want to reduce the number of timeouts.
>
> any hector parameter I should change to reduce this error?
>
> also, on the server side, any kind of tunning need to do for the timeout?
>
>
> Thanks in advance.
>
>
> 12/04/04 15:13:20 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> 12/04/04 15:13:25 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.240.113.171(10.240.113.171):9160
> 12/04/04 15:13:44 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.240.113.171(10.240.113.171):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.78.123(10.28.78.123):9160
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.78.123(10.28.78.123):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.123.83.114(10.123.83.114):9160
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.123.83.114(10.123.83.114):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.6.115.239(10.6.115.239):9160
> 12/04/04 15:13:46 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.6.115.239(10.6.115.239):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
> 12/04/04 15:13:49 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms
> 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.120.205.48(10.120.205.48):9160
> 12/04/04 15:13:49 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.120.205.48(10.120.205.48):9160}; IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
> 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN TRIGGERED for host 10.28.20.200(10.28.20.200):9160
> 12/04/04 15:13:50 ERROR me.prettyprint.cassandra.connection.HConnectionManager: Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{10.28.20.200(10.28.20.200):9160}; IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
> 12/04/04 15:13:51 ERROR com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 10000 ms