Posted to user@cassandra.apache.org by Xiangfei Ni <xi...@cm-dt.com> on 2018/03/27 04:35:55 UTC

RE: RE: A node down every day in a 6 nodes cluster

I have checked dmesg and the messages log, and there are no eth* entries in either, so I don't think there was a network connectivity issue.

Best Regards,

倪项菲/ David Ni
中移德电网络科技有限公司
Virtue Intelligent Network Ltd, co.
Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
Mob: +86 13797007811|Tel: + 86 27 5024 2516

From: daemeon reiydelle <da...@gmail.com>
Sent: March 27, 2018 11:42
To: user <us...@cassandra.apache.org>
Subject: Re: RE: A node down every day in a 6 nodes cluster

Look for errors on your network interface. I think you have periodic errors in your network connectivity.
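One way to act on this suggestion on Linux is to read the interface counters directly, since receive/transmit errors and drops can be counted without anything being written to dmesg. A minimal sketch, assuming input in the /proc/net/dev text format; the function names and the sample text are hypothetical. The counters are cumulative since boot, so comparing two snapshots (for example one taken before and one after an outage) says more than a single reading.

```python
from typing import Dict


def nic_error_counters(proc_net_dev: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev-style text into per-interface error/drop counters.

    Data lines look like "  eth0: <8 receive fields> <8 transmit fields>".
    Receive:  bytes packets errs drop fifo frame compressed multicast
    Transmit: bytes packets errs drop fifo colls carrier compressed
    """
    counters: Dict[str, Dict[str, int]] = {}
    for line in proc_net_dev.splitlines():
        if ":" not in line:
            continue  # the two header lines contain '|' but no ':'
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 16:
            continue  # not a data line in the expected layout
        counters[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return counters


def interfaces_with_errors(proc_net_dev: str) -> Dict[str, Dict[str, int]]:
    """Keep only interfaces whose error or drop counters are non-zero."""
    return {
        name: c
        for name, c in nic_error_counters(proc_net_dev).items()
        if any(v > 0 for v in c.values())
    }


# Hypothetical sample; on a real host pass open("/proc/net/dev").read() instead.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000      50    3    1    0     0          0         0     4000      40    0    2    0     0       0          0
"""

if __name__ == "__main__":
    print(interfaces_with_errors(SAMPLE))
```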


<======>
"Who do you think made the first stone spear? The Asperger guy.
If you get rid of the autism genetics, there would be no Silicon Valley"
Temple Grandin
Daemeon C.M. Reiydelle
San Francisco 1.415.501.0198
London 44 020 8144 9872

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xi...@cm-dt.com> wrote:
Hi Jeff,
    I have to restart the node manually every time; only one node has this problem.
    I have attached the nodetool output. Thanks.

Best Regards,

倪项菲/ David Ni

From: Jeff Jirsa <jj...@gmail.com>
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

That warning isn’t sufficient to understand why the node is going down.


Cassandra 3.9 has some pretty serious known issues; upgrading to 3.11.3 is likely a good idea.

Are the nodes coming up on their own? Or are you restarting them?

Paste the output of nodetool tpstats and nodetool cfstats.
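When reading the tpstats output requested here, the figures usually worth checking first are the Pending and All time blocked columns and the Dropped message counts. A rough sketch for pulling those out of the text output, assuming the 3.x layout (a "Pool Name" table followed by a two-column "Message type / Dropped" table; later versions add latency columns to the second table, which this sketch skips). The function names are hypothetical, and the sample numbers are invented for illustration, not taken from this cluster.

```python
from typing import Dict, List, Optional, Tuple


def _to_int(value: str) -> Optional[int]:
    """Convert a numeric column to int; return None for placeholders such as 'n/a'."""
    return int(value) if value.isdigit() else None


def parse_tpstats(text: str) -> Tuple[Dict[str, Dict[str, Optional[int]]], Dict[str, Optional[int]]]:
    """Split tpstats text into thread-pool stats and dropped-message counts."""
    pools: Dict[str, Dict[str, Optional[int]]] = {}
    dropped: Dict[str, Optional[int]] = {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["Pool", "Name"]:
            section = "pools"  # header of the thread-pool table
            continue
        if parts[:2] == ["Message", "type"]:
            section = "dropped"  # header of the dropped-messages table
            continue
        if section == "pools" and len(parts) == 6:
            name, active, pending, completed, blocked, all_time_blocked = parts
            pools[name] = {
                "active": _to_int(active),
                "pending": _to_int(pending),
                "completed": _to_int(completed),
                "blocked": _to_int(blocked),
                "all_time_blocked": _to_int(all_time_blocked),
            }
        elif section == "dropped" and len(parts) == 2:
            dropped[parts[0]] = _to_int(parts[1])
    return pools, dropped


def summarize(text: str) -> Tuple[List[str], Dict[str, Optional[int]]]:
    """Return pools with a backlog or past blocking, and message types with drops."""
    pools, dropped = parse_tpstats(text)
    backlog = [
        name for name, s in pools.items()
        if (s["pending"] or 0) > 0 or (s["all_time_blocked"] or 0) > 0
    ]
    drops = {kind: n for kind, n in dropped.items() if n}
    return backlog, drops


# Hypothetical sample in the 3.x layout; the numbers are made up.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         123456         0                 0
MutationStage                     2        15         654321         0                 0
Native-Transport-Requests         4         0         999999         0               112

Message type           Dropped
READ                        37
MUTATION                     0
REQUEST_RESPONSE             5
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```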



--
Jeff Jirsa


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xi...@cm-dt.com> wrote:
Hi Cassandra experts,
  I am facing an issue: one node goes down every day in a 6-node cluster. The cluster is in a single DC.
  Every node has 4 CPU cores and 16 GB of RAM, and the heap is configured with MAX_HEAP_SIZE=8192m and HEAP_NEWSIZE=512m. Each node holds about 200 GB of data, and the RF for the business CF is 3. The node goes down once a day, and system.log shows the following:
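For comparison with the explicit settings above: as we understand the stock cassandra-env.sh in the 3.x line, when MAX_HEAP_SIZE and HEAP_NEWSIZE are left unset it sizes the heap as max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)) and the young generation as min(100 MB per core, 1/4 of the heap). The sketch below only reproduces that arithmetic in Python; the function name is hypothetical, it assumes the OS reports 16 GB as 16384 MB, and it does not read or run the real script.

```python
from typing import Tuple


def default_heap_sizes_mb(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Mirror the default MAX_HEAP_SIZE / HEAP_NEWSIZE arithmetic (values in MB).

    Heap:      max(min(1/2 RAM, 1024), min(1/4 RAM, 8192))
    Young gen: min(100 * cores, 1/4 heap)
    Integer division mirrors the shell script's use of expr.
    """
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    max_heap = max(half_capped, quarter_capped)
    young_gen = min(100 * cpu_cores, max_heap // 4)
    return max_heap, young_gen


if __name__ == "__main__":
    # A 4-core node with 16 GB, as described in the message above.
    print(default_heap_sizes_mb(16 * 1024, 4))
```

On a 4-core, 16 GB node this arithmetic gives a 4096 MB heap and a 400 MB young generation, compared with the 8192m/512m configured here.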
WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.latest_rt_alarm>
ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 QueryMessage.java:128 - Unexpected error during query
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
        at com.google.common.cache.LocalCache.get(LocalCache.java:3937) ~[guava-18.0.jar:na]
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) ~[guava-18.0.jar:na]
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) ~[guava-18.0.jar:na]
        at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) [apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) [apache-cassandra-3.9.jar:3.9]
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.39.Final.jar:4.0.39.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366) [netty-all-4.0.39.Final.jar:4.0.39.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) [netty-all-4.0.39.Final.jar:4.0.39.Final]
        at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357) [netty-all-4.0.39.Final.jar:4.0.39.Final]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.9.jar:3.9]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:102) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.auth.PermissionsCache.lambda$new$0(PermissionsCache.java:37) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.auth.AuthCache$1.load(AuthCache.java:183) ~[apache-cassandra-3.9.jar:3.9]
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) ~[guava-18.0.jar:na]
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) ~[guava-18.0.jar:na]
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) ~[guava-18.0.jar:na]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) ~[guava-18.0.jar:na]
        ... 26 common frames omitted
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1718) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1667) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1608) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1527) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:975) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:271) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:232) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.auth.CassandraAuthorizer.addPermissionsForRole(CassandraAuthorizer.java:227) ~[apache-cassandra-3.9.jar:3.9]
        at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:93) ~[apache-cassandra-3.9.jar:3.9]
        ... 32 common frames omitted
WARN  [Native-Transport-Requests-23] 2018-03-26 18:53:17,131 CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.rt_alarm_unite>
ERROR [Native-Transport-Requests-64] 2018-03-26 18:53:17,135 QueryMessage.java:128 - Unexpected error during query
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
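Following Jeff's point that one warning does not explain the outage, it can help to summarize the whole system.log for the period before each failure instead of a single excerpt. A minimal sketch, assuming the line layout visible in the excerpt above (LEVEL [thread] date time,millis File.java:line - message); the regex and function names are our own, and continuation lines such as stack frames are deliberately skipped. The sample lines are copied from the excerpt.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple

# Matches event lines such as:
# WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 CassandraAuthorizer.java:101 - message
LINE_RE = re.compile(
    r"^(?P<level>TRACE|DEBUG|INFO|WARN|ERROR)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+):(?P<lineno>\d+) - (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    level: str
    thread: str
    ts: str
    source: str
    message: str


def parse_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only lines that start a log event; stack-trace lines do not match."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            entries.append(LogEntry(m["level"], m["thread"], m["ts"], m["source"], m["message"]))
    return entries


def warn_error_counts(entries: Iterable[LogEntry]) -> Counter:
    """Count WARN/ERROR events per (level, source file)."""
    return Counter((e.level, e.source) for e in entries if e.level in ("WARN", "ERROR"))


SAMPLE_LINES = [
    "WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.latest_rt_alarm>",
    "ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 QueryMessage.java:128 - Unexpected error during query",
    "        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]",
    "WARN  [Native-Transport-Requests-23] 2018-03-26 18:53:17,131 CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.rt_alarm_unite>",
    "ERROR [Native-Transport-Requests-64] 2018-03-26 18:53:17,135 QueryMessage.java:128 - Unexpected error during query",
]

if __name__ == "__main__":
    print(warn_error_counts(parse_entries(SAMPLE_LINES)))
```

On a real node the same functions can be fed the lines of system.log directly, and the last few entries before each restart are often more informative than the totals.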

I have confirmed that nev_tsp_sa has all permissions on the nev_prod_tsp keyspace:
cassandra@cqlsh:system_auth> select * from role_permissions where role = 'nev_tsp_sa';

 role       | resource          | permissions
------------+-------------------+--------------------------------------------------------------
 nev_tsp_sa | data/nev_prod_tsp | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
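The stack trace passes through ClientState.checkPermissionOnResourceChain, which suggests the permission is checked on the table and then on its parent resources, so a grant on data/nev_prod_tsp should cover both tables named in the warnings; since the call also goes through ModificationStatement.checkAccess, the permission in question is presumably MODIFY. Below is a simplified model of that walk, assuming the data/keyspace/table resource names seen in role_permissions; the function names are hypothetical and this is not Cassandra's implementation. Note that in the trace the ReadTimeoutException is raised inside CassandraAuthorizer.addPermissionsForRole, that is, while the permissions are being read, before any such comparison takes place.

```python
from typing import Dict, List, Set


def resource_chain(resource: str) -> List[str]:
    """Return the resource followed by its parents: data/ks/tbl, data/ks, data."""
    parts = resource.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def is_granted(grants: Dict[str, Set[str]], resource: str, permission: str) -> bool:
    """True if the permission is granted on the resource or on any of its parents."""
    return any(permission in grants.get(r, set()) for r in resource_chain(resource))


# Grants for nev_tsp_sa as shown in the role_permissions output above.
GRANTS = {
    "data/nev_prod_tsp": {"ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"},
}

if __name__ == "__main__":
    for table in ("latest_rt_alarm", "rt_alarm_unite"):
        print(table, is_granted(GRANTS, f"data/nev_prod_tsp/{table}", "MODIFY"))
```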

The cache disk can be read from and written to normally.

Any help would be highly appreciated. Thanks very much!


Best Regards,

倪项菲/ David Ni


