Posted to user@cassandra.apache.org by Rauan Maemirov <ra...@maemirov.com> on 2011/04/30 12:14:27 UTC

Determining the issues of marking node down

I have a test cluster with 3 nodes; earlier I installed OpsCenter to
watch my cluster. Every day I see the same node go down (at a
different time, but every day). Then I just run `service cassandra start` to
fix the problem. system.log doesn't show me anything strange. What are the
steps to diagnose issues like this? I haven't changed the logging properties (and
cassandra.yaml is not far from the default), so maybe there are
some options that should be switched to debug?

Btw, the node that goes down is the most loaded one (in storage capacity). Maybe
the problem is in OPP?
Once I ran the loadbalance command and it changed the token for the first node
from 0 to one of the keys (without touching the other 2; I had generated the tokens
with tokens.py).

Re: Determining the issues of marking node down

Posted by aaron morton <aa...@thelastpickle.com>.
If the node is crashing with an OutOfMemoryError it will be in the Cassandra logs; search them for "ERROR". Alternatively, if you've installed a package, stdout and stderr may be redirected to a file called something like output.log in the same location as the log file.
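For example (a sketch; the demo file below stands in for the real log so the commands are runnable as-is — on a package install the actual file is usually under /var/log/cassandra/):

```shell
# Stand-in for /var/log/cassandra/system.log -- point LOG at your real file.
LOG=/tmp/system.log.demo
printf 'INFO [main] starting up\nERROR [ReadStage] java.lang.OutOfMemoryError: Java heap space\n' > "$LOG"

# Look for fatal errors; an OOM crash usually shows up like this.
grep -n "ERROR" "$LOG"
grep -c "OutOfMemoryError" "$LOG"
```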

You can change the logging via the log4j-server.properties file, typically in the same location as cassandra.yaml. By default they will only log errors and warnings and the like, though.
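For example, something like this in log4j-server.properties (the property names are standard log4j; the exact defaults and file path depend on your install):

```
# Raise the root level (packaged installs usually default to INFO):
log4j.rootLogger=DEBUG,stdout,R
# Or enable debug for one subsystem only, e.g. gossip / failure detection:
log4j.logger.org.apache.cassandra.gms=DEBUG
```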

What does "nodetool ring" say about the token distribution? If you are using the OPP you need to make sure your app is distributing keys evenly to avoid hot spots.

Hope that helps.
Aaron
 