Posted to user@cassandra.apache.org by Alex Quan <al...@tinkur.com> on 2010/12/23 19:42:01 UTC

Having trouble getting cassandra to stay up

Hi,

I am a newbie to Cassandra and am using Cassandra 0.7 RC2. I initially had
Cassandra working on one node and was able to create a keyspace and column
families and populate the database fine. I tried adding a second node by
changing the seed to point to the other node and setting listen_address and
rpc_address to blank. I then started up the second node and it seemed to have
connected fine according to nodetool, but after that I couldn't get it to
accept any commands, and whenever I tried to make a new keyspace or column
family it would kill my initial node after a message like this:

 INFO 18:19:49,335 switching in a fresh Memtable for Schema at
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
position=9143)
 INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410 bytes,
5 operations)
Killed

and the next few times I start up the server a similar message would pop up until,
I am guessing, all the pending data is flushed out; then it would start fine until I
tried to add anything to it. I tried changing the yaml file back to the
original setup and this still happens. I don't know what to try to get it to
work properly; if you guys can help I would be really grateful.

Alex

Re: Having trouble getting cassandra to stay up

Posted by Stu Hood <st...@gmail.com>.
With a very small amount of memory, the Cassandra process may be getting
killed by the Linux OOM killer, which should result in a log message to the
kernel logs. See
http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer
to locate the error if it exists.
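Stu's suggestion can be checked directly on the affected node. A minimal sketch of what an OOM-killer entry looks like and how to grep for it; the hostname and PID below are made up for illustration:

```shell
# The OOM killer logs to the kernel ring buffer; on the real box you would run:
#   dmesg | grep -i 'killed process'
# or search the persisted log:
#   grep -i 'killed process' /var/log/kern.log
# Simulated kern.log excerpt (hypothetical PID/host) showing what to look for:
log="Dec 24 03:26:47 ip-10-127-155-205 kernel: Out of memory: kill process 2971 (java) score 829 or a child
Dec 24 03:26:47 ip-10-127-155-205 kernel: Killed process 2971 (java)"
echo "$log" | grep -i 'killed process'
```

If a `Killed process ... (java)` line shows up around the time Cassandra died, the OOM killer is the culprit.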

On Fri, Dec 24, 2010 at 6:46 PM, Dan Hendry <da...@gmail.com> wrote:

> One last clarification, given you are running with -f: does “fully die” = return to
> the command prompt with no action on your part? If you ctrl-c from Cassandra
> when running in foreground mode (i.e. with -f), the process WILL be killed.
> Try running in background mode (without the -f).
>
>
>
> Removing the contents of /var/lib/cassandra/ and using the default
> cassandra.yaml and cassandra-env.sh is effectively the same as a complete
> reset. You can also just delete and then re-un-tar the provided tarball.
>
>
>
> Given the limited amount of ram on a micro instance, you might try using
> JNA (download it from
> https://jna.dev.java.net/servlets/ProjectDocumentList?folderID=12329&expandFolder=12329&folderID=0
> and put the jar in Cassandra's lib directory; see
> https://issues.apache.org/jira/browse/CASSANDRA-1214) or setting
> disk_access_mode: standard in cassandra.yaml.
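For reference, the setting Dan mentions is a single top-level entry in the 0.7 cassandra.yaml; this is an excerpt of just that entry, not a complete config:

```yaml
# cassandra.yaml (0.7.x) -- excerpt, not a complete config.
# 'standard' = plain buffered I/O; avoids mmapping data files, which can look
# like huge memory usage to the OS on small instances.
# Other accepted values: auto, mmap, mmap_index_only.
disk_access_mode: standard
```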
>
>
>
> Other than that, I am out of ideas; perhaps somebody else can comment. I
> have set up Cassandra 0.7 RC2 on various EC2 ubuntu 10.10 instances with no
> issue (although not on a micro in quite some time). Having problems with a
> stock ubuntu image and the provided Cassandra tarball, with no tinkering
> with the cassandra or system settings, is very strange. Again, if worse comes
> to worst, start with a fresh m1.small instance; it takes me less than a ½
> hour to be up and running from scratch.
>
>
>
> Dan
>
>
>
> *From:* Alex Quan [mailto:alex.quan@tinkur.com]
> *Sent:* December-24-10 17:44
> *To:* user@cassandra.apache.org
> *Subject:* Re: Having trouble getting cassandra to stay up
>
>
>
> I am running bin/cassandra with the -f option and it does seem to fully
> die and not just stall.
>
> I have also tried using the cassandra-cli to create a keyspace, and it works
> for a little bit and then dies shortly after accepting the request. The
> vmstat output after it dies is as follows:
>
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 311240    424  23356    0    0    14     4   13    2  0  0 99  0
>
> I also tried creating a keyspace with the cassandra-cli after I deleted all the
> contents of cassandra/data and cassandra/commitlog, and it still dies
> almost immediately after the keyspace creation; I am not sure why this is the
> case. Is there a way to fully remove cassandra and start off with a fully
> fresh copy?
>
> Thanks
>
> Alex
>
> On Fri, Dec 24, 2010 at 1:42 PM, Dan Hendry <da...@gmail.com>
> wrote:
>
> Hum, very strange.
>
>
>
> More what I was trying to get at was: did the process truly die or was it
> just non-responsive and looking like it was dead? It would be very strange
> if the actual process was dying without any warnings in the logs. Presumably
> you are running bin/cassandra *without* the -f option? What is the output of
> top/vmstat on the dead node after Cassandra has 'died'? Sorry I was not
> clear on this initially.
>
> I have no experience with pycassa but you might want to try using the
> Cassandra CLI to create keyspaces and column families to rule out some sort
> of client weirdness. Also, you haven't made any changes to cassandra-env.sh
> have you? EC2 micros have a very limited amount of ram. I have also seen
> their CPU bursting cause problems but that does not seem to be the issue
> here. I might also suggest you try an m1.small instead just to be safe; they
> are still pretty cheap when you run them as spot-instances.
>
>
>
> As a last-ditch effort (given that this is a test cluster), you can delete
> the contents of /var/lib/cassandra/data/* and /var/lib/cassandra/commitlog/* to
> effectively reset your nodes.
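Dan's reset can be rehearsed safely on a scratch directory before pointing it at the real /var/lib/cassandra; the directory layout below is a stand-in, and when doing it for real the Cassandra process should be stopped first:

```shell
# Rehearse the wipe on a throwaway directory tree shaped like /var/lib/cassandra
root=$(mktemp -d)
mkdir -p "$root/data/system" "$root/commitlog"
touch "$root/data/system/LocationInfo-e-1-Data.db" "$root/commitlog/CommitLog-1.log"

# The actual reset: remove SSTables and commit log segments
rm -rf "$root"/data/* "$root"/commitlog/*

ls "$root/commitlog" | wc -l   # 0 -> all commit log segments are gone
```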
>
>
>
> On Fri, Dec 24, 2010 at 12:48 PM, Alex Quan <al...@tinkur.com> wrote:
>
> Sorry, but I am not sure how to answer all the questions that you have posed,
> since a lot of the stuff I am working with is quite new to me and I haven't
> used many of the tools that are talked about, but I will try my best to answer
> the questions to the best of my knowledge. I am trying to get Cassandra
> to run between 2 nodes that are both Amazon EC2 micro instances; I believe
> they are using 64-bit Ubuntu Linux 10.01 with java version 1.6.0_23. When
> I said killed, that was what was output to the console when the process
> died, so I am not sure what that exactly means. Here is some of the info
> before cassandra went down:
>
> ring:
>
> Address         Status State   Load            Owns    Token
>                                                        111232248257764777335763873822010980488
> 10.127.155.205  Up     Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>
> vmstat before cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0
>
> vmstat after cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
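The two vmstat samples above already tell the memory story; a quick arithmetic check on the `free` column (values in KB, copied from the runs above) shows how little headroom is left once Cassandra is up:

```python
# Free memory from the vmstat runs above (KB): before vs. after starting Cassandra
free_before_kb = 328196
free_after_kb = 5660

consumed_mb = (free_before_kb - free_after_kb) / 1024
print(f"Cassandra consumed ~{consumed_mb:.0f} MB, leaving ~{free_after_kb / 1024:.1f} MB free")
```

A node idling with only a few MB free is exactly the situation in which the Linux OOM killer starts picking off large processes like the JVM.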
>
> Then, after I run a line like sys.create_keyspace('testing', 1) in pycassa
> with the connection set up to point to my machine, I get the following error:
>
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py",
> line 365, in drop_keyspace
>     schema_version = self._conn.system_drop_keyspace(keyspace)
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
> line 1255, in system_drop_keyspace
>     return self.recv_system_drop_keyspace()
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
> line 1266, in recv_system_drop_keyspace
>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
> line 126, in readMessageBegin
>     sz = self.readI32()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
> line 203, in readI32
>     buff = self.trans.readAll(4)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 58, in readAll
>     chunk = self.read(sz-have)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 272, in read
>     self.readFrame()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 276, in readFrame
>     buff = self.__trans.readAll(4)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 58, in readAll
>     chunk = self.read(sz-have)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py",
> line 108, in read
>     raise TTransportException(type=TTransportException.END_OF_FILE,
> message='TSocket read 0 bytes')
> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
>
> and then cassandra on the machine dies. Here is some of the log of
> the machine that died:
>
>  INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db
> (301 bytes)
>  INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load
> MX4J, mx4j-tools.jar is not in the classpath
>  INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding
> thrift service to /0.0.0.0:9160
>  INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using
> TFramedTransport with a max frame size of 15728640 bytes.
>  INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119)
> Listening for thrift clients...
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Migrations at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Migrations@948345082(5902 bytes, 1
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155)
> Writing Memtable-Migrations@948345082(5902 bytes, 1 operations)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Schema at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Schema@212165140(2194 bytes, 3
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db
> (6035 bytes)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155)
> Writing Memtable-Schema@212165140(2194 bytes, 3 operations)
>
> and the log on the machine that stays up:
>
> ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java
> (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
> org.apache.avro.AvroTypeException: Found
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
> expecting
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
>     at
> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at
> org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
>     at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
>     at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
>     at
> org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
>     at
> org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
>     at
> org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
>     at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node
> /10.127.155.205 has restarted, now UP again
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line
> 670) Node /10.127.155.205 state jump to normal
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
> (line 191) Started hinted handoff for endpoint /10.127.155.205
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
> (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
>  INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
> OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
>  INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
> InetAddress /10.127.155.205 is now dead.
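One side observation about the Avro error above (not something the thread settles on): the schema the surviving node is "expecting" contains a replicate_on_write field that the "found" schema lacks, which is the sort of mismatch that can appear when the two nodes are running different 0.7 builds. A sketch of how to diff two such CfDef blobs, using heavily trimmed stand-in schemas rather than the full ones from the log:

```python
import json

# Hypothetical, trimmed stand-ins for the two CfDef schemas in the error
# above; in practice, paste the full "Found ..." / "expecting ..." JSON here.
found_json = '{"fields": [{"name": "keyspace"}, {"name": "gc_grace_seconds"}]}'
expecting_json = (
    '{"fields": [{"name": "keyspace"}, {"name": "replicate_on_write"},'
    ' {"name": "gc_grace_seconds"}]}'
)

found = {f["name"] for f in json.loads(found_json)["fields"]}
expecting = {f["name"] for f in json.loads(expecting_json)["fields"]}

# Fields the receiving node expects but the sending node never serialized
print(sorted(expecting - found))  # ['replicate_on_write']
```

If the real blobs show the same kind of asymmetry, it is worth confirming both nodes were installed from the same tarball.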
>
> The ring output on my node that stays up:
>
> Address         Status State   Load            Owns    Token
>                                                        111232248257764777335763873822010980488
> 10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>
> I am not sure how to use the JMX tools to connect to these machines, so I
> can't really answer that, but hopefully this is enough information to
> diagnose my problem. Thanks,
>
> Alex
>
>
>
> On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <da...@gmail.com>
> wrote:
>
> Your details are rather vague; what do you mean by killed? Is the Cassandra
> java process still running? Any other warning or error log messages (from
> either node)? Could you provide the last few Cassandra log lines from each
> machine? Can you connect to the node via JMX? What is the output of nodetool
> ring from the second node (which is presumably still alive)? Is there any
> unusual system activity: high cpu usage, low cpu usage, problems with disk
> IO (can be checked with vmstat)?
>
>
>
> Can you provide any further system information? Linux/windows, java
> version, 32/64 bit, amount of ram?
>
>
>
> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <al...@tinkur.com> wrote:
>
> Hi,
>
> I am a newbie to cassandra and am using cassandra RC 2. I initially have
> cassndra working on one node and was able to create keyspace, column
> families and populate the database fine. I tried adding a second node by
> changing the seed to point to another node and setting listen_address and
> rpc_address to blank. I then started up the second node and it seems to have
> connected fine using the node tool but after that I couldn't get it to
> accept any commands and whenever I tried to make a new keyspace or column
> family it would kill my initial node after a message like this:
>
>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
> position=9143)
>  INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410
> bytes, 5 operations)
> Killed
>
> and the next few time I start up the server a similar would pop up until I
> am guessing all the stuff is flushed out then it would start fine until I
> tried to add anything to it. I tried changing back the yaml file back to the
> original setup and this still happens. I don't know what to try to get it to
> work properly, if you guys can help I would be really grateful
>
> Alex
>
>
>

RE: Having trouble getting cassandra to stay up

Posted by Dan Hendry <da...@gmail.com>.
    at
org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
    at
org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(Definiti
onsUpdateResponseVerbHandler.java:56)
    at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63
)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
va:886)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
08)
    at java.lang.Thread.run(Thread.java:662)
 INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node
/10.127.155.205 has restarted, now UP again
 INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line 670)
Node /10.127.155.205 state jump to normal
 INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
(line 191) Started hinted handoff for endpoint /10.127.155.205
 INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
(line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
 INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
 INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
InetAddress /10.127.155.205 is now dead.

The ring output on my node that stays up:

Address         Status State   Load            Owns    Token
                                                       111232248257764777335763873822010980488
10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488

I am not sure how to use the JMX tools to connect to these machines, so I
can't really answer that, but hopefully this is enough information to
diagnose my problem. Thanks,

Alex





On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <da...@gmail.com>
wrote:

Your details are rather vague: what do you mean by killed? Is the Cassandra
java process still running? Any other warning or error log messages (from
either node)? Could you provide the last few Cassandra log lines from each
machine? Can you connect to the node via JMX? What is the output of nodetool
ring from the second node (which is presumably still alive)? Is there any
unusual system activity: high cpu usage, low cpu usage, problems with disk
IO (can be checked with vmstat)?

 

Can you provide any further system information? Linux/windows, java version,
32/64 bit, amount of ram? 

 

On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <al...@tinkur.com> wrote:

Hi,

I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
Cassandra working on one node and was able to create keyspaces and column
families and populate the database fine. I tried adding a second node by
changing the seed to point to the other node and setting listen_address and
rpc_address to blank. I then started up the second node and it seemed to
have connected fine using nodetool, but after that I couldn't get it to
accept any commands, and whenever I tried to create a new keyspace or column
family it would kill my initial node after a message like this:

 INFO 18:19:49,335 switching in a fresh Memtable for Schema at
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log', position=9143)
 INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410 bytes, 5 operations)
Killed

and the next few times I started the server, a similar message would pop up
until, I am guessing, all the pending data was flushed out; then it would
start fine until I tried to add anything to it. I tried changing the yaml
file back to the original setup and this still happens. I don't know what to
try to get it to work properly; if you guys can help I would be really
grateful

Alex

 

 

 

 



Re: Having trouble getting cassandra to stay up

Posted by Alex Quan <al...@tinkur.com>.
I am running bin/cassandra with the -f option and it does seem to fully
die, not just stall.

I have also tried using cassandra-cli to create a keyspace; it works for a
little bit and then dies shortly after accepting the request. The vmstat
output after it dies is as follows:


procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 311240    424  23356    0    0    14     4   13    2  0  0 99  0
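(The near-zero free memory in the vmstat output above is consistent with Stu's Linux OOM-killer theory. One way to check, as a sketch, is to grep the kernel logs for the killer's signature line; paths and exact wording vary by distribution, so the command below runs the grep against a hypothetical sample log line to show what to look for. On a real node you would grep `dmesg` or /var/log/kern.log instead.)

```shell
# Sketch: look for the Linux OOM killer's signature in the kernel log.
# On a real node:  dmesg | grep -ci 'out of memory'
# Here we grep a hypothetical sample line so the command is self-contained.
sample='Out of memory: Kill process 1234 (java) score 887 or sacrifice child'
echo "$sample" | grep -ci 'out of memory'   # prints 1 if the signature is found
```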

I also tried creating a keyspace with cassandra-cli after deleting all the
contents of cassandra/data and cassandra/commitlog, and it still dies
almost immediately after the keyspace creation; I am not sure why this is
the case. Is there a way to fully remove Cassandra and start off with a
fully fresh copy?

Thanks

Alex

On Fri, Dec 24, 2010 at 1:42 PM, Dan Hendry <da...@gmail.com> wrote:

> Hum, very strange.
>
> More what I was trying to get at was: did the process truly die or was it
> just non-responsive and looking like it was dead? It would be very strange
> if the actual process was dying without any warnings in the logs. Presumably
> you are running bin/cassandra *without* the -f option? What is the output of
> top/vmstat on the dead node after Cassandra has 'died'? Sorry I was not
> clear on this initially.
>
> I have no experience with pycassa but you might want to try using the
> Cassandra CLI to create keyspaces and column families to rule out some sort
> of client weirdness. Also, you haven't made any changes to cassandra-env.sh
> have you? EC2 micros have a very limited amount of ram. I have also seen
> their CPU bursting cause problems but that does not seem to be the issue
> here. I might also suggest you try an m1.small instead just to be safe; they
> are still pretty cheap when you run them as spot-instances.
>
> As a last ditch effort (given that this is a test cluster), you can delete
> the contents of /var/lib/cassandra/data/* and /var/lib/cassandra/commitlog/*
> to effectively reset your nodes.
>
> On Fri, Dec 24, 2010 at 12:48 PM, Alex Quan <al...@tinkur.com> wrote:
>
>> Sorry but I am not sure how to answer all the questions that you have posed
>> since a lot of the stuff I am working with is quite new to me and I haven't
>> used many of the tools that are talked about, but I will try my best to
>> answer them to the best of my knowledge. I am trying to get Cassandra to
>> run between 2 nodes that are both Amazon EC2 micro instances; I believe
>> they are running 64-bit Ubuntu Linux 10.01 with Java version 1.6.0_23. When
>> I said "killed", that was what was output to the console when the process
>> died, so I am not sure what it means exactly. Here is some of the info
>> before Cassandra went down:
>>
>> ring:
>>
>> Address         Status State   Load            Owns    Token
>>                                                        111232248257764777335763873822010980488
>> 10.127.155.205  Up     Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
>> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>>
>> vmstat before cassandra is up:
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0
>>
>> vmstat after cassandra is up:
>>
>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
>>
>> Then after I run a line like sys.create_keyspace('testing', 1) in pycassa
>> with the connections setup to point to my machine I get the following error:
>>
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File
>> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py",
>> line 365, in drop_keyspace
>>     schema_version = self._conn.system_drop_keyspace(keyspace)
>>   File
>> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
>> line 1255, in system_drop_keyspace
>>     return self.recv_system_drop_keyspace()
>>   File
>> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
>> line 1266, in recv_system_drop_keyspace
>>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
>> line 126, in readMessageBegin
>>     sz = self.readI32()
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
>> line 203, in readI32
>>     buff = self.trans.readAll(4)
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
>> line 58, in readAll
>>     chunk = self.read(sz-have)
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
>> line 272, in read
>>     self.readFrame()
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
>> line 276, in readFrame
>>     buff = self.__trans.readAll(4)
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
>> line 58, in readAll
>>     chunk = self.read(sz-have)
>>   File
>> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py",
>> line 108, in read
>>     raise TTransportException(type=TTransportException.END_OF_FILE,
>> message='TSocket read 0 bytes')
>> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
>>
>> and then Cassandra on that machine dies. Here is some of the log from the
>> machine that died:
>>
>>  INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162)
>> Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db
>> (301 bytes)
>>  INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load
>> MX4J, mx4j-tools.jar is not in the classpath
>>  INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77)
>> Binding thrift service to /0.0.0.0:9160
>>  INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using
>> TFramedTransport with a max frame size of 15728640 bytes.
>>  INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119)
>> Listening for thrift clients...
>>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
>> (line 639) switching in a fresh Memtable for Migrations at
>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
>> position=10873)
>>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
>> (line 943) Enqueuing flush of Memtable-Migrations@948345082(5902 bytes, 1
>> operations)
>>  INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155)
>> Writing Memtable-Migrations@948345082(5902 bytes, 1 operations)
>>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
>> (line 639) switching in a fresh Memtable for Schema at
>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
>> position=10873)
>>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
>> (line 943) Enqueuing flush of Memtable-Schema@212165140(2194 bytes, 3
>> operations)
>>  INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162)
>> Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db
>> (6035 bytes)
>>  INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155)
>> Writing Memtable-Schema@212165140(2194 bytes, 3 operations)
>>
>> and the log on the machine that stays up:
>>
>> ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java
>> (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
>> org.apache.avro.AvroTypeException: Found
>> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
>> expecting
>> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
>>     at
>> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>     at
>> org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
>>     at
>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
>>     at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>>     at
>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>>     at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>>     at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
>>     at
>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>>     at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>>     at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
>>     at
>> org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
>>     at
>> org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
>>     at
>> org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
>>     at
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>>     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:662)
>>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583)
>> Node /10.127.155.205 has restarted, now UP again
>>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line
>> 670) Node /10.127.155.205 state jump to normal
>>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
>> (line 191) Started hinted handoff for endpoint /10.127.155.205
>>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
>> (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
>>  INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
>> OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
>>  INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
>> InetAddress /10.127.155.205 is now dead.
>>
>> The ring output on my node that stays up:
>>
>> Address         Status State   Load            Owns    Token
>>                                                        111232248257764777335763873822010980488
>> 10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
>> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>>
>> I am not sure how to use the JMX tools to connect to these machines, so I
>> can't really answer that, but hopefully this is enough information to
>> diagnose my problem. Thanks,
>>
>> Alex
>>
>>
>>
>> On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <da...@gmail.com> wrote:
>>
>>> Your details are rather vague: what do you mean by killed? Is the
>>> Cassandra java process still running? Any other warning or error log
>>> messages (from either node)? Could you provide the last few Cassandra log
>>> lines from each machine? Can you connect to the node via JMX? What is the
>>> output of nodetool ring from the second node (which is presumably still
>>> alive)? Is there any unusual system activity: high cpu usage, low cpu
>>> usage, problems with disk IO (can be checked with vmstat)?
>>>
>>> Can you provide any further system information? Linux/windows, java
>>> version, 32/64 bit, amount of ram?
>>>
>>>
>>> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <al...@tinkur.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
>>>> Cassandra working on one node and was able to create keyspaces and column
>>>> families and populate the database fine. I tried adding a second node by
>>>> changing the seed to point to the other node and setting listen_address
>>>> and rpc_address to blank. I then started up the second node and it seemed
>>>> to have connected fine using nodetool, but after that I couldn't get it
>>>> to accept any commands, and whenever I tried to create a new keyspace or
>>>> column family it would kill my initial node after a message like this:
>>>>
>>>>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at
>>>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
>>>> position=9143)
>>>>  INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410
>>>> bytes, 5 operations)
>>>> Killed
>>>>
>>>> and the next few times I started the server, a similar message would pop
>>>> up until, I am guessing, all the pending data was flushed out; then it
>>>> would start fine until I tried to add anything to it. I tried changing
>>>> the yaml file back to the original setup and this still happens. I don't
>>>> know what to try to get it to work properly; if you guys can help I would
>>>> be really grateful
>>>>
>>>> Alex
>>>>
>>>
>>>
>>
>

Re: Having trouble getting cassandra to stay up

Posted by Dan Hendry <da...@gmail.com>.
Hum, very strange.

More what I was trying to get at was: did the process truly die or was it
just non-responsive and looking like it was dead? It would be very strange
if the actual process was dying without any warnings in the logs. Presumably
you are running bin/cassandra *without* the -f option? What is the output of
top/vmstat on the dead node after Cassandra has 'died'? Sorry I was not
clear on this initially.

I have no experience with pycassa but you might want to try using the
Cassandra CLI to create keyspaces and column families to rule out some sort
of client weirdness. Also, you haven't made any changes to cassandra-env.sh
have you? EC2 micros have a very limited amount of ram. I have also seen
their CPU bursting cause problems but that does not seem to be the issue
here. I might also suggest you try an m1.small instead just to be safe; they
are still pretty cheap when you run them as spot-instances.
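(On a ~600 MB EC2 micro, the heap auto-sizing in cassandra-env.sh can leave the JVM larger than available memory, which invites the OOM killer. A hedged sketch of pinning a small heap instead; the variable names match the 0.7-era cassandra-env.sh, but the exact values here are illustrative guesses to be tuned, not a tested recommendation:)

```shell
# cassandra-env.sh fragment (sketch): cap the JVM heap explicitly instead of
# letting the script auto-size it from system memory.
MAX_HEAP_SIZE="128M"   # total heap; must fit within the micro's ~600MB of RAM
HEAP_NEWSIZE="32M"     # young-generation size, typically about 1/4 of the heap
```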

As a last ditch effort (given that this is a test cluster), you can delete
the contents of /var/lib/cassandra/data/* and /var/lib/cassandra/commitlog/*
to effectively reset your nodes.
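(That reset amounts to a couple of shell commands. The sketch below demonstrates them against a throwaway directory created with mktemp so it is safe to run anywhere; on a real, stopped node you would point CASS_HOME at /var/lib/cassandra, assuming the default cassandra.yaml data locations:)

```shell
# Sketch: wipe a test node's on-disk state. Demonstrated on a temp directory;
# set CASS_HOME=/var/lib/cassandra on a real node AFTER stopping Cassandra.
CASS_HOME=$(mktemp -d)
mkdir -p "$CASS_HOME/data/system" "$CASS_HOME/commitlog"
touch "$CASS_HOME/data/system/LocationInfo-e-1-Data.db" \
      "$CASS_HOME/commitlog/CommitLog-1.log"
rm -rf "$CASS_HOME"/data/* "$CASS_HOME"/commitlog/*
ls -A "$CASS_HOME/data" "$CASS_HOME/commitlog"   # both directories now empty
```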

On Fri, Dec 24, 2010 at 12:48 PM, Alex Quan <al...@tinkur.com> wrote:

> Sorry but I am not sure how to answer all the questions that you have posed
> since a lot of the stuff I am working with is quite new to me and I haven't
> used many of the tools that are talked about, but I will try my best to
> answer them to the best of my knowledge. I am trying to get Cassandra to
> run between 2 nodes that are both Amazon EC2 micro instances; I believe
> they are running 64-bit Ubuntu Linux 10.01 with Java version 1.6.0_23. When
> I said "killed", that was what was output to the console when the process
> died, so I am not sure what it means exactly. Here is some of the info
> before Cassandra went down:
>
> ring:
>
> Address         Status State   Load            Owns    Token
>                                                        111232248257764777335763873822010980488
> 10.127.155.205  Up     Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>
> vmstat before cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0
>
> vmstat after cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
>
> Then after I run a line like sys.create_keyspace('testing', 1) in pycassa
> with the connections setup to point to my machine I get the following error:
>
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py",
> line 365, in drop_keyspace
>     schema_version = self._conn.system_drop_keyspace(keyspace)
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
> line 1255, in system_drop_keyspace
>     return self.recv_system_drop_keyspace()
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
> line 1266, in recv_system_drop_keyspace
>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
> line 126, in readMessageBegin
>     sz = self.readI32()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
> line 203, in readI32
>     buff = self.trans.readAll(4)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 58, in readAll
>     chunk = self.read(sz-have)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 272, in read
>     self.readFrame()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 276, in readFrame
>     buff = self.__trans.readAll(4)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 58, in readAll
>     chunk = self.read(sz-have)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py",
> line 108, in read
>     raise TTransportException(type=TTransportException.END_OF_FILE,
> message='TSocket read 0 bytes')
> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
>
> and then Cassandra on that machine dies. Here is some of the log from the
> machine that died:
>
>  INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db
> (301 bytes)
>  INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load
> MX4J, mx4j-tools.jar is not in the classpath
>  INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding
> thrift service to /0.0.0.0:9160
>  INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using
> TFramedTransport with a max frame size of 15728640 bytes.
>  INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119)
> Listening for thrift clients...
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Migrations at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Migrations@948345082(5902 bytes, 1
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155)
> Writing Memtable-Migrations@948345082(5902 bytes, 1 operations)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Schema at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Schema@212165140(2194 bytes, 3
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db
> (6035 bytes)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155)
> Writing Memtable-Schema@212165140(2194 bytes, 3 operations)
>
> and the log on the machine that stays up:
>
> ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java
> (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
> org.apache.avro.AvroTypeException: Found
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
> expecting
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
>     at
> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at
> org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
>     at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
>     at
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
>     at
> org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
>     at
> org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
>     at
> org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
>     at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node
> /10.127.155.205 has restarted, now UP again
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line
> 670) Node /10.127.155.205 state jump to normal
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
> (line 191) Started hinted handoff for endpoint /10.127.155.205
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
> (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
>  INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
> OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
>  INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
> InetAddress /10.127.155.205 is now dead.
>
> The ring output on my node that stays up:
>
> Address         Status State   Load            Owns
> Token
>
> 111232248257764777335763873822010980488
> 10.127.155.205  Down   Normal  85.17 KB        59.06%
> 41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%
> 111232248257764777335763873822010980488
>
> I am not sure how to use the JMX tools to connect to these machines, so I
> can't really answer that, but hopefully this is enough information to
> diagnose my problem. Thanks
>
> Alex
>

Re: Having trouble getting cassandra to stay up

Posted by Alex Quan <al...@tinkur.com>.
I started over on an m1-type instance and everything seems to be working
fine now. Thanks for all the help.

Alex
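As Stu Hood suggested earlier in the thread, the likely culprit on a memory-starved micro instance is the Linux OOM killer, and the vmstat numbers later in this thread (free memory fell from roughly 328 MB to 5.6 MB once Cassandra started) fit that theory. A minimal sketch of scanning a kernel log for the kill message, assuming the common message format (exact wording varies by kernel version, and the sample line below is invented for illustration):

```python
import re

def find_oom_kills(kernel_log: str):
    """Return (pid, process name) pairs for OOM-killer victims in a kernel log.

    Matches the common "Out of memory: Kill process <pid> (<name>)" message
    (and the "Killed process" variant); wording differs across kernel versions.
    """
    pattern = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")
    return [(int(pid), name) for pid, name in pattern.findall(kernel_log)]

# Invented sample line in the style of dmesg output on the dying node:
sample = "[12345.678] Out of memory: Kill process 4242 (java) score 812 or sacrifice child\n"
print(find_oom_kills(sample))  # -> [(4242, 'java')]
```

On the instance itself, grepping dmesg or /var/log/kern.log for "Out of memory" (per the Stack Overflow link Stu posted) answers the question directly; a micro instance's roughly 600 MB of RAM makes an OOM kill of the JVM very plausible.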

On Mon, Dec 27, 2010 at 7:18 AM, Gary Dusbabek <gd...@gmail.com> wrote:

> You might want to try starting over.  Configure your initial keyspaces
> in conf/cassandra.yaml and load them into your cluster with
> bin/schematool.
>
> That nasty stack trace indicates the server is getting data that is
> not formatted the way it expects.  Please verify that your cassandra
> servers are both running the same version.
>
> Your earlier error when adding a keyspace through pycassa was
> confusing.  You stated that you tried to create a keyspace, but the
> error you pasted appeared to error in a drop_keyspace call.  Something
> doesn't add up.
>
> Gary.
>

Re: Having trouble getting cassandra to stay up

Posted by Gary Dusbabek <gd...@gmail.com>.
You might want to try starting over.  Configure your initial keyspaces
in conf/cassandra.yaml and load them into your cluster with
bin/schematool.
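Gary's two-step suggestion can be sketched roughly as follows. This is a guess at the 0.7-era layout, with hypothetical keyspace and column family names; verify the field names against the sample conf/cassandra.yaml shipped with your release:

```yaml
# Hypothetical keyspace definition for conf/cassandra.yaml (0.7-era layout);
# the keyspace and column family names here are placeholders.
keyspaces:
    - name: testing
      replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
      replication_factor: 1
      column_families:
        - name: Users
          compare_with: BytesType
```

The definitions are then loaded once into a running node with something like "bin/schematool <host> <jmx port> import"; run bin/schematool with no arguments to confirm the exact invocation for your build.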

That nasty stack trace indicates the server is getting data that is
not formatted the way it expects.  Please verify that your cassandra
servers are both running the same version.
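To make the mismatch concrete: the "Found" and "expecting" CfDef schemas in the exception differ in their field lists (one node's build includes a replicate_on_write field and the other's does not). A quick way to diff them, with the schemas abbreviated here to a few fields for illustration:

```python
import json

# Abbreviated forms of the "Found" and "expecting" CfDef schemas from the
# AvroTypeException; only the field names matter for spotting the mismatch.
found = json.loads("""{"name": "CfDef", "fields": [
    {"name": "read_repair_chance"}, {"name": "gc_grace_seconds"},
    {"name": "id"}]}""")
expecting = json.loads("""{"name": "CfDef", "fields": [
    {"name": "read_repair_chance"}, {"name": "replicate_on_write"},
    {"name": "gc_grace_seconds"}, {"name": "id"}]}""")

found_fields = {f["name"] for f in found["fields"]}
expected_fields = {f["name"] for f in expecting["fields"]}
print(expected_fields - found_fields)  # -> {'replicate_on_write'}
```

Running the same set difference over the two full schemas from the log would pinpoint every field one build has that the other lacks, which is exactly the symptom of mixed Cassandra versions in one cluster.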

Your earlier error when adding a keyspace through pycassa was
confusing.  You stated that you tried to create a keyspace, but the
traceback you pasted comes from a drop_keyspace call.  Something
doesn't add up.

Gary.


On Fri, Dec 24, 2010 at 11:48, Alex Quan <al...@tinkur.com> wrote:
> Sorry, but I am not sure how to answer all the questions you have posed,
> since a lot of what I am working with is quite new to me and I haven't used
> many of the tools mentioned, but I will try my best to answer to the best
> of my knowledge. I am trying to get Cassandra running across two nodes,
> both Amazon EC2 micro instances; I believe they are 64-bit Ubuntu 10.01
> Linux with Java version 1.6.0_23. When I said "killed", that is what was
> printed to the console when the process died, so I am not sure exactly what
> it means. Here is some of the info from before Cassandra went down:
>
> ring:
>
> Address         Status State   Load            Owns
> Token
>
> 111232248257764777335763873822010980488
> 10.127.155.205  Up     Normal  85.17 KB        59.06%
> 41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%
> 111232248257764777335763873822010980488
>
> vmstat before cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
> wa
>  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99
> 0
>
> vmstat after cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
> wa
>  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99
> 0
>
> Then after I run a line like sys.create_keyspace('testing', 1) in pycassa
> with the connections setup to point to my machine I get the following error:
>
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py",
> line 365, in drop_keyspace
>     schema_version = self._conn.system_drop_keyspace(keyspace)
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
> line 1255, in system_drop_keyspace
>     return self.recv_system_drop_keyspace()
>   File
> "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py",
> line 1266, in recv_system_drop_keyspace
>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
> line 126, in readMessageBegin
>     sz = self.readI32()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py",
> line 203, in readI32
>     buff = self.trans.readAll(4)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 58, in readAll
>     chunk = self.read(sz-have)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 272, in read
>     self.readFrame()
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 276, in readFrame
>     buff = self.__trans.readAll(4)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py",
> line 58, in readAll
>     chunk = self.read(sz-have)
>   File
> "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py",
> line 108, in read
>     raise TTransportException(type=TTransportException.END_OF_FILE,
> message='TSocket read 0 bytes')
> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
>
> and then Cassandra on that machine dies. Here is some of the log from the
> machine that died:
>
>  INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db
> (301 bytes)
>  INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load
> MX4J, mx4j-tools.jar is not in the classpath
>  INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding
> thrift service to /0.0.0.0:9160
>  INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using
> TFramedTransport with a max frame size of 15728640 bytes.
>  INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119)
> Listening for thrift clients...
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Migrations at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Migrations@948345082(5902 bytes, 1
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155)
> Writing Memtable-Migrations@948345082(5902 bytes, 1 operations)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Schema at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Schema@212165140(2194 bytes, 3
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db
> (6035 bytes)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155)
> Writing Memtable-Schema@212165140(2194 bytes, 3 operations)
>
> and the log on the machine that stays up:
>
> ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java
> (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
> org.apache.avro.AvroTypeException: Found
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
> expecting
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
>     at org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
>     at org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
>     at org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node
> /10.127.155.205 has restarted, now UP again
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line 670)
> Node /10.127.155.205 state jump to normal
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
> (line 191) Started hinted handoff for endpoint /10.127.155.205
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
> (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
>  INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
> OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
>  INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
> InetAddress /10.127.155.205 is now dead.
>
> The ring output on my node that stays up:
>
> Address         Status State   Load            Owns    Token
>                                                        111232248257764777335763873822010980488
> 10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>
> I am not sure how to use the jmx tools to connect to these machines so I
> can't really answer that but hopefully this is enough information to
> diagnose my problem, thanks
>
> Alex
>
>
> On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <da...@gmail.com>
> wrote:
>>
>> Your details are rather vague; what do you mean by "killed"? Is the
>> Cassandra java process still running? Any other warning or error log
>> messages (from either node)? Could you provide the last few Cassandra log
>> lines from each machine? Can you connect to the node via JMX? What is the
>> output of nodetool ring from the second node (which is presumably still
>> alive)? Is there any unusual system activity: high CPU usage, low CPU
>> usage, problems with disk IO (can be checked with vmstat)?
>> Can you provide any further system information? Linux/windows, java
>> version, 32/64 bit, amount of ram?
>>
>> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <al...@tinkur.com> wrote:
>>>
>>> Hi,
>>>
>>> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
>>> Cassandra working on one node and was able to create keyspaces, column
>>> families and populate the database fine. I tried adding a second node by
>>> changing the seed to point to another node and setting listen_address and
>>> rpc_address to blank. I then started up the second node and it seems to have
>>> connected fine using the node tool but after that I couldn't get it to
>>> accept any commands and whenever I tried to make a new keyspace or column
>>> family it would kill my initial node after a message like this:
>>>
>>>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at
>>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
>>> position=9143)
>>>  INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410
>>> bytes, 5 operations)
>>> Killed
>>>
>>> and the next few times I start up the server a similar message would pop
>>> up until, I am guessing, all the pending data is flushed out; then it
>>> would start fine until I tried to add anything to it. I tried changing
>>> the yaml file back to the original setup and this still happens. I don't
>>> know what to try to get it to work properly; if you guys can help I
>>> would be really grateful.
>>>
>>> Alex
>>
>
>

Re: Having trouble getting cassandra to stay up

Posted by Alex Quan <al...@tinkur.com>.
Sorry, but I am not sure how to answer all of the questions you have posed;
a lot of this is quite new to me and I haven't used many of the tools
mentioned, but I will do my best to answer them. I am trying to get
Cassandra running across two nodes, both Amazon EC2 micro instances; I
believe they are running 64-bit Ubuntu 10.01 Linux with Java version
1.6.0_23. When I said "killed", that was what was printed to the console
when the process died, so I am not sure exactly what it means. Here is some
of the info from before Cassandra went down:

ring:

Address         Status State   Load            Owns    Token
                                                       111232248257764777335763873822010980488
10.127.155.205  Up     Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488

vmstat before cassandra is up:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0

vmstat after cassandra is up:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
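The second sample shows free memory dropping from roughly 328 MB to about
5.6 MB once Cassandra is up, which fits the out-of-memory theory. For anyone
who wants to watch that headroom from a script, /proc/meminfo can be parsed
directly. This is a minimal sketch assuming a Linux host; the meminfo_kb
helper is made up here for illustration and is not part of Cassandra or any
tool in this thread:

```python
# Rough memory-headroom check: an EC2 micro instance has ~600 MB of RAM,
# while the stock Cassandra JVM settings assume considerably more.
def meminfo_kb():
    """Parse /proc/meminfo into a {field: value-in-kB} dict."""
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, _, rest = line.partition(':')
            info[key.strip()] = int(rest.split()[0])  # values are in kB
    return info

m = meminfo_kb()
print('MemTotal: %d kB, MemFree: %d kB' % (m['MemTotal'], m['MemFree']))
```

Run periodically (or compared before/after starting the daemon), this makes
the same trend visible as the two vmstat samples above.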

Then, after I run a line like sys.create_keyspace('testing', 1) in pycassa,
with the connection set up to point to my machine, I get the following error:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py", line 365, in drop_keyspace
    schema_version = self._conn.system_drop_keyspace(keyspace)
  File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1255, in system_drop_keyspace
    return self.recv_system_drop_keyspace()
  File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1266, in recv_system_drop_keyspace
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
    buff = self.trans.readAll(4)
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz-have)
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 272, in read
    self.readFrame()
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 276, in readFrame
    buff = self.__trans.readAll(4)
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz-have)
  File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py", line 108, in read
    raise TTransportException(type=TTransportException.END_OF_FILE, message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
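The "TSocket read 0 bytes" at the bottom simply means the server side closed
the connection mid-read: at the socket level, a recv() that returns zero
bytes signals EOF, which Thrift reports as END_OF_FILE. A minimal
stand-alone demonstration with plain sockets (no Thrift required):

```python
# Demonstrate that reading from a connection the peer has closed yields
# zero bytes (EOF), the same condition Thrift's TSocket.read() reports.
import socket

srv = socket.socket()
srv.bind(('127.0.0.1', 0))   # any free port on loopback
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()
conn.close()                 # the "server" goes away, like the crashed node

data = cli.recv(4)           # Thrift would be reading a 4-byte frame header
print(repr(data))            # -> b'': zero bytes read means EOF
cli.close()
srv.close()
```

So the traceback points at the symptom (the daemon died under the client),
not the cause.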

and then Cassandra on that machine dies. Here is some of the log from the
machine that died:

 INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162)
Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db
(301 bytes)
 INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load
MX4J, mx4j-tools.jar is not in the classpath
 INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding
thrift service to /0.0.0.0:9160
 INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using
TFramedTransport with a max frame size of 15728640 bytes.
 INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119)
Listening for thrift clients...
 INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
(line 639) switching in a fresh Memtable for Migrations at
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
position=10873)
 INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
(line 943) Enqueuing flush of Memtable-Migrations@948345082(5902 bytes, 1
operations)
 INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155)
Writing Memtable-Migrations@948345082(5902 bytes, 1 operations)
 INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
(line 639) switching in a fresh Memtable for Schema at
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
position=10873)
 INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
(line 943) Enqueuing flush of Memtable-Schema@212165140(2194 bytes, 3
operations)
 INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162)
Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db
(6035 bytes)
 INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155)
Writing Memtable-Schema@212165140(2194 bytes, 3 operations)

and the log on the machine that stays up:

ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java
(line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
org.apache.avro.AvroTypeException: Found
{"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
expecting
{"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
    at org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
    at org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
    at org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
 INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node
/10.127.155.205 has restarted, now UP again
 INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line 670)
Node /10.127.155.205 state jump to normal
 INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
(line 191) Started hinted handoff for endpoint /10.127.155.205
 INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
(line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
 INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
 INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
InetAddress /10.127.155.205 is now dead.
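The AvroTypeException above is easier to read if you compare just the field
names of the two CfDef schemas: the difference is a single field, which
suggests the two nodes are running different Cassandra builds. A quick
sketch (the field list is transcribed from the "Found" schema in the log;
nothing here is a Cassandra API):

```python
# Field names of the CfDef schema the node received ("Found" in the log).
found = {
    "keyspace", "name", "column_type", "comparator_type",
    "subcomparator_type", "comment", "row_cache_size", "key_cache_size",
    "read_repair_chance", "gc_grace_seconds", "default_validation_class",
    "min_compaction_threshold", "max_compaction_threshold",
    "row_cache_save_period_in_seconds", "key_cache_save_period_in_seconds",
    "memtable_flush_after_mins", "memtable_throughput_in_mb",
    "memtable_operations_in_millions", "id", "column_metadata",
}
# The schema it was "expecting" has exactly one extra field.
expecting = found | {"replicate_on_write"}

print(sorted(expecting - found))  # -> ['replicate_on_write']
```

In other words, the sender's CfDef lacks replicate_on_write, so the
migration it broadcast cannot be resolved against the newer node's schema.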

The ring output on my node that stays up:

Address         Status State   Load            Owns    Token
                                                       111232248257764777335763873822010980488
10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488

I am not sure how to use the JMX tools to connect to these machines, so I
can't really answer that, but hopefully this is enough information to
diagnose my problem. Thanks,

Alex


On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <da...@gmail.com>wrote:

> Your details are rather vague; what do you mean by "killed"? Is the Cassandra
> java process still running? Any other warning or error log messages (from
> either node)? Could you provide the last few Cassandra log lines from each
> machine? Can you connect to the node via JMX? What is the output of nodetool
> ring from the second node (which is presumably still alive)? Is there any
> unusual system activity: high CPU usage, low CPU usage, problems with disk
> IO (can be checked with vmstat)?
>
> Can you provide any further system information? Linux/windows, java
> version, 32/64 bit, amount of ram?
>
>
> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <al...@tinkur.com> wrote:
>
>> Hi,
>>
>> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
>> Cassandra working on one node and was able to create keyspaces, column
>> families and populate the database fine. I tried adding a second node by
>> changing the seed to point to another node and setting listen_address and
>> rpc_address to blank. I then started up the second node and it seems to have
>> connected fine using the node tool but after that I couldn't get it to
>> accept any commands and whenever I tried to make a new keyspace or column
>> family it would kill my initial node after a message like this:
>>
>>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at
>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
>> position=9143)
>>  INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410
>> bytes, 5 operations)
>> Killed
>>
>> and the next few times I start up the server a similar message would pop up
>> until, I am guessing, all the pending data is flushed out; then it would
>> start fine until I tried to add anything to it. I tried changing the yaml
>> file back to the original setup and this still happens. I don't know what
>> to try to get it to work properly; if you guys can help I would be really
>> grateful.
>>
>> Alex
>>
>
>

Re: Having trouble getting cassandra to stay up

Posted by Dan Hendry <da...@gmail.com>.
Your details are rather vague; what do you mean by "killed"? Is the Cassandra
java process still running? Any other warning or error log messages (from
either node)? Could you provide the last few Cassandra log lines from each
machine? Can you connect to the node via JMX? What is the output of nodetool
ring from the second node (which is presumably still alive)? Is there any
unusual system activity: high CPU usage, low CPU usage, problems with disk
IO (can be checked with vmstat)?

Can you provide any further system information? Linux/Windows, Java version,
32/64 bit, amount of RAM?
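For the first question, process liveness can be checked without JMX by
scanning /proc for a command line containing the daemon's main class. This
is a Linux-only illustrative sketch (the cassandra_running helper is made up
here); `pgrep -f CassandraDaemon` does the same from a shell:

```python
# Is the Cassandra JVM still running? Look for "CassandraDaemon" in the
# command line of any process under /proc (Linux-specific).
import os

def cassandra_running():
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/cmdline' % pid, 'rb') as f:
                if b'CassandraDaemon' in f.read():
                    return True
        except OSError:
            pass  # process exited between listdir() and open()
    return False

print(cassandra_running())
```

If this reports False right after the "Killed" message, the process is
gone rather than hung, which points at the OS rather than at Cassandra
itself.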

On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <al...@tinkur.com> wrote:

> Hi,
>
> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
> Cassandra working on one node and was able to create keyspaces, column
> families and populate the database fine. I tried adding a second node by
> changing the seed to point to another node and setting listen_address and
> rpc_address to blank. I then started up the second node and it seems to have
> connected fine using the node tool but after that I couldn't get it to
> accept any commands and whenever I tried to make a new keyspace or column
> family it would kill my initial node after a message like this:
>
>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
> position=9143)
>  INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410
> bytes, 5 operations)
> Killed
>
> and the next few times I start up the server a similar message would pop up
> until, I am guessing, all the pending data is flushed out; then it would
> start fine until I tried to add anything to it. I tried changing the yaml
> file back to the original setup and this still happens. I don't know what
> to try to get it to work properly; if you guys can help I would be really
> grateful.
>
> Alex
>