You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Dan Larsen <da...@techba.se> on 2009/10/09 13:52:50 UTC
Irresponsive nodes
Hi again :-)
O.k... New problem...
I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
Running CommitLog on 1 disk and data on another.
Cassandra 0.4.0 - (yes I have checked... correct version :-P)
6GB set in the cassandra.in.sh.
I started throwing data at it, without problems.
All of a sudden, the node becomes irresponsive.
I only have 6.6GB of data in the DBs.
I experienced the same thing, while running much smaller nodes.
I tried restarting cassandra (kill [pid]).
When it starts up, it goes crazy for a while, trying to fill up the
RAM or something.
Then it stops filling RAM, but keeps a load of ~100% CPU.
It doesn't respond to anything, but a nodeprobe info, which responds,
but VERY slowly.
The log doesn't give me anything - not that I can understand anyways...
[.....]
INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
Cassandra starting up...
INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
ColumnFamilyStore.java (line 369) LocationInfo has reached its
threshold; switching in a fresh Memtable
INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
ColumnFamilyStore.java (line 1178) Enqueuing flush of Memtable
(LocationInfo)@2116316013
INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
(line 186) Flushing Memtable(LocationInfo)@2116316013
DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line
466) discard completed log segments for CommitLogContext(file='/var/
lib/cassandra/commitlog/CommitLog-1255087417263.log', position=257),
column family 0. CFIDs are system: TableMetadata(LocationInfo: 0,
HintsColumnFamily: 1, }), Fetcher: TableMetadata(PageSentences: 2,
Pages: 3, PageWords: 4, WordPages: 6, SentencePages: 5, }), }
DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line
509) Marking replay position 257 on commit log /var/lib/cassandra/
commitlog/CommitLog-1255087417263.log
INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
(line 220) Completed flushing /mnt/cassandra/data/system/
LocationInfo-19-Data.db
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for
file : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for
file : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for
file : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
SSTableReader.java (line 58) index size for bloom filter calc for
file : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
40) Disseminating load info ...
DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
40) Disseminating load info ...
DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
40) Disseminating load info ...
DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
40) Disseminating load info ...
If I try to insert anything, I get stuff like this:
ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
(line 179) error writing key md5
java.util.concurrent.TimeoutException: Operation timed out - received
only 0 responses from .
at org.apache.cassandra.service.QuorumResponseHandler.get
(QuorumResponseHandler.java:88)
at org.apache.cassandra.service.StorageProxy.insertBlocking
(StorageProxy.java:164)
at org.apache.cassandra.service.CassandraServer.doInsert
(CassandraServer.java:468)
at org.apache.cassandra.service.CassandraServer.insert
(CassandraServer.java:421)
at org.apache.cassandra.service.Cassandra$Processor$insert.process
(Cassandra.java:824)
at org.apache.cassandra.service.Cassandra$Processor.process
(Cassandra.java:627)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run
(TThreadPoolServer.java:253)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Any ideas?
Best regards
Dan
Re: Irresponsive nodes
Posted by Dan Larsen <da...@techba.se>.
That is exactly, what I have discovered by now ;-)
Looking forward for the next release then!
Thanks Jonathan!
Best
Dan
On 09/10/2009, at 16.21, Jonathan Ellis wrote:
> if you swamp it with inserts faster than it can write them, it will
> start spending more and more time trying to GC. that's what's
> happening... trunk is smarter about this and will stop accepting
> writes before it gets to that point, but for 0.4 you just need to be a
> little careful.
>
> -Jonathan
>
> On Fri, Oct 9, 2009 at 7:11 AM, Dan Larsen <da...@techba.se> wrote:
>> Thanks for the tips Eric.
>>
>> I was just about to try it, when I noticed, it had become
>> responsive again.
>> It took exactly 1 hour, before it was done!...
>>
>> But when I restart now, it's ready almost immediatly... Weird stuff!!
>>
>> I will try out your tips, next time this happens!
>>
>> It sounds like, it's pretty well-defined, when the JVM dies under
>> GC load..?
>> Any pointers there?
>> I was just thinking, that it might be possible, to add nodes based on
>> current knowledge?
>>
>> #Dan
>>
>> On 09/10/2009, at 13.58, Eric Bowman wrote:
>>
>>> A few things to try:
>>>
>>> 1. Enable verbose GC logging to see if your JVM is dying under GC
>>> load.
>>> 2. pkill -3 java will dump some nice stack traces from all running
>>> threads, could be some clues there.
>>>
>>>
>>> Dan Larsen wrote:
>>>>
>>>> Hi again :-)
>>>>
>>>> O.k... New problem...
>>>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>>>> Running CommitLog on 1 disk and data on another.
>>>> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
>>>> 6GB set in the cassandra.in.sh.
>>>>
>>>> I started throwing data at it, without problems.
>>>> All of a sudden, the node becomes irresponsive.
>>>>
>>>> I only have 6.6GB of data in the DBs.
>>>>
>>>> I experienced the same thing, while running much smaller nodes.
>>>>
>>>> I tried restarting cassandra (kill [pid]).
>>>>
>>>> When it starts up, it goes crazy for a while, trying to fill up the
>>>> RAM or something.
>>>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>>>> It doesn't respond to anything, but a nodeprobe info, which
>>>> responds,
>>>> but VERY slowly.
>>>>
>>>>
>>>> The log doesn't give me anything - not that I can understand
>>>> anyways...
>>>>
>>>> [.....]
>>>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
>>>> Cassandra starting up...
>>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>>>> ColumnFamilyStore.java (line 369) LocationInfo has reached its
>>>> threshold; switching in a fresh Memtable
>>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>>>> ColumnFamilyStore.java (line 1178) Enqueuing flush of
>>>> Memtable(LocationInfo)@2116316013
>>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039
>>>> Memtable.java
>>>> (line 186) Flushing Memtable(LocationInfo)@2116316013
>>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java
>>>> (line
>>>> 466) discard completed log segments for
>>>>
>>>> CommitLogContext(file='/var/lib/cassandra/commitlog/
>>>> CommitLog-1255087417263.log',
>>>> position=257), column family 0. CFIDs are system:
>>>> TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
>>>> TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4,
>>>> WordPages: 6,
>>>> SentencePages: 5, }), }
>>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java
>>>> (line
>>>> 509) Marking replay position 257 on commit log
>>>> /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243
>>>> Memtable.java
>>>> (line 220) Completed flushing
>>>> /mnt/cassandra/data/system/LocationInfo-19-Data.db
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>>> SSTableReader.java (line 58) index size for bloom filter calc for
>>>> file
>>>> : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>>> SSTableReader.java (line 58) index size for bloom filter calc for
>>>> file
>>>> : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>>> SSTableReader.java (line 58) index size for bloom filter calc for
>>>> file
>>>> : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>>> SSTableReader.java (line 58) index size for bloom filter calc for
>>>> file
>>>> : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>>> ColumnFamilyStore.java (line 1048) Expected bloom filter size :
>>>> 1024
>>>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
>>>> 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
>>>> 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
>>>> 40) Disseminating load info ...
>>>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
>>>> 40) Disseminating load info ...
>>>>
>>>>
>>>> If I try to insert anything, I get stuff like this:
>>>>
>>>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574
>>>> StorageProxy.java
>>>> (line 179) error writing key md5
>>>> java.util.concurrent.TimeoutException: Operation timed out -
>>>> received
>>>> only 0 responses from .
>>>> at
>>>>
>>>> org.apache.cassandra.service.QuorumResponseHandler.get
>>>> (QuorumResponseHandler.java:88)
>>>>
>>>> at
>>>>
>>>> org.apache.cassandra.service.StorageProxy.insertBlocking
>>>> (StorageProxy.java:164)
>>>>
>>>> at
>>>>
>>>> org.apache.cassandra.service.CassandraServer.doInsert
>>>> (CassandraServer.java:468)
>>>>
>>>> at
>>>>
>>>> org.apache.cassandra.service.CassandraServer.insert
>>>> (CassandraServer.java:421)
>>>>
>>>> at
>>>>
>>>> org.apache.cassandra.service.Cassandra$Processor$insert.process
>>>> (Cassandra.java:824)
>>>>
>>>> at
>>>>
>>>> org.apache.cassandra.service.Cassandra$Processor.process
>>>> (Cassandra.java:627)
>>>>
>>>> at
>>>>
>>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run
>>>> (TThreadPoolServer.java:253)
>>>>
>>>> at
>>>>
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask
>>>> (ThreadPoolExecutor.java:886)
>>>>
>>>> at
>>>>
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run
>>>> (ThreadPoolExecutor.java:908)
>>>>
>>>> at java.lang.Thread.run(Thread.java:619)
>>>>
>>>>
>>>> Any ideas?
>>>>
>>>> Best regards
>>>> Dan
>>>
>>>
>>> --
>>> Eric Bowman
>>> Boboco Ltd
>>> ebowman@boboco.ie
>>> http://www.boboco.ie/ebowman/pubkey.pgp
>>> +35318394189/+353872801532
>>>
>>>
>>
>>
>
Re: Irresponsive nodes
Posted by Jonathan Ellis <jb...@gmail.com>.
if you swamp it with inserts faster than it can write them, it will
start spending more and more time trying to GC. that's what's
happening... trunk is smarter about this and will stop accepting
writes before it gets to that point, but for 0.4 you just need to be a
little careful.
-Jonathan
On Fri, Oct 9, 2009 at 7:11 AM, Dan Larsen <da...@techba.se> wrote:
> Thanks for the tips Eric.
>
> I was just about to try it, when I noticed, it had become responsive again.
> It took exactly 1 hour, before it was done!...
>
> But when I restart now, it's ready almost immediatly... Weird stuff!!
>
> I will try out your tips, next time this happens!
>
> It sounds like, it's pretty well-defined, when the JVM dies under GC load..?
> Any pointers there?
> I was just thinking, that it might be possible, to add nodes based on
> current knowledge?
>
> #Dan
>
> On 09/10/2009, at 13.58, Eric Bowman wrote:
>
>> A few things to try:
>>
>> 1. Enable verbose GC logging to see if your JVM is dying under GC load.
>> 2. pkill -3 java will dump some nice stack traces from all running
>> threads, could be some clues there.
>>
>>
>> Dan Larsen wrote:
>>>
>>> Hi again :-)
>>>
>>> O.k... New problem...
>>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>>> Running CommitLog on 1 disk and data on another.
>>> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
>>> 6GB set in the cassandra.in.sh.
>>>
>>> I started throwing data at it, without problems.
>>> All of a sudden, the node becomes irresponsive.
>>>
>>> I only have 6.6GB of data in the DBs.
>>>
>>> I experienced the same thing, while running much smaller nodes.
>>>
>>> I tried restarting cassandra (kill [pid]).
>>>
>>> When it starts up, it goes crazy for a while, trying to fill up the
>>> RAM or something.
>>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>>> It doesn't respond to anything, but a nodeprobe info, which responds,
>>> but VERY slowly.
>>>
>>>
>>> The log doesn't give me anything - not that I can understand anyways...
>>>
>>> [.....]
>>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
>>> Cassandra starting up...
>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>>> ColumnFamilyStore.java (line 369) LocationInfo has reached its
>>> threshold; switching in a fresh Memtable
>>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>>> ColumnFamilyStore.java (line 1178) Enqueuing flush of
>>> Memtable(LocationInfo)@2116316013
>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
>>> (line 186) Flushing Memtable(LocationInfo)@2116316013
>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line
>>> 466) discard completed log segments for
>>>
>>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log',
>>> position=257), column family 0. CFIDs are system:
>>> TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
>>> TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6,
>>> SentencePages: 5, }), }
>>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line
>>> 509) Marking replay position 257 on commit log
>>> /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
>>> (line 220) Completed flushing
>>> /mnt/cassandra/data/system/LocationInfo-19-Data.db
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> SSTableReader.java (line 58) index size for bloom filter calc for file
>>> : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>>> ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
>>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
>>> 40) Disseminating load info ...
>>>
>>>
>>> If I try to insert anything, I get stuff like this:
>>>
>>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
>>> (line 179) error writing key md5
>>> java.util.concurrent.TimeoutException: Operation timed out - received
>>> only 0 responses from .
>>> at
>>>
>>> org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>>>
>>> at
>>>
>>> org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>>>
>>> at
>>>
>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>
>>> at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>
>>> at
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>
>>> at java.lang.Thread.run(Thread.java:619)
>>>
>>>
>>> Any ideas?
>>>
>>> Best regards
>>> Dan
>>
>>
>> --
>> Eric Bowman
>> Boboco Ltd
>> ebowman@boboco.ie
>> http://www.boboco.ie/ebowman/pubkey.pgp
>> +35318394189/+353872801532
>>
>>
>
>
Re: Irresponsive nodes
Posted by Dan Larsen <da...@techba.se>.
Thanks for the tips Eric.
I was just about to try it, when I noticed, it had become responsive
again.
It took exactly 1 hour, before it was done!...
But when I restart now, it's ready almost immediatly... Weird stuff!!
I will try out your tips, next time this happens!
It sounds like, it's pretty well-defined, when the JVM dies under GC
load..?
Any pointers there?
I was just thinking, that it might be possible, to add nodes based on
current knowledge?
#Dan
On 09/10/2009, at 13.58, Eric Bowman wrote:
> A few things to try:
>
> 1. Enable verbose GC logging to see if your JVM is dying under GC
> load.
> 2. pkill -3 java will dump some nice stack traces from all running
> threads, could be some clues there.
>
>
> Dan Larsen wrote:
>> Hi again :-)
>>
>> O.k... New problem...
>> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
>> Running CommitLog on 1 disk and data on another.
>> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
>> 6GB set in the cassandra.in.sh.
>>
>> I started throwing data at it, without problems.
>> All of a sudden, the node becomes irresponsive.
>>
>> I only have 6.6GB of data in the DBs.
>>
>> I experienced the same thing, while running much smaller nodes.
>>
>> I tried restarting cassandra (kill [pid]).
>>
>> When it starts up, it goes crazy for a while, trying to fill up the
>> RAM or something.
>> Then it stops filling RAM, but keeps a load of ~100% CPU.
>> It doesn't respond to anything, but a nodeprobe info, which responds,
>> but VERY slowly.
>>
>>
>> The log doesn't give me anything - not that I can understand
>> anyways...
>>
>> [.....]
>> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
>> Cassandra starting up...
>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>> ColumnFamilyStore.java (line 369) LocationInfo has reached its
>> threshold; switching in a fresh Memtable
>> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
>> ColumnFamilyStore.java (line 1178) Enqueuing flush of
>> Memtable(LocationInfo)@2116316013
>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
>> (line 186) Flushing Memtable(LocationInfo)@2116316013
>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java
>> (line
>> 466) discard completed log segments for
>> CommitLogContext(file='/var/lib/cassandra/commitlog/
>> CommitLog-1255087417263.log',
>> position=257), column family 0. CFIDs are system:
>> TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
>> TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6,
>> SentencePages: 5, }), }
>> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java
>> (line
>> 509) Marking replay position 257 on commit log
>> /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
>> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
>> (line 220) Completed flushing
>> /mnt/cassandra/data/system/LocationInfo-19-Data.db
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>> SSTableReader.java (line 58) index size for bloom filter calc for
>> file
>> : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>> SSTableReader.java (line 58) index size for bloom filter calc for
>> file
>> : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>> SSTableReader.java (line 58) index size for bloom filter calc for
>> file
>> : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>> SSTableReader.java (line 58) index size for bloom filter calc for
>> file
>> : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
>> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
>> ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
>> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
>> 40) Disseminating load info ...
>> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
>> 40) Disseminating load info ...
>> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
>> 40) Disseminating load info ...
>> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
>> 40) Disseminating load info ...
>>
>>
>> If I try to insert anything, I get stuff like this:
>>
>> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
>> (line 179) error writing key md5
>> java.util.concurrent.TimeoutException: Operation timed out - received
>> only 0 responses from .
>> at
>> org.apache.cassandra.service.QuorumResponseHandler.get
>> (QuorumResponseHandler.java:88)
>>
>> at
>> org.apache.cassandra.service.StorageProxy.insertBlocking
>> (StorageProxy.java:164)
>>
>> at
>> org.apache.cassandra.service.CassandraServer.doInsert
>> (CassandraServer.java:468)
>>
>> at
>> org.apache.cassandra.service.CassandraServer.insert
>> (CassandraServer.java:421)
>>
>> at
>> org.apache.cassandra.service.Cassandra$Processor$insert.process
>> (Cassandra.java:824)
>>
>> at
>> org.apache.cassandra.service.Cassandra$Processor.process
>> (Cassandra.java:627)
>>
>> at
>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run
>> (TThreadPoolServer.java:253)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask
>> (ThreadPoolExecutor.java:886)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run
>> (ThreadPoolExecutor.java:908)
>>
>> at java.lang.Thread.run(Thread.java:619)
>>
>>
>> Any ideas?
>>
>> Best regards
>> Dan
>
>
> --
> Eric Bowman
> Boboco Ltd
> ebowman@boboco.ie
> http://www.boboco.ie/ebowman/pubkey.pgp
> +35318394189/+353872801532
>
>
Re: Irresponsive nodes
Posted by Eric Bowman <eb...@boboco.ie>.
A few things to try:
1. Enable verbose GC logging to see if your JVM is dying under GC load.
2. pkill -3 java will dump some nice stack traces from all running
threads, could be some clues there.
Dan Larsen wrote:
> Hi again :-)
>
> O.k... New problem...
> I have an Amazon EC2 node with 4 "CPUs" and 7.5 GB of RAM.
> Running CommitLog on 1 disk and data on another.
> Cassandra 0.4.0 - (yes I have checked... correct version :-P)
> 6GB set in the cassandra.in.sh.
>
> I started throwing data at it, without problems.
> All of a sudden, the node becomes irresponsive.
>
> I only have 6.6GB of data in the DBs.
>
> I experienced the same thing, while running much smaller nodes.
>
> I tried restarting cassandra (kill [pid]).
>
> When it starts up, it goes crazy for a while, trying to fill up the
> RAM or something.
> Then it stops filling RAM, but keeps a load of ~100% CPU.
> It doesn't respond to anything, but a nodeprobe info, which responds,
> but VERY slowly.
>
>
> The log doesn't give me anything - not that I can understand anyways...
>
> [.....]
> INFO [main] 2009-10-09 11:23:37,320 CassandraDaemon.java (line 142)
> Cassandra starting up...
> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
> ColumnFamilyStore.java (line 369) LocationInfo has reached its
> threshold; switching in a fresh Memtable
> INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-09 11:24:40,239
> ColumnFamilyStore.java (line 1178) Enqueuing flush of
> Memtable(LocationInfo)@2116316013
> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:41,039 Memtable.java
> (line 186) Flushing Memtable(LocationInfo)@2116316013
> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,191 CommitLog.java (line
> 466) discard completed log segments for
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1255087417263.log',
> position=257), column family 0. CFIDs are system:
> TableMetadata(LocationInfo: 0, HintsColumnFamily: 1, }), Fetcher:
> TableMetadata(PageSentences: 2, Pages: 3, PageWords: 4, WordPages: 6,
> SentencePages: 5, }), }
> DEBUG [COMMIT-LOG-WRITER] 2009-10-09 11:24:45,243 CommitLog.java (line
> 509) Marking replay position 257 on commit log
> /var/lib/cassandra/commitlog/CommitLog-1255087417263.log
> INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-09 11:24:45,243 Memtable.java
> (line 220) Completed flushing
> /mnt/cassandra/data/system/LocationInfo-19-Data.db
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-347-Data.db : 256
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-416-Data.db : 512
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-486-Data.db : 768
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> SSTableReader.java (line 58) index size for bloom filter calc for file
> : /mnt/cassandra/data/Fetcher/WordPages-555-Data.db : 1024
> DEBUG [MINOR-COMPACTION-POOL:1] 2009-10-09 11:27:08,228
> ColumnFamilyStore.java (line 1048) Expected bloom filter size : 1024
> DEBUG [Timer-0] 2009-10-09 11:28:39,859 LoadDisseminator.java (line
> 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:33:40,783 LoadDisseminator.java (line
> 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:38:40,956 LoadDisseminator.java (line
> 40) Disseminating load info ...
> DEBUG [Timer-0] 2009-10-09 11:43:40,064 LoadDisseminator.java (line
> 40) Disseminating load info ...
>
>
> If I try to insert anything, I get stuff like this:
>
> ERROR [pool-1-thread-5324] 2009-10-09 10:12:36,574 StorageProxy.java
> (line 179) error writing key md5
> java.util.concurrent.TimeoutException: Operation timed out - received
> only 0 responses from .
> at
> org.apache.cassandra.service.QuorumResponseHandler.get(QuorumResponseHandler.java:88)
>
> at
> org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:164)
>
> at
> org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:468)
>
> at
> org.apache.cassandra.service.CassandraServer.insert(CassandraServer.java:421)
>
> at
> org.apache.cassandra.service.Cassandra$Processor$insert.process(Cassandra.java:824)
>
> at
> org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
> at java.lang.Thread.run(Thread.java:619)
>
>
> Any ideas?
>
> Best regards
> Dan
--
Eric Bowman
Boboco Ltd
ebowman@boboco.ie
http://www.boboco.ie/ebowman/pubkey.pgp
+35318394189/+353872801532