Posted to user@cassandra.apache.org by Jan Algermissen <ja...@nordsc.com> on 2013/09/04 10:44:35 UTC

Cassandra crashes

Hi,

I have set up C* in a very limited environment: three VMs at DigitalOcean, each with 2GB RAM and a 40GB SSD, so my expectations about overall performance are low.

The keyspace uses a replication factor of 2.

I am loading 1.5 million rows (each with 60 columns of a mix of numbers and small texts; 300,000 wide rows effectively) in a fairly 'aggressive' way, using the java-driver and async update statements.

After a while of importing data, I start seeing timeouts reported by the driver:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)

and then later, host-unavailability exceptions:

com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive).

Looking at the 3 hosts, I see that two of the C* instances went down, which explains why I still see some writes succeeding (those must be served by the one host left, satisfying the consistency level ONE).


The logs tell me, AFAIU, that the servers shut down due to reaching the heap size limit.

I am puzzled by the fact that the instances (it seems) shut themselves down instead of limiting their amount of work. I understand that I need to tweak the configuration and likely get more RAM, but still, I would actually be satisfied with reduced service (and likely more timeouts in the client). Right now it looks as if I would have to slow down the client 'artificially' to prevent the loss of hosts - does that make sense?
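One way to slow down the client without it feeling 'artificial' is to bound the number of in-flight async writes, so the importer applies its own back pressure. A minimal JDK-only sketch under stated assumptions: the class name, the limit of 128, and the sendAsync stand-in are all hypothetical; in a real importer, sendAsync would wrap session.executeAsync(statement) from the java-driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Sketch: cap concurrent async writes with a semaphore so the importer
// blocks instead of queueing unbounded work against the cluster.
public class ThrottledImporter {
    private final Semaphore inFlight;
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public ThrottledImporter(int maxInFlight) {
        inFlight = new Semaphore(maxInFlight);
    }

    // Hypothetical stand-in for session.executeAsync(statement);
    // here it just runs an empty task on a local thread pool.
    private CompletableFuture<Void> sendAsync(int row) {
        return CompletableFuture.runAsync(() -> { /* write row */ }, pool);
    }

    // Blocks once maxInFlight writes are pending; each completion
    // (success or failure) frees a slot for the next write.
    public CompletableFuture<Void> submit(int row) throws InterruptedException {
        inFlight.acquire();
        return sendAsync(row).whenComplete((v, t) -> inFlight.release());
    }

    public void shutdown() { pool.shutdown(); }

    public static void main(String[] args) throws Exception {
        ThrottledImporter importer = new ThrottledImporter(128);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int i = 0; i < 1000; i++) futures.add(importer.submit(i));
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        importer.shutdown();
        System.out.println("submitted: " + futures.size());
    }
}
```

With a cap like this, a slow or overloaded node makes acquire() block the import loop, which is exactly the reduced-service behavior described above.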

Can anyone explain whether this is intended behavior, meaning I'll just have to accept the self-shutdown of the hosts? Or, alternatively, what data should I collect to investigate the cause further?

Jan






Re: Cassandra shuts down; was:Cassandra crashes

Posted by Nate McCall <na...@thelastpickle.com>.
Ideally, you should get back pressure in the form of dropped messages
before you see crashes, but if turning down the heap allocation was the
only thing you did, there are other changes required (several mentioned by
Romain above are very good places to start).
A few other ideas:
- did you adjust ParNew along with heap?
- you may want to adjust SurvivorRatio and MaxTenuringThreshold (change
both to 4 as a starting point) in JVM
- definitely play with compaction throughput by turning it way up since you
have IO capacity

These will cause you to GC and compact continuously in this environment,
but the cluster should at least keep going.
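For concreteness, the heap and GC knobs mentioned above live in conf/cassandra-env.sh. A hedged starting point; the values below are illustrative assumptions for a 2GB node, not tested recommendations:

```shell
# conf/cassandra-env.sh -- illustrative values for a 2GB node.
MAX_HEAP_SIZE="1G"      # cap the heap explicitly instead of auto-sizing
HEAP_NEWSIZE="200M"     # ParNew (young generation), adjusted along with the heap

# Survivor-space starting points suggested above:
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"
```

Compaction throughput can be raised without a restart via `nodetool setcompactionthroughput 64` (0 removes the cap entirely).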


On Wed, Sep 4, 2013 at 9:14 AM, Romain HARDOUIN
<ro...@urssaf.fr> wrote:

> Have you tried to tweak settings like memtable_total_space_in_mb and
> flush_largest_memtables_at?
> Also, the compaction manager seems to be pretty busy, take a look at
> in_memory_compaction_limit_in_mb.
> And with SSD hardware you should modify multithreaded_compaction,
> compaction_throughput_mb_per_sec, concurrent_reads and concurrent_writes.
> Of course 2GB of RAM is low, but tweaking these settings might help you.
> Maybe some guru could confirm or refute that.
>
>
>
>
> From:        Jan Algermissen <ja...@nordsc.com>
> To:        user@cassandra.apache.org,
> Date:        04/09/2013 12:29
> Subject:        Re: Cassandra shuts down; was:Cassandra crashes
> ------------------------------
>
>
>
> Romain,
>
>
> On 04.09.2013, at 11:11, Romain HARDOUIN <ro...@urssaf.fr>
> wrote:
>
> > Maybe you should include the end of Cassandra logs.
>
> There is nothing that seems interesting in cassandra.log. Below you find
> system.log.
>
> > What comes to my mind when I read your first post is OOM killer.
> > But what you describe later is not the case.
> > Just to be sure, have you checked /var/log/messages?
>
> Nothing there, just occasional Firewall TCP rejections.
>
> Somehow I think I am simply overloading the whole cluster (see the hinted
> handoff messages in the log). Could that be due to the limited memory (2GB)
> my nodes have? IOW, not enough space to buffer up the writes before
> dumping them to disk?
>
> Also, my overall write performance is actually pretty bad compared to what
> I read about C*. Before, I thought it was the client doing too much work, or
> the network. Turns out that's not the case.
>
> I'd expect C* to sort of just suck in my rather small amount of data -
> must be me, not using the right configuration. Oh well, I'll get there :-)
> Thanks anyhow.
>
> Jan
>
>
>
>
>

Re: Cassandra shuts down; was:Cassandra crashes

Posted by Romain HARDOUIN <ro...@urssaf.fr>.
Have you tried to tweak settings like memtable_total_space_in_mb and 
flush_largest_memtables_at?
Also, the compaction manager seems to be pretty busy, take a look at 
in_memory_compaction_limit_in_mb.
And with SSD hardware you should modify multithreaded_compaction, 
compaction_throughput_mb_per_sec, concurrent_reads and concurrent_writes.
Of course 2GB of RAM is low, but tweaking these settings might help you.
Maybe some guru could confirm or refute that.
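For reference, the settings named above sit in cassandra.yaml. A sketch with illustrative values for a 2GB node; the numbers are assumptions to experiment with, not recommendations:

```yaml
# cassandra.yaml -- illustrative values only
memtable_total_space_in_mb: 200        # cap total memtable heap usage
flush_largest_memtables_at: 0.50       # flush earlier under heap pressure
in_memory_compaction_limit_in_mb: 32   # spill large rows instead of compacting in memory
multithreaded_compaction: true         # SSDs can keep several compaction threads busy
compaction_throughput_mb_per_sec: 64   # raise the compaction cap
concurrent_reads: 16
concurrent_writes: 16
```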




From:   Jan Algermissen <ja...@nordsc.com>
To:     user@cassandra.apache.org, 
Date:   04/09/2013 12:29
Subject: Re: Cassandra shuts down; was:Cassandra crashes



Romain,


On 04.09.2013, at 11:11, Romain HARDOUIN <ro...@urssaf.fr> 
wrote:

> Maybe you should include the end of Cassandra logs. 

There is nothing that seems interesting in cassandra.log. Below you find 
system.log.

> What comes to my mind when I read your first post is OOM killer. 
> But what you describe later is not the case. 
> Just to be sure, have you checked /var/log/messages? 

Nothing there, just occasional Firewall TCP rejections. 

Somehow I think I am simply overloading the whole cluster (see the hinted 
handoff messages in the log). Could that be due to the limited memory (2GB) 
my nodes have? IOW, not enough space to buffer up the writes before 
dumping them to disk?

Also, my overall write performance is actually pretty bad compared to what 
I read about C*. Before, I thought it was the client doing too much work, or 
the network. Turns out that's not the case.

I'd expect C* to sort of just suck in my rather small amount of data - 
must be me, not using the right configuration. Oh well, I'll get there :-) 
Thanks anyhow.

Jan





Re: Cassandra shuts down; was:Cassandra crashes

Posted by Jan Algermissen <ja...@nordsc.com>.
Romain,


On 04.09.2013, at 11:11, Romain HARDOUIN <ro...@urssaf.fr> wrote:

> Maybe you should include the end of Cassandra logs. 

There is nothing that seems interesting in cassandra.log. Below you find system.log.

> What comes to my mind when I read your first post is OOM killer. 
> But what you describe later is not the case. 
> Just to be sure, have you checked /var/log/messages? 

Nothing there, just occasional Firewall TCP rejections. 

Somehow I think I am simply overloading the whole cluster (see the hinted handoff messages in the log). Could that be due to the limited memory (2GB) my nodes have? IOW, not enough space to buffer up the writes before dumping them to disk?

Also, my overall write performance is actually pretty bad compared to what I read about C*. Before, I thought it was the client doing too much work, or the network. Turns out that's not the case.

I'd expect C* to sort of just suck in my rather small amount of data - must be me, not using the right configuration. Oh well, I'll get there :-) Thanks anyhow.

Jan

> 
> Romain 
> 




INFO [ScheduledTasks:1] 2013-09-04 07:17:09,057 StatusLogger.java (line 96) KeyCache                        216                      936                      all                                                                 
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,057 StatusLogger.java (line 102) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,058 StatusLogger.java (line 109) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,082 StatusLogger.java (line 112) system.local                             4,52
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,083 StatusLogger.java (line 112) system.peers                              0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,083 StatusLogger.java (line 112) system.batchlog                           0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,083 StatusLogger.java (line 112) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,083 StatusLogger.java (line 112) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,084 StatusLogger.java (line 112) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,084 StatusLogger.java (line 112) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,084 StatusLogger.java (line 112) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,084 StatusLogger.java (line 112) system.schema_columns                     0,0
ERROR [FlushWriter:6] 2013-09-04 07:17:09,210 CassandraDaemon.java (line 192) Exception in thread Thread[FlushWriter:6,5,main]
java.lang.OutOfMemoryError: Java heap space
	at org.apache.cassandra.io.util.FastByteArrayOutputStream.expand(FastByteArrayOutputStream.java:104)
	at org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:220)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.cassandra.io.util.DataOutputBuffer.write(DataOutputBuffer.java:60)
	at org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:328)
	at org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:315)
	at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:55)
	at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:30)
	at org.apache.cassandra.db.OnDiskAtom$Serializer.serializeForSSTable(OnDiskAtom.java:62)
	at org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:181)
	at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:133)
	at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:185)
	at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:489)
	at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:448)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,210 StatusLogger.java (line 112) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,218 StatusLogger.java (line 112) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,218 StatusLogger.java (line 112) system.range_xfers                        0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,219 StatusLogger.java (line 112) system.peer_events                        0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,219 StatusLogger.java (line 112) system.hints                     1524,9518677
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,219 StatusLogger.java (line 112) system.HintsColumnFamily                  0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,219 StatusLogger.java (line 112) products.product                89984,7340032
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,220 StatusLogger.java (line 112) products.user                             0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,220 StatusLogger.java (line 112) products.y                                0,0
 INFO [FlushWriter:7] 2013-09-04 07:17:09,222 Memtable.java (line 461) Writing Memtable-product@1877827562(140596898/128974848 serialized/live bytes, 1982097 ops)
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,225 StatusLogger.java (line 112) system_auth.users                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,228 StatusLogger.java (line 112) system_traces.sessions                    0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,228 StatusLogger.java (line 112) system_traces.events                      0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,229 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 7532 ms for 4 collections, 973781352 used; max is 1031798784
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,230 StatusLogger.java (line 53) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,230 StatusLogger.java (line 68) ReadStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,231 StatusLogger.java (line 68) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,231 StatusLogger.java (line 68) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,232 StatusLogger.java (line 68) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,243 StatusLogger.java (line 68) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,244 StatusLogger.java (line 68) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,245 StatusLogger.java (line 68) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,245 StatusLogger.java (line 68) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,246 StatusLogger.java (line 68) MemtablePostFlusher               1         5         0
 INFO [StorageServiceShutdownHook] 2013-09-04 07:17:09,262 ThriftServer.java (line 116) Stop listening to thrift clients
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,264 StatusLogger.java (line 68) FlushWriter                       1         4         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,300 StatusLogger.java (line 68) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,301 StatusLogger.java (line 68) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,301 StatusLogger.java (line 68) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,301 StatusLogger.java (line 68) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,302 StatusLogger.java (line 73) CompactionManager                 2         4
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,307 StatusLogger.java (line 85) MessagingService                n/a     181,1
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,336 StatusLogger.java (line 95) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,338 StatusLogger.java (line 96) KeyCache                        216                      936                      all                                                                 
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,338 StatusLogger.java (line 102) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,388 StatusLogger.java (line 109) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,391 StatusLogger.java (line 112) system.local                             4,52
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,391 StatusLogger.java (line 112) system.peers                              0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,392 StatusLogger.java (line 112) system.batchlog                           0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,392 StatusLogger.java (line 112) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,392 StatusLogger.java (line 112) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,393 StatusLogger.java (line 112) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,393 StatusLogger.java (line 112) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,479 StatusLogger.java (line 112) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,480 StatusLogger.java (line 112) system.schema_columns                     0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,481 StatusLogger.java (line 112) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,482 StatusLogger.java (line 112) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,483 StatusLogger.java (line 112) system.range_xfers                        0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,483 StatusLogger.java (line 112) system.peer_events                        0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,483 StatusLogger.java (line 112) system.hints                     1540,9611391
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,484 StatusLogger.java (line 112) system.HintsColumnFamily                  0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,484 StatusLogger.java (line 112) products.product                92447,7340032
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,555 StatusLogger.java (line 112) products.user                             0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,556 StatusLogger.java (line 112) products.y                                0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,556 StatusLogger.java (line 112) system_auth.users                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,556 StatusLogger.java (line 112) system_traces.sessions                    0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:09,574 StatusLogger.java (line 112) system_traces.events                      0,0
 WARN [ScheduledTasks:1] 2013-09-04 07:17:09,574 GCInspector.java (line 142) Heap is 0.9437705947131645 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-09-04 07:17:09,575 StorageService.java (line 3618) Flushing CFS(Keyspace='system', ColumnFamily='hints') to relieve memory pressure
 INFO [ScheduledTasks:1] 2013-09-04 07:17:11,990 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-hints@1707891156(9668763/9668763 serialized/live bytes, 1549 ops)
 INFO [StorageServiceShutdownHook] 2013-09-04 07:17:12,188 Server.java (line 151) Stop listening for CQL clients
 INFO [StorageServiceShutdownHook] 2013-09-04 07:17:12,188 Gossiper.java (line 1122) Announcing shutdown
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,992 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 2409 ms for 1 collections, 917569968 used; max is 1031798784
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,993 StatusLogger.java (line 53) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,994 StatusLogger.java (line 68) ReadStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,996 StatusLogger.java (line 68) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,997 StatusLogger.java (line 68) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,997 StatusLogger.java (line 68) MutationStage                     0         1         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,998 StatusLogger.java (line 68) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:12,999 StatusLogger.java (line 68) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,011 StatusLogger.java (line 68) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,012 StatusLogger.java (line 68) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,013 StatusLogger.java (line 68) MemtablePostFlusher               1         6         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,013 StatusLogger.java (line 68) FlushWriter                       1         5         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,014 StatusLogger.java (line 68) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,015 StatusLogger.java (line 68) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,016 StatusLogger.java (line 68) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,026 StatusLogger.java (line 68) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,027 StatusLogger.java (line 73) CompactionManager                 2         4
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,027 StatusLogger.java (line 85) MessagingService                n/a       0,2
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,028 StatusLogger.java (line 95) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,073 StatusLogger.java (line 96) KeyCache                        216                      936                      all                                                                 
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,074 StatusLogger.java (line 102) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,074 StatusLogger.java (line 109) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,076 StatusLogger.java (line 112) system.local                             4,52
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,077 StatusLogger.java (line 112) system.peers                              0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,077 StatusLogger.java (line 112) system.batchlog                           0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,078 StatusLogger.java (line 112) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,078 StatusLogger.java (line 112) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,079 StatusLogger.java (line 112) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,083 StatusLogger.java (line 112) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,083 StatusLogger.java (line 112) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,084 StatusLogger.java (line 112) system.schema_columns                     0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,084 StatusLogger.java (line 112) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,085 StatusLogger.java (line 112) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,085 StatusLogger.java (line 112) system.range_xfers                        0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,086 StatusLogger.java (line 112) system.peer_events                        0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,086 StatusLogger.java (line 112) system.hints                          9,52886
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,086 StatusLogger.java (line 112) system.HintsColumnFamily                  0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,087 StatusLogger.java (line 112) products.product              153502,11534336
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,093 StatusLogger.java (line 112) products.user                             0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,095 StatusLogger.java (line 112) products.y                                0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,096 StatusLogger.java (line 112) system_auth.users                         0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,098 StatusLogger.java (line 112) system_traces.sessions                    0,0
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,098 StatusLogger.java (line 112) system_traces.events                      0,0
 WARN [ScheduledTasks:1] 2013-09-04 07:17:13,101 GCInspector.java (line 142) Heap is 0.8892915772228707 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-09-04 07:17:13,101 StorageService.java (line 3618) Flushing CFS(Keyspace='products', ColumnFamily='product') to relieve memory pressure
 INFO [ScheduledTasks:1] 2013-09-04 07:17:13,107 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-product@1107806067(11494898/11534336 serialized/live bytes, 155140 ops)
 INFO [StorageServiceShutdownHook] 2013-09-04 07:17:14,189 MessagingService.java (line 685) Waiting for messaging service to quiesce
 INFO [ACCEPT-/82.196.1.207] 2013-09-04 07:17:14,190 MessagingService.java (line 895) MessagingService shutting down server thread.
 INFO [FlushWriter:7] 2013-09-04 07:17:25,219 Memtable.java (line 495) Completed flushing /var/lib/cassandra/data/products/product/products-product-ic-442-Data.db (37718870 bytes) for commitlog position ReplayPosition(segmentId=1378271785146, position=20998285)
 INFO [FlushWriter:7] 2013-09-04 07:17:25,232 Memtable.java (line 461) Writing Memtable-hints@887419340(30631648/30631648 serialized/live bytes, 5011 ops)


-----------------------------------------------------


 WARN [ScheduledTasks:1] 2013-09-04 06:40:15,772 GCInspector.java (line 136) Heap is 0.9995487569793453 full.  You may need to reduce memtable and/or cache sizes.  Cassandra is now reducing cache sizes to free up memory.  Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-09-04 06:40:15,773 AutoSavingCache.java (line 185) Reducing KeyCache capacity from 51380224 to 96830 to reduce memory pressure
 WARN [ScheduledTasks:1] 2013-09-04 06:40:15,774 GCInspector.java (line 142) Heap is 0.9995487569793453 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-09-04 06:40:15,774 StorageService.java (line 3618) Flushing CFS(Keyspace='system', ColumnFamily='hints') to relieve memory pressure
 INFO [HANDSHAKE-/37.139.24.133] 2013-09-04 06:40:15,791 OutboundTcpConnection.java (line 399) Handshaking version with /37.139.24.133
 INFO [HANDSHAKE-/37.139.24.133] 2013-09-04 06:40:15,809 OutboundTcpConnection.java (line 399) Handshaking version with /37.139.24.133
 INFO [HANDSHAKE-/37.139.24.133] 2013-09-04 06:40:15,887 OutboundTcpConnection.java (line 399) Handshaking version with /37.139.24.133
 WARN [Native-Transport-Requests:965] 2013-09-04 06:40:21,117 Slf4JLogger.java (line 76) An exception was thrown by a user handler while handling an exception event ([id: 0x8ecd02f0, /37.139.31.126:60608 :> /146.185.135.226:9042] EXCEPTION: java.lang.AssertionError: java.lang.InterruptedException)
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:61)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:145)
	at org.jboss.netty.handler.execution.ExecutionHandler.handleUpstream(ExecutionHandler.java:172)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:61)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:61)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:378)
	at org.jboss.
[... more stack trace ...]
 WARN [Native-Transport-Requests:965] 2013-09-04 06:40:25,878 Slf4JLogger.java (line 76) An exception was thrown by a user handler while handling an exception event ([id: 0x8ecd02f0, /37.139.31.126:60608 :> /146.185.135.226:9042] EXCEPTION: java.lang.AssertionError: java.lang.InterruptedException)
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:61)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
	at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:145)
	at org.jboss.netty.handler.execution.ExecutionHandler.handleUpstream(ExecutionHandler.java:172)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:61)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:61)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:378)
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
	at org.jboss.netty.channel.Channels$7.run(Channels.java:507)
	at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:41)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:407)
	at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:35)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:379)
	at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:35)
	at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
	at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:504)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:47)
	at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
	at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
 INFO [StorageServiceShutdownHook] 2013-09-04 06:40:25,923 Server.java (line 151) Stop listening for CQL clients
 INFO [StorageServiceShutdownHook] 2013-09-04 06:40:25,924 Gossiper.java (line 1122) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2013-09-04 06:40:29,348 MessagingService.java (line 685) Waiting for messaging service to quiesce
 INFO [ACCEPT-/146.185.135.226] 2013-09-04 06:40:29,355 MessagingService.java (line 895) MessagingService shutting down server thread.
 INFO [HintedHandoff:2] 2013-09-04 06:40:59,235 HintedHandOffManager.java (line 418) Timed out replaying hints to /82.196.1.207; aborting (0 delivered)
 INFO [FlushWriter:127] 2013-09-04 06:43:14,080 Memtable.java (line 495) Completed flushing /var/lib/cassandra/data/products/product/products-product-ic-296-Data.db (11135924 bytes) for commitlog position ReplayPosition(segmentId=1378049078572, position=23650216)
 INFO [FlushWriter:127] 2013-09-04 06:43:14,109 Memtable.java (line 461) Writing Memtable-product@800619397(36612976/34603008 serialized/live bytes, 501581 ops)
 INFO [FlushWriter:127] 2013-09-04 06:43:50,826 Memtable.java (line 495) Completed flushing /var/lib/cassandra/data/products/product/products-product-ic-297-Data.db (10377307 bytes) for commitlog position ReplayPosition(segmentId=1378049078574, position=1219563)
 INFO [FlushWriter:127] 2013-09-04 06:43:50,854 Memtable.java (line 461) Writing Memtable-product@2016963286(25537061/24117248 serialized/live bytes, 329765 ops)
 INFO [CompactionExecutor:222] 2013-09-04 06:43:50,858 CompactionTask.java (line 105) Compacting [SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-297-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-292-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-296-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-294-Data.db')]
 INFO [FlushWriter:127] 2013-09-04 06:44:10,401 Memtable.java (line 495) Completed flushing /var/lib/cassandra/data/products/product/products-product-ic-298-Data.db (7082158 bytes) for commitlog position ReplayPosition(segmentId=1378049078574, position=31970427)
 INFO [FlushWriter:127] 2013-09-04 06:44:10,442 Memtable.java (line 461) Writing Memtable-product@980202276(5866214/6291456 serialized/live bytes, 79296 ops)
 INFO [FlushWriter:127] 2013-09-04 06:44:15,308 Memtable.java (line 495) Completed flushing /var/lib/cassandra/data/products/product/products-product-ic-300-Data.db (1725129 bytes) for commitlog position ReplayPosition(segmentId=1378049078575, position=5562931)
 INFO [FlushWriter:127] 2013-09-04 06:44:15,309 Memtable.java (line 461) Writing Memtable-product@1335531705(45116775/42991616 serialized/live bytes, 622343 ops)
 INFO [FlushWriter:127] 2013-09-04 06:44:52,124 Memtable.java (line 495) Completed flushing /var/lib/cassandra/data/products/product/products-product-ic-301-Data.db (12793982 bytes) for commitlog position ReplayPosition(segmentId=1378049078576, position=28462140)
 INFO [CompactionExecutor:222] 2013-09-04 07:02:09,277 CompactionTask.java (line 262) Compacted 4 sstables to [/var/lib/cassandra/data/products/product/products-product-ic-299,].  65,775,866 bytes to 46,875,492 (~71% of original) in 1,098,419ms = 0.040698MB/s.  8,553 total rows, 5,403 unique.  Row merge counts were {1:2538, 2:2580, 3:285, 4:0, }
 INFO [CompactionExecutor:223] 2013-09-04 07:02:09,399 CompactionTask.java (line 105) Compacting [SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-299-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-298-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-301-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-300-Data.db')]
 INFO [CompactionExecutor:223] 2013-09-04 07:22:11,234 CompactionTask.java (line 262) Compacted 4 sstables to [/var/lib/cassandra/data/products/product/products-product-ic-302,].  68,476,761 bytes to 53,738,703 (~78% of original) in 1,201,835ms = 0.042642MB/s.  9,076 total rows, 6,753 unique.  Row merge counts were {1:4864, 2:1470, 3:404, 4:15, }
 INFO [CompactionExecutor:224] 2013-09-04 07:22:11,317 CompactionTask.java (line 105) Compacting [SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-274-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-293-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-265-Data.db'), SSTableReader(path='/var/lib/cassandra/data/products/product/products-product-ic-302-Data.db')]
 WARN [StorageServiceShutdownHook] 2013-09-04 07:40:29,355 StorageProxy.java (line 1697) Some hints were not written before shutdown.  This is not supposed to happen.  You should (a) run repair, and (b) file a bug report




Re: Cassandra shuts down; was:Cassandra crashes

Posted by Romain HARDOUIN <ro...@urssaf.fr>.
Maybe you should include the end of the Cassandra logs.
What comes to my mind when I read your first post is the OOM killer.
But what you describe later suggests that is not the case.
Just to be sure, have you checked /var/log/messages?

Romain
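
For the archives: a quick way to check for OOM-killer activity is to grep the kernel log. The sample log line below is made up for illustration; on a real host, run the grep against /var/log/messages or `dmesg` instead:

```shell
# Hypothetical kernel log line of the kind the OOM killer writes;
# on a real host, grep /var/log/messages or the dmesg output instead.
sample='Sep  4 06:40:25 node1 kernel: Out of memory: Kill process 1234 (java) score 901 or sacrifice child'
printf '%s\n' "$sample" | grep -Ei 'out of memory|oom-killer|killed process'

# Real-world equivalents:
#   grep -Ei 'out of memory|oom-killer' /var/log/messages
#   dmesg | grep -Ei 'oom'
```

If this matches a "Kill process ... (java)" line, the kernel terminated the JVM from outside. The clean "Announcing shutdown" messages in Jan's log suggest that is not what happened here.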



De :    Jan Algermissen <ja...@nordsc.com>
A :     user@cassandra.apache.org, 
Date :  04/09/2013 10:52
Objet : Re: Cassandra shuts down; was:Cassandra crashes



The subject line isn't appropriate - the servers do not crash but shut 
down. Since the log messages appear several lines before the end of the 
log file, I only saw this afterwards. Excuse the confusion.

Jan





Re: Cassandra shuts down; was:Cassandra crashes

Posted by Jan Algermissen <ja...@nordsc.com>.
The subject line isn't appropriate - the servers do not crash but shut down. Since the log messages appear several lines before the end of the log file, I only saw this afterwards. Excuse the confusion.

Jan




Re: Cassandra crashes - solved

Posted by Jan Algermissen <ja...@nordsc.com>.
On 06.09.2013, at 17:07, Jan Algermissen <ja...@nordsc.com> wrote:

> 
> On 06.09.2013, at 13:12, Alex Major <al...@gmail.com> wrote:
> 
>> Have you changed the appropriate config settings so that Cassandra will run with only 2GB RAM? You shouldn't find the nodes go down.
>> 
>> Check out this blog post http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/ , it outlines the configuration settings needed to run Cassandra on 64MB RAM and might give you some insights.
> 
> Yes, I have my fingers on the knobs and have also seen the article you mention - very helpful indeed. As well as the replies so far. Thanks very much.
> 
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data import :-(

The problem for me was

  in_memory_compaction_limit_in_mb: 1

It seems that my rather large rows (70 cols each), in combination with the slower two-pass compaction process mentioned in the comment on the config switch, caused the "java.lang.AssertionError: incorrect row data size" exceptions.

After turning in_memory_compaction_limit_in_mb back to 64, all I am getting is write timeouts.

AFAIU that is fine, because now C* is stable and all I have is a capacity problem, solvable with more nodes or more RAM (maybe; it depends on whether IO is an issue).
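
For readers hitting the same thing: the knob lives in cassandra.yaml, and 64 MB is the 1.2.x default. A minimal sketch of flipping it back, done against a scratch copy here since the real path varies by install (commonly /etc/cassandra/cassandra.yaml; the path is an assumption):

```shell
# Demonstrate the edit on a scratch file; point sed at your real
# cassandra.yaml on an actual node.
cfg=$(mktemp)
printf 'in_memory_compaction_limit_in_mb: 1\n' > "$cfg"

# Restore the default, so rows up to 64 MB are compacted in memory in one
# pass instead of falling back to the slower two-pass on-disk path.
sed -i 's/^in_memory_compaction_limit_in_mb:.*/in_memory_compaction_limit_in_mb: 64/' "$cfg"

cat "$cfg"   # prints: in_memory_compaction_limit_in_mb: 64
rm -f "$cfg"
```

A node restart is needed for the new value to take effect.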

Jan



> 
> Now, while it would be easy to scale out and up a bit until the default config of C* is sufficient, I would really like to dive deep and try to understand why the thing is still going down, IOW, which of my config settings is so darn wrong that in most cases kill -9 remains the only way to shut down the Java process in the end.
> 
> 
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M"   and HEAP_NEWSIZE="120M" ) in combination with some cassandra activity that demands too much heap, right?
> 
> So how do I find out what activity this is and how do I sufficiently reduce that activity.
> 
> What bugs me in general is that AFAIU C* is so eager at giving massive write speed, that it sort of forgets to protect itself from client demand. I would very much like to understand why and how that happens.  I mean: no matter how many clients are flooding the database, it should not die due to out of memory situations, regardless of any configuration specifics, or?
> 
> 
> tl;dr
> 
> Currently my client side (with java-driver) after a while reports more and more timeouts and then the following exception:
> 
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occurred server side: java.lang.OutOfMemoryError: unable to create new native thread
> 
> On the server side, my cluster remains more or less in this condition:
> 
> DN  xxxxx     71,33 MB   256     34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256     32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> UN  xxxxx    198,49 MB  256     33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
> 
> The host that is down (it is the seed host, if that matters) still shows the running java process, but I cannot shut down cassandra or connect with nodetool, hence kill -9 to the rescue.
> 
> In that host, I still see a load of around 1.
> 
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
> 
> 
> The system.log after a few seconds of import shows the following exception:
> 
> java.lang.AssertionError: incorrect row data size 771030 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; correct is 771200
>        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:724)
> 
> 
> And then, after about 2 minutes there are out of memory errors:
> 
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor
> :5,1,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>        at java.lang.Thread.start0(Native Method)
>        at java.lang.Thread.start(Thread.java:693)
>        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
>        at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
>        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
>        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:724)
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:
> 
> 
> On the other hosts the log looks similar, but these keep running, despite the OutOfMemory Errors.
> 
> 
> 
> 
> Jan
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: Cassandra crashes

Posted by Jan Algermissen <ja...@nordsc.com>.
Hi John,


On 10.09.2013, at 01:06, John Sanda <jo...@gmail.com> wrote:

> Check your file limits - 
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html

Did that already - without success.
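
For completeness, the limits a running process is actually subject to can be read from /proc; they can differ from the shell's ulimit output if the daemon was started by init. A sketch, shown against the current shell's own entry; substitute the Cassandra JVM's PID on a real node (the pgrep pattern below is an assumption, adjust to your process name):

```shell
# Show the limits this process runs under. For Cassandra, replace 'self'
# with the JVM's PID, e.g. /proc/$(pgrep -f CassandraDaemon)/limits
grep -E 'Max (processes|open files)' /proc/self/limits
```

An "unable to create new native thread" OOM usually points at the "Max processes" (nproc) limit or per-thread stack memory rather than the Java heap itself.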

Meanwhile I upgraded servers and I am getting closer.

I assume by now that heavy writes of rows with considerable size (as in: more than a couple of numbers) require a certain amount of RAM due to the C* architecture.

IOW, my throughput limit is how fast I can get the data to disk, but the minimal memory I need for that cannot be tuned down; it depends on the size of the stuff written to C* (due to C* doing its memtable magic so that it can use sequential IO).

It is an interesting trade-off (if I get it right by now :-).
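
A rough back-of-envelope supports this: in 1.2, unless memtable_total_space_in_mb is set, Cassandra reserves about one third of the heap for memtables, so with the 640 MB heap mentioned earlier in the thread (figures here are illustrative, not measured):

```shell
heap_mb=640                    # MAX_HEAP_SIZE from cassandra-env.sh
memtable_mb=$((heap_mb / 3))   # 1.2 default: ~1/3 of the heap for memtables
echo "memtable space: ${memtable_mb} MB"   # prints: memtable space: 213 MB
```

With only ~200 MB to absorb incoming writes, a sustained bulk load can outrun flushing, which fits the behaviour described in this thread.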

Jan

> 
> On Friday, September 6, 2013, Jan Algermissen wrote:
> 
> On 06.09.2013, at 13:12, Alex Major <al...@gmail.com> wrote:
> 
> > Have you changed the appropriate config settings so that Cassandra will run with only 2GB RAM? You shouldn't find the nodes go down.
> >
> > Check out this blog post http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/ , it outlines the configuration settings needed to run Cassandra on 64MB RAM and might give you some insights.
> 
> Yes, I have my fingers on the knobs and have also seen the article you mention - very helpful indeed. As well as the replies so far. Thanks very much.
> 
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data import :-(
> 
> Now, while it would be easy to scale out and up a bit until the default config of C* is sufficient, I would really like to dive deep and try to understand why the thing is still going down, IOW, which of my config settings is so darn wrong that in most cases kill -9 remains the only way to shut down the Java process in the end.
> 
> 
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M"   and HEAP_NEWSIZE="120M" ) in combination with some cassandra activity that demands too much heap, right?
> 
> So how do I find out what activity this is and how do I sufficiently reduce that activity.
> 
> What bugs me in general is that AFAIU C* is so eager at giving massive write speed, that it sort of forgets to protect itself from client demand. I would very much like to understand why and how that happens.  I mean: no matter how many clients are flooding the database, it should not die due to out of memory situations, regardless of any configuration specifics, or?
> 
> 
> tl;dr
> 
> Currently my client side (with java-driver) after a while reports more and more timeouts and then the following exception:
> 
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occurred server side: java.lang.OutOfMemoryError: unable to create new native thread
> 
> On the server side, my cluster remains more or less in this condition:
> 
> DN  xxxxx     71,33 MB   256     34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256     32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
> UN  xxxxx    198,49 MB  256     33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
> 
> The host that is down (it is the seed host, if that matters) still shows the running java process, but I cannot shut down cassandra or connect with nodetool, hence kill -9 to the rescue.
> 
> In that host, I still see a load of around 1.
> 
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
> 
> 
> The system.log after a few seconds of import shows the following exception:
> 
> java.lang.AssertionError: incorrect row data size 771030 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; correct is 771200
>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> 
> 
> And then, after about 2 minutes there are out of memory errors:
> 
>  ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor
> :5,1,main]
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:693)
>         at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
>         at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:
> 
> 
> On the other hosts the log looks similar, but these keep running, despite the OutOfMemory Errors.
> 
> 
> 
> 
> Jan
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> - John


Re: Cassandra crashes

Posted by John Sanda <jo...@gmail.com>.
Check your file limits -
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html

On Friday, September 6, 2013, Jan Algermissen wrote:

>
> On 06.09.2013, at 13:12, Alex Major <al3xdm@gmail.com> wrote:
>
> > Have you changed the appropriate config settings so that Cassandra will
> run with only 2GB RAM? You shouldn't find the nodes go down.
> >
> > Check out this blog post
> http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/, it outlines the configuration settings needed to run Cassandra on 64MB
> RAM and might give you some insights.
>
> Yes, I have my fingers on the knobs and have also seen the article you
> mention - very helpful indeed. As well as the replies so far. Thanks very
> much.
>
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my
> data import :-(
>
> Now, while it would be easy to scale out and up a bit until the default
> config of C* is sufficient, I would really like to dive deep and try to
> understand why the thing is still going down, IOW, which of my config
> settings is so darn wrong that in most cases kill -9 remains the only way
> to shutdown the Java process in the end.
>
>
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M"   and
> HEAP_NEWSIZE="120M" ) in combination with some cassandra activity that
> demands too much heap, right?
>
> So how do I find out what activity this is and how do I sufficiently
> reduce that activity.
>
> What bugs me in general is that AFAIU C* is so eager at giving massive
> write speed, that it sort of forgets to protect itself from client demand.
> I would very much like to understand why and how that happens.  I mean: no
> matter how many clients are flooding the database, it should not die due to
> out of memory situations, regardless of any configuration specifics, or?
>
>
> tl;dr
>
> Currently my client side (with java-driver) after a while reports more and
> more timeouts and then the following exception:
>
> com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error
> occurred server side: java.lang.OutOfMemoryError: unable to create new
> native thread
>
> On the server side, my cluster remains more or less in this condition:
>
> DN  xxxxx     71,33 MB   256     34,1%
>  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
> UN  xxxxx  189,38 MB  256     32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f
>  rack1
> UN  xxxxx    198,49 MB  256     33,9%
>  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1
>
> The host that is down (it is the seed host, if that matters) still shows
> the running java process, but I cannot shut down cassandra or connect with
> nodetool, hence kill -9 to the rescue.

-- 

- John

Re: Cassandra crashes

Posted by Jan Algermissen <ja...@nordsc.com>.
On 06.09.2013, at 13:12, Alex Major <al...@gmail.com> wrote:

> Have you changed the appropriate config settings so that Cassandra will run with only 2GB RAM? You shouldn't find the nodes go down.
> 
> Check out this blog post http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/ , it outlines the configuration settings needed to run Cassandra on 64MB RAM and might give you some insights.

Yes, I have my fingers on the knobs and have also seen the article you mention - very helpful indeed, as are the replies so far. Thanks very much.

However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data import :-(

Now, while it would be easy to scale out and up a bit until the default config of C* is sufficient, I would really like to dive deep and understand why the thing is still going down; in other words, which of my config settings is so darn wrong that in most cases kill -9 remains the only way to shut down the Java process in the end.


The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M" and HEAP_NEWSIZE="120M") in combination with some Cassandra activity that demands too much heap, right?
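For reference, those two knobs live in conf/cassandra-env.sh; this is all I have overridden there (the values above, shown only as a sketch of my setup, not as a recommendation):

```shell
# conf/cassandra-env.sh -- heap settings currently in use on my 2GB VMs.
# Setting both overrides Cassandra's automatic heap sizing.
MAX_HEAP_SIZE="640M"
HEAP_NEWSIZE="120M"
```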

So how do I find out what activity this is, and how do I sufficiently reduce it?

What bugs me in general is that, AFAIU, C* is so eager to deliver massive write speed that it sort of forgets to protect itself from client demand. I would very much like to understand why and how that happens. I mean: no matter how many clients are flooding the database, it should not die from out-of-memory situations, regardless of any configuration specifics, should it?
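To illustrate what I mean by slowing the client down 'artificially': a minimal sketch of capping in-flight async writes with a Semaphore. The executor below only stands in for the java-driver session (with the real driver the permit would be released in the ResultSetFuture callback); the class name, the helper, and the limit of 128 are made up for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Client-side backpressure sketch: instead of firing executeAsync()
// unboundedly, acquire a permit before each write and release it when
// the write completes, so at most maxInFlight writes are pending.
public class ThrottledImport {

    static int runBatch(final int totalWrites, int maxInFlight) {
        final Semaphore inFlight = new Semaphore(maxInFlight);
        final AtomicInteger acked = new AtomicInteger();
        ExecutorService fakeSession = Executors.newFixedThreadPool(8);

        for (int i = 0; i < totalWrites; i++) {
            inFlight.acquireUninterruptibly(); // blocks once maxInFlight writes are pending
            fakeSession.submit(new Runnable() {
                @Override public void run() {
                    try {
                        // session.executeAsync(stmt) would go here
                        acked.incrementAndGet();
                    } finally {
                        inFlight.release();    // free a slot when the write completes
                    }
                }
            });
        }
        fakeSession.shutdown();
        try {
            fakeSession.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acked.get();
    }

    public static void main(String[] args) {
        System.out.println(runBatch(10000, 128));
    }
}
```

With the real driver, releasing the permit in both onSuccess and onFailure of the future callback keeps the cap intact even when writes time out.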


tl;dr

Currently, after a while, my client side (using java-driver) reports more and more timeouts and then the following exception:

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occurred server side: java.lang.OutOfMemoryError: unable to create new native thread

On the server side, my cluster remains more or less in this condition:

DN  xxxxx  71,33 MB   256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
UN  xxxxx  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
UN  xxxxx  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1

The host that is down (it is the seed host, if that matters) still shows the running Java process, but I cannot shut down Cassandra or connect with nodetool, hence kill -9 to the rescue.

In that host, I still see a load of around 1.

jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
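For completeness, here is roughly how I compared the JVM's thread count against the per-user limit, since "unable to create new native thread" points at thread exhaustion rather than heap (the pgrep pattern is illustrative; adjust it to how the process is started):

```shell
# Compare the per-user process/thread limit with the Cassandra JVM's
# current thread count. Threads count against `ulimit -u` on Linux.
ulimit -u                                    # max user processes
pid=$(pgrep -f CassandraDaemon | head -n 1)
if [ -n "$pid" ]; then
    grep Threads "/proc/$pid/status"         # threads currently in the JVM
else
    echo "no CassandraDaemon process found"
fi
```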


The system.log after a few seconds of import shows the following exception:

java.lang.AssertionError: incorrect row data size 771030 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; correct is 771200
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)


And then, after about 2 minutes there are out of memory errors:

 ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor
:5,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:693)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.<init>(ParallelCompactionIterable.java:296)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,685 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:


On the other hosts the log looks similar, but those nodes keep running despite the OutOfMemoryErrors.




Jan


Re: Cassandra crashes

Posted by Alex Major <al...@gmail.com>.
Have you changed the appropriate config settings so that Cassandra will run
with only 2GB of RAM? You shouldn't find the nodes going down.

Check out this blog post:
http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
It outlines the configuration settings needed to run Cassandra on 64MB of
RAM and might give you some insights.

