
Posted to user@cassandra.apache.org by ad...@panasiangroup.com on 2013/04/04 10:27:42 UTC

Cassandra services down frequently [Version 1.1.4]

Hi,

We are running a 4-node Cassandra cluster (1.1.4) with replication factor 2
(DC 1) and replication factor 1 (DC 2) in two different data centers with a
network topology strategy. Each machine has 16GB RAM, 8 cores, and two
hard drives.

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address         DC          Rack        Status State   Load             Effective-Ownership Token
                                                                                            169417178424467235000914166253263322299
10.0.0.3        DC1         RAC1        Up     Normal  91.93 GB         66.67%              0
10.0.0.4        DC1         RAC1        Up     Normal  84.88 GB         66.67%              56713727820156410577229101238628035242
10.0.0.15       DC1         RAC1        Up     Normal  82.51 GB         66.67%              113427455640312821154458202477256070484
10.40.1.103     DC2         RAC1        Up     Normal  303.2 MB         100.00%             169417178424467235000914166253263322299
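
The three DC1 tokens in the ring output above look evenly spaced across the RandomPartitioner token range (0 to 2**127). As a minimal sketch, assuming they were produced by multiplying the node index by the integer step 2**127 // 3 (the helper name below is hypothetical and not part of Cassandra), this Python reproduces them:

```python
# Sketch: evenly spaced initial tokens for RandomPartitioner (range 0 .. 2**127).
# The function name is illustrative only; it is not a Cassandra API.

def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced by an integer step around the ring."""
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]

dc1_tokens = evenly_spaced_tokens(3)
for token in dc1_tokens:
    print(token)
```

The DC2 token (169417178424467235000914166253263322299) does not follow this spacing; it appears to have been chosen separately.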

# java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)

After some time (1 to 2 hours), Cassandra shuts down its services on one or
two nodes with the following errors:

============================================================
  INFO 11:01:25,527 GC for ConcurrentMarkSweep: 1968 ms for 2 collections, 3817667464 used; max is 4093640704
  INFO 11:01:42,838 GC for ConcurrentMarkSweep: 1828 ms for 2 collections, 3850830504 used; max is 4093640704
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid27363.hprof ...
Heap dump file created [4664912349 bytes in 44.731 secs]
ERROR 11:02:41,156 Exception in thread Thread[CompactionExecutor:87,1,main]
java.lang.OutOfMemoryError: Java heap space
         at org.apache.cassandra.io.util.FastByteArrayOutputStream.expand(FastByteArrayOutputStream.java:104)
         at org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:220)
         at java.io.DataOutputStream.write(DataOutputStream.java:90)
         at org.apache.cassandra.io.util.DataOutputBuffer.write(DataOutputBuffer.java:61)
         at org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:328)
         at org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:315)
         at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:62)
         at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:366)
         at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:339)
         at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:89)
         at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:138)
         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
         at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
         at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
         at java.lang.Thread.run(Thread.java:662)
  INFO 11:02:41,373 Stop listening to thrift clients
  INFO 11:02:41,376 InetAddress /10.0.0.15 is now dead.
  INFO 11:02:41,376 InetAddress /10.0.0.3 is now dead.
  INFO 11:02:41,377 InetAddress /10.40.1.103 is now dead.
  INFO 11:02:41,397 InetAddress /10.0.0.3 is now UP
  INFO 11:02:41,397 InetAddress /10.0.0.15 is now UP
  INFO 11:02:41,398 InetAddress /10.40.1.103 is now UP
  INFO 11:02:41,398 Started hinted handoff for token: 0 with IP: /10.0.0.3
  INFO 11:02:41,450 Announcing shutdown
  INFO 11:02:48,184 GC for ConcurrentMarkSweep: 1887 ms for 2 collections, 2234362128 used; max is 4093640704
  INFO 11:02:48,206 Waiting for messaging service to quiesce
  INFO 11:02:48,207 MessagingService shutting down server thread.
============================================================
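
The GC lines in the log above already show how close the heap was to its limit just before the OutOfMemoryError. As a rough sketch (the regular expression and function name are illustrative, not taken from any Cassandra tool), the following Python pulls the "used" and "max" byte counts out of two of those lines and reports the fraction of heap in use:

```python
import re

# Two GC lines copied from the log above, joined onto single lines.
gc_lines = [
    "INFO 11:01:25,527 GC for ConcurrentMarkSweep: 1968 ms for 2 collections, "
    "3817667464 used; max is 4093640704",
    "INFO 11:01:42,838 GC for ConcurrentMarkSweep: 1828 ms for 2 collections, "
    "3850830504 used; max is 4093640704",
]

# Matches the "<bytes> used; max is <bytes>" tail of a GC log line.
GC_PATTERN = re.compile(r"(\d+) used; max is (\d+)")

def heap_usage(line):
    """Return (used_bytes, max_bytes, used_fraction) for one GC log line."""
    match = GC_PATTERN.search(line)
    used, maximum = int(match.group(1)), int(match.group(2))
    return used, maximum, used / maximum

for line in gc_lines:
    used, maximum, fraction = heap_usage(line)
    print(f"{used / 2**30:.2f} GiB of {maximum / 2**30:.2f} GiB used ({fraction:.1%})")
```

A heap that stays above 90% full after back-to-back ConcurrentMarkSweep collections has very little headroom left for a large compaction.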

Our cassandra.yaml configuration is as follows:

============================================================
cluster_name: 'ABC Cluster'
initial_token: 0
hinted_handoff_enabled: true
max_hint_window_in_ms: 2147483647 # one hour
hinted_handoff_throttle_delay_in_ms: 0
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner

data_file_directories:
     - /u/cassandra/data

commitlog_directory: /var/log/cassandra/commitlog
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
row_cache_provider: SerializingCacheProvider
saved_caches_directory: /var/log/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32

seed_provider:
           # Ex: "<ip1>,<ip2>,<ip3>"
           - seeds: "10.0.0.3,10.0.0.4"

flush_largest_memtables_at: 1.0
reduce_cache_sizes_at: 1.0
reduce_cache_capacity_to: 0.6
concurrent_reads: 8
concurrent_writes: 32
memtable_flush_queue_size: 4
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.0.0.3
rpc_address: 10.0.0.3
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
rpc_min_threads: 16
rpc_max_threads: 2147483647
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 256
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 15000
phi_convict_threshold: 8
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.0
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
     internode_encryption: none
     keystore: conf/.keystore
     keystore_password: cassandra
     truststore: conf/.truststore
     truststore_password: cassandra
============================================================
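
One value in the configuration above is worth checking by hand: max_hint_window_in_ms is set to 2147483647, yet the trailing comment says "one hour". A quick conversion (plain arithmetic, no Cassandra API involved) suggests the configured window is far longer than the comment claims:

```python
# Convert the configured max_hint_window_in_ms into hours and days.
MS_PER_HOUR = 60 * 60 * 1000
MS_PER_DAY = 24 * MS_PER_HOUR

max_hint_window_in_ms = 2147483647  # value from the cassandra.yaml above

hours = max_hint_window_in_ms / MS_PER_HOUR
days = max_hint_window_in_ms / MS_PER_DAY

print(f"{hours:.1f} hours, about {days:.1f} days")

# For comparison, a window of exactly one hour expressed in milliseconds:
print(MS_PER_HOUR, "ms")
```

The configured value equals 2**31 - 1, the largest signed 32-bit integer, which effectively removes the limit on how long hints are collected for a node that is down.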

Please help me fix this issue permanently so that the Cassandra nodes
run smoothly.

Regards,

Adeel Akbar

Re: Cassandra services down frequently [Version 1.1.4]

Posted by aaron morton <aa...@thelastpickle.com>.

> MAX_HEAP_SIZE="6G"
> HEAP_NEWSIZE="500M"
The new heap feels a little low, I often see 800M as a good number. It depends on the number of cores, but if that's working stick with it. 

> key_cache_size_in_mb: 512
Have you run this at the default and checked the cache hit rate using nodetool info? The default size would be about 300M.

> row_cache_size_in_mb: 14336
This is way too high. 
You've told the JVM to lock in 6GB and then told the row cache it can use 14GB, but you only have 16GB on the node. At some point things are going to go crash, bang, wallop. 

Set it to 1GB and check the cache hit rate using nodetool info. 

The remaining memory will be used by the OS to cache disk access. 
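
The arithmetic behind that warning can be sketched as follows. This is only a rough budget under one stated assumption: the row cache held by SerializingCacheProvider sits outside the JVM heap, so the two figures add up. Other off-heap usage is ignored, and the function name is illustrative.

```python
# Rough memory budget for one 16 GB node, using the figures from this thread.
# Assumption: the row cache (SerializingCacheProvider) is stored off-heap,
# so heap size and row cache size are added together.

TOTAL_RAM_MB = 16 * 1024
HEAP_MB = 6 * 1024  # MAX_HEAP_SIZE="6G"

def memory_left_for_os(row_cache_mb):
    """Return RAM in MB left for the OS page cache; negative means over-committed."""
    return TOTAL_RAM_MB - HEAP_MB - row_cache_mb

for row_cache_mb in (14336, 1024):
    left = memory_left_for_os(row_cache_mb)
    status = "over-committed" if left < 0 else "ok"
    print(f"row_cache_size_in_mb={row_cache_mb}: {left} MB left for the OS ({status})")
```

With the 14336 MB setting the node is asked for more memory than it physically has; with 1024 MB, several gigabytes remain for the OS disk cache.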


> I have a query: if Cassandra uses the JVM for all operations, why do we need to change the above parameters separately in cassandra.yaml?

The JVM params are passed to the JVM before the server starts and have to be formatted a specific way. The yaml file is much easier for humans to read. 

Cheers


-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/04/2013, at 1:16 PM, 金剑 <ji...@gmail.com> wrote:

> It also uses off-heap memory outside the JVM heap. SerializingCacheProvider is one such case.
> 
> Best Regards!
> 
> Jian Jin
> 
> 
> 2013/4/6 <ad...@panasiangroup.com>
> Thank you Aaron and Bryan for your advice.
> 
> I have changed the following parameters and Cassandra is now running absolutely fine. Please review the settings below and advise whether I am heading in the right direction.
> 
> cassandra-env.sh
> #JVM_OPTS="$JVM_OPTS -ea"
> MAX_HEAP_SIZE="6G"
> HEAP_NEWSIZE="500M"
> 
>  cassandra.yaml
> # do not persist caches to disk
> key_cache_save_period: 0
> row_cache_save_period: 0
> 
> key_cache_size_in_mb: 512
> row_cache_size_in_mb: 14336
> row_cache_provider: SerializingCacheProvider
> 
> I have a query: if Cassandra uses the JVM for all operations, why do we need to change the above parameters separately in cassandra.yaml?
> 
> 
> Thanks & Regards
> 
> Adeel Akbar
> 
> 
> Quoting aaron morton <aa...@thelastpickle.com>:
> 
> We can see from below that you've tweaked and disabled many of the  memory "safety valve" and other memory related settings.
> Agree.
> Also you are running with JVM heap size of 3.81GB which is non  default. For a 16GB node I would expect 8GB.
> 
> Try restoring the yaml values to the defaults and allowing the  cassandra-env.sh file to determine the memory size.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 5/04/2013, at 12:36 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
> 
> On Thu, Apr 4, 2013 at 1:27 AM, <ad...@panasiangroup.com> wrote:
> 
> After some time (1 to 2 hours), Cassandra shuts down its services on one or two nodes with the following errors:
> 
> 
> Wonder what the workload and schema is like ...
> 
> We can see from below that you've tweaked and disabled many of the  memory "safety valve" and other memory related settings.  Those  could be causing issues too.
> 
> 
> hinted_handoff_throttle_delay_in_ms: 0
> flush_largest_memtables_at: 1.0
> reduce_cache_sizes_at: 1.0
> reduce_cache_capacity_to: 0.6
> rpc_keepalive: true
> rpc_server_type: sync
> rpc_min_threads: 16
> rpc_max_threads: 2147483647
> in_memory_compaction_limit_in_mb: 256
> compaction_throughput_mb_per_sec: 16
> rpc_timeout_in_ms: 15000
> dynamic_snitch_badness_threshold: 0.0
> 
> 
> 
>

Re: Cassandra services down frequently [Version 1.1.4]

Posted by 金剑 <ji...@gmail.com>.

It also uses off-heap memory outside the JVM heap. SerializingCacheProvider
is one such case.

Best Regards!

Jian Jin



Re: Cassandra services down frequently [Version 1.1.4]

Posted by ad...@panasiangroup.com.

Thank you Aaron and Bryan for your advice.

I have changed the following parameters and Cassandra is now running
absolutely fine. Please review the settings below and advise whether I am
heading in the right direction.

cassandra-env.sh
#JVM_OPTS="$JVM_OPTS -ea"
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="500M"

  cassandra.yaml
# do not persist caches to disk
key_cache_save_period: 0
row_cache_save_period: 0

key_cache_size_in_mb: 512
row_cache_size_in_mb: 14336
row_cache_provider: SerializingCacheProvider

I have a query: if Cassandra uses the JVM for all operations, why do we
need to change the above parameters separately in cassandra.yaml?


Thanks & Regards

Adeel Akbar


Re: Cassandra services down frequently [Version 1.1.4]

Posted by aaron morton <aa...@thelastpickle.com>.

> We can see from below that you've tweaked and disabled many of the memory "safety valve" and other memory related settings. 
Agree. 
Also, you are running with a JVM heap size of 3.81GB, which is non-default. For a 16GB node I would expect 8GB.

Try restoring the yaml values to the defaults and allowing the cassandra-env.sh file to determine the memory size. 

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: Cassandra services down frequently [Version 1.1.4]

Posted by Bryan Talbot <bt...@aeriagames.com>.

On Thu, Apr 4, 2013 at 1:27 AM, <ad...@panasiangroup.com> wrote:

>
> After some time (1 to 2 hours), Cassandra shuts down its services on one or
> two nodes with the following errors:
>

Wonder what the workload and schema is like ...

We can see from below that you've tweaked and disabled many of the memory
"safety valve" and other memory related settings.  Those could be causing
issues too.

> hinted_handoff_throttle_delay_in_ms: 0
> flush_largest_memtables_at: 1.0
> reduce_cache_sizes_at: 1.0
> reduce_cache_capacity_to: 0.6
> rpc_keepalive: true
> rpc_server_type: sync
> rpc_min_threads: 16
> rpc_max_threads: 2147483647
> in_memory_compaction_limit_in_mb: 256
> compaction_throughput_mb_per_sec: 16
> rpc_timeout_in_ms: 15000
> dynamic_snitch_badness_threshold: 0.0
>