Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/02/22 01:14:20 UTC

[jira] [Comment Edited] (CASSANDRA-6753) Cassandra2.1~beta1 Stall at Boot

    [ https://issues.apache.org/jira/browse/CASSANDRA-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909024#comment-13909024 ] 

Benedict edited comment on CASSANDRA-6753 at 2/22/14 12:13 AM:
---------------------------------------------------------------

Some obvious questions:
- Do you see any other errors?
- The 0 and -858993472 correspond to the used() and nextClean part of that method, respectively, correct? What about the limit and the cleanThreshold? What do they say?
- Is this consistent, every time you start?

This is definitely not normal, and is almost certainly a bug, but it shouldn't ever stop Cassandra from starting. So I wonder whether there is a strange interaction with some other problem; if there is, identifying that other problem may make this one easier to track down.

Could you attach the output from jstacking the process?

The simplest explanation is that memtable_cleanup_threshold is somehow negative. We don't actually check this on startup, which is an oversight. The fact that the value for nextClean is exactly -0.4 * 2Gb makes me suspicious: with an 8Gb heap, we would default to a 2Gb limit, and the default cleanup_threshold is 0.4. Is it possible you accidentally added a '-' prefix to the line in the config file? Unlikely, I know, but it would explain it instantly :-)
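
As a sanity check on that arithmetic, here is a minimal Java sketch. It is not the actual Pool code: the class and method names are hypothetical, and it assumes the threshold is held as a float and multiplied by a limit of 2048 MB (the memtable_total_space_in_mb value in the reported configuration). Under those assumptions a sign-flipped threshold of 0.4 gives exactly the logged -858993472, and a used() of 0 always satisfies the first part of the logged condition. The validateThreshold method is only an illustration of the kind of startup check that is missing, with the accepted range (0, 1) being an assumption.

```java
// Illustrative sketch only; not the actual Cassandra Pool implementation.
class NextCleanCheck {
    // Hypothetical helper: threshold * limit, evaluated in float arithmetic
    // (assumes the threshold is stored as a float) and truncated to a long.
    static long nextClean(float cleanupThreshold, long limitBytes) {
        return (long) (cleanupThreshold * limitBytes);
    }

    // Hypothetical startup validation; the accepted range (0, 1) is an assumption.
    static void validateThreshold(float cleanupThreshold) {
        if (!(cleanupThreshold > 0f && cleanupThreshold < 1f)) {
            throw new IllegalArgumentException(
                "memtable_cleanup_threshold must be between 0 and 1, got " + cleanupThreshold);
        }
    }

    public static void main(String[] args) {
        long limit = 2048L * 1024 * 1024;       // 2048 MB expressed in bytes
        long next = nextClean(-0.4f, limit);    // threshold with an accidental '-' prefix
        System.out.println(next);               // prints -858993472

        long used = 0;                          // the used() value reported in the issue
        System.out.println(used >= next);       // prints true: cleaning always looks necessary

        try {
            validateThreshold(-0.4f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With double arithmetic the same product truncates to -858993459 instead, so the logged value is consistent with a float threshold of -0.4 applied to a 2Gb limit.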





> Cassandra2.1~beta1 Stall at Boot
> --------------------------------
>
>                 Key: CASSANDRA-6753
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6753
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Distributor ID:	Ubuntu
> Description:	Ubuntu 12.04.3 LTS
> Release:	12.04
> Codename:	precise
> AWS: i2.xlarge
> {code}
> INFO  22:34:40 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0
> INFO  22:34:40 Heap size: 12777553920/12777553920
> INFO  22:34:40 Code Cache Non-heap memory: init = 2555904(2496K) used = 621632(607K) committed = 2555904(2496K) max = 50331648(49152K)
> INFO  22:34:40 Par Eden Space Heap memory: init = 859045888(838912K) used = 137447616(134226K) committed = 859045888(838912K) max = 859045888(838912K)
> INFO  22:34:40 Par Survivor Space Heap memory: init = 107347968(104832K) used = 0(0K) committed = 107347968(104832K) max = 107347968(104832K)
> INFO  22:34:40 CMS Old Gen Heap memory: init = 11811160064(11534336K) used = 1433816(1400K) committed = 11811160064(11534336K) max = 11811160064(11534336K)
> INFO  22:34:40 CMS Perm Gen Non-heap memory: init = 21757952(21248K) used = 18654512(18217K) committed = 21757952(21248K) max = 85983232(83968K)
> INFO  22:34:40 Classpath: /usr/share/cassandra/lib/airline-0.6.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/commons-math3-3.2.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-16.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.6.jar:/usr/share/cassandra/lib/javax.inject.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna-4.0.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/logback-classic-1.0.13.jar:/usr/share/cassandra/lib/logback-core-1.0.13.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/stream-2.5.2.jar:/usr/share/cassandra/lib/thrift-server-0.3.3.jar:/usr/share/cassandra/CustomAgent.jar:/usr/share/cassandra/apache-cassandra-2.1.0~beta1.jar:/usr/share/cassandra/apache-cassandra-thrift-2.1.0~beta1.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/jna.jar:/usr/share/cassandra/mx4j-tools.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.6.jar:/usr/share/cassandra/CustomAgent.jar:/usr/local/jcollectd/jcollectd.jar
> {code}
> {code:title=Node configuration}
> [authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_snapshot=true; batchlog_replay_throttle_in_kb=1024; cas_contention_timeout_in_ms=1000; client_encryption_options=<REDACTED>; cluster_name=sketchy_staging_test; column_index_size_in_kb=64; commitlog_directory=/mnt/cassandra/commitlog; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_period_in_ms=10000; compaction_preheat_key_cache=true; compaction_throughput_mb_per_sec=64; concurrent_counter_writes=32; concurrent_reads=128; concurrent_writes=128; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; cross_node_timeout=false; data_file_directories=[/mnt/cassandra/data]; disk_failure_policy=stop; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; endpoint_snitch=SimpleSnitch; flush_directory=/mnt/cassandra/flush; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; in_memory_compaction_limit_in_mb=64; incremental_backups=false; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; inter_dc_tcp_nodelay=false; internode_compression=all; key_cache_save_period=14400; key_cache_size_in_mb=1024; listen_address=10.9.163.158; max_hint_window_in_ms=14400000; max_hints_delivery_threads=2; memtable_cleanup_threshold=0.4; memtable_total_space_in_mb=2048; native_transport_port=9042; num_tokens=256; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_validity_in_ms=2000; preheat_kernel_page_cache=false; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_timeout_in_ms=10000; row_cache_save_period=14400; row_cache_size_in_mb=1024; rpc_address=0.0.0.0; rpc_keepalive=true; rpc_port=9160; rpc_server_type=sync; saved_caches_directory=/mnt/cassandra/cache; 
> seed_provider=[{class_name=org.apache.cassandra.locator.SimpleSeedProvider, parameters=[{seeds=10.71.141.38,10.218.142.35}]}]; server_encryption_options=<REDACTED>; snapshot_before_compaction=false; ssl_storage_port=7001; start_native_transport=true; start_rpc=true; storage_port=7000; thrift_framed_transport_size_in_mb=15; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; trickle_fsync=true; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; write_request_timeout_in_ms=2000]
> {code}
>            Reporter: David Chia
>            Assignee: Benedict
>             Fix For: 2.1 beta2
>
>
> I was trying out the new release for several perf. improvements that I am very interested in. After upgrading my Cassandra from 2.0.5 to the beta version, Cassandra stalls while initializing the column families.
> I might have misconfigured something, but it seems to be stuck in a loop. I added a couple of debug statements, but, on second thought, I think I should just leave it to the experts...
> It's looping in the following over and over:
> {code:title=src/java/org/apache/cassandra/utils/memory/Pool.java#needsCleaning}
> 0 >= -858993472 && true && true
> {code}
> {code:title=Log}
> INFO  [HeapSlabPoolCleaner] 2014-02-21 22:28:40,073 Keyspace.java:77 - java.lang.Thread.getStackTrace(Unknown Source),
> org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:77),
> org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:74),
> com.google.common.collect.Iterators$8.transform(Iterators.java:794),
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48),
> org.apache.cassandra.db.ColumnFamilyStore.all(ColumnFamilyStore.java:2278),
> org.apache.cassandra.db.ColumnFamilyStore$FlushLargestColumnFamily.run(ColumnFamilyStore.java:1043),
> org.apache.cassandra.utils.memory.PoolCleanerThread.run(PoolCleanerThread.java:70)
> {code}
> These may be totally unrelated or normal behavior. Let me know if there is any other info I should provide.


