Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2015/05/19 19:25:01 UTC
[jira] [Assigned] (CASSANDRA-9129) HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12
[ https://issues.apache.org/jira/browse/CASSANDRA-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe reassigned CASSANDRA-9129:
------------------------------------------
Assignee: Sam Tunnicliffe (was: Aleksey Yeschenko)
> HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-9129
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9129
> Project: Cassandra
> Issue Type: Bug
> Environment: Ubuntu 12.04.5 LTS
> AWS (m3.xlarge)
> 15G RAM
> 4 core Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
> Cassandra 2.0.14
> Reporter: Russ Lavoie
> Assignee: Sam Tunnicliffe
> Fix For: 2.0.x
>
>
> After upgrading from Cassandra 2.0.11 or 2.0.12 to 2.0.14, I am seeing one pending hinted handoff that never clears. New hinted handoffs that go into pending while waiting for a node to come up clear as expected, but one always remains.
> I went through the following steps:
> 1) Stop Cassandra
> 2) Upgrade Cassandra to 2.0.14
> 3) Start Cassandra
> 4) Run nodetool tpstats
> There are no errors in the logs that would help with this issue. I ran a few nodetool commands to collect some data and pasted the output below:
> Below is what is shown after running nodetool status on each node in the ring
> {code}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load       Tokens  Owns   Host ID   Rack
> UN  <NODE1>  279.8 MB   256     34.9%  <HOSTID>  rack1
> UN  <NODE2>  279.79 MB  256     33.0%  <HOSTID>  rack1
> UN  <NODE3>  279.87 MB  256     32.1%  <HOSTID>  rack1
> {code}
> Below is what is shown after running nodetool tpstats on each node in the ring showing a single HintedHandoff in pending status that never clears
> {code}
> Pool Name                 Active  Pending  Completed  Blocked  All time blocked
> ReadStage                      0        0      14550        0                 0
> RequestResponseStage           0        0     113040        0                 0
> MutationStage                  0        0     168873        0                 0
> ReadRepairStage                0        0       1147        0                 0
> ReplicateOnWriteStage          0        0          0        0                 0
> GossipStage                    0        0     232112        0                 0
> CacheCleanupExecutor           0        0          0        0                 0
> MigrationStage                 0        0          0        0                 0
> MemoryMeter                    0        0          6        0                 0
> FlushWriter                    0        0         38        0                 0
> ValidationExecutor             0        0          0        0                 0
> InternalResponseStage          0        0          0        0                 0
> AntiEntropyStage               0        0          0        0                 0
> MemtablePostFlusher            0        0       1333        0                 0
> MiscStage                      0        0          0        0                 0
> PendingRangeCalculator         0        0          6        0                 0
> CompactionExecutor             0        0        178        0                 0
> commitlog_archiver             0        0          0        0                 0
> HintedHandoff                  0        1        133        0                 0
>
> Message type             Dropped
> RANGE_SLICE                    0
> READ_REPAIR                    0
> PAGED_RANGE                    0
> BINARY                         0
> READ                           0
> MUTATION                       0
> _TRACE                         0
> REQUEST_RESPONSE               0
> COUNTER_MUTATION               0
> {code}
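
Editor's sketch (not part of the original report): when checking several nodes for the stuck task, it can help to scan the tpstats text programmatically rather than by eye. The Python below is a minimal sketch that assumes the 2.0-era layout shown above, where each pool row is a name followed by five integer columns. The function name pending_pools and the abridged SAMPLE string are illustrative, not part of any Cassandra tooling.

```python
import re

# Abridged copy of the tpstats output quoted in this report.
SAMPLE = """\
Pool Name                 Active  Pending  Completed  Blocked  All time blocked
ReadStage                      0        0      14550        0                 0
MutationStage                  0        0     168873        0                 0
HintedHandoff                  0        1        133        0                 0

Message type             Dropped
READ                           0
MUTATION                       0
"""

# A pool row is a name followed by exactly five integer columns:
# Active, Pending, Completed, Blocked, All time blocked.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def pending_pools(tpstats_text):
    """Return {pool_name: pending_count} for every pool with pending > 0."""
    result = {}
    for line in tpstats_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, blank line, or a "Message type" row
        pending = int(match.group(3))
        if pending > 0:
            result[match.group(1)] = pending
    return result


print(pending_pools(SAMPLE))
```

On the sample above this reports only HintedHandoff with a pending count of 1, which matches what the reporter describes. Rows from the dropped-message table are skipped because they carry a single numeric column.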
> Below is what is shown after running nodetool cfstats system.hints on all 3 nodes.
> {code}
> Keyspace: system
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Tasks: 0
> Table: hints
> SSTable count: 0
> Space used (live), bytes: 0
> Space used (total), bytes: 0
> Off heap memory used (total), bytes: 0
> SSTable Compression Ratio: 0.0
> Number of keys (estimate): 0
> Memtable cell count: 0
> Memtable data size, bytes: 0
> Memtable switch count: 0
> Local read count: 0
> Local read latency: 0.000 ms
> Local write count: 0
> Local write latency: 0.000 ms
> Pending tasks: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.00000
> Bloom filter space used, bytes: 0
> Bloom filter off heap memory used, bytes: 0
> Index summary off heap memory used, bytes: 0
> Compression metadata off heap memory used, bytes: 0
> Compacted partition minimum bytes: 0
> Compacted partition maximum bytes: 0
> Compacted partition mean bytes: 0
> Average live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
> ----------------
> {code}
> Below is what is shown after running nodetool gossipinfo
> {code}
> /<NODE1>
> generation:1428349617
> heartbeat:238170
> HOST_ID:<NODE1ID>
> RELEASE_VERSION:2.0.14
> DC:<DCNAME>
> RPC_ADDRESS:<NODE1IP>
> SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
> STATUS:NORMAL,-1399780091502863826
> RACK:rack1
> SEVERITY:0.0
> LOAD:2.93383711E8
> NET_VERSION:7
> /<NODE2>
> generation:1428349784
> heartbeat:237665
> HOST_ID:<NODE2ID>
> RELEASE_VERSION:2.0.14
> DC:app3-profiledata
> RPC_ADDRESS:<NODE2>
> SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
> STATUS:NORMAL,-1019261967377984057
> RACK:rack1
> SEVERITY:0.0
> LOAD:2.93393487E8
> NET_VERSION:7
> /<NODE3>
> generation:1428348889
> heartbeat:240384
> HOST_ID:<NODE3ID>
> RELEASE_VERSION:2.0.14
> DC:app3-profiledata
> RPC_ADDRESS:<NODE3IP>
> SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
> STATUS:NORMAL,-1060333141359417961
> RACK:rack1
> SEVERITY:0.0
> LOAD:2.9345286E8
> NET_VERSION:7
> {code}
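
Editor's sketch (not part of the original report): the gossip generation values above look like Unix timestamps in seconds, which is how Cassandra normally seeds the generation at node startup. Under that assumption, converting them shows roughly when each node was last restarted. The helper name generation_to_utc and the NODE1..NODE3 keys are illustrative.

```python
from datetime import datetime, timezone

# Generation values copied from the gossipinfo output in this report.
GENERATIONS = {
    "NODE1": 1428349617,
    "NODE2": 1428349784,
    "NODE3": 1428348889,
}


def generation_to_utc(generation):
    """Interpret a gossip generation as Unix epoch seconds (assumption)."""
    return datetime.fromtimestamp(generation, tz=timezone.utc)


# Print nodes in restart order, oldest first.
for node, gen in sorted(GENERATIONS.items(), key=lambda item: item[1]):
    print(node, generation_to_utc(gen).isoformat())
```

If the assumption holds, all three nodes were restarted within about 15 minutes of each other on 2015-04-06 (UTC), which is consistent with a rolling upgrade.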
>
>
> Below is cassandra.yaml
> {code}
> cluster_name: '<Cluster Name>'
> num_tokens: 256
> auto_bootstrap: true
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 345600000
> hinted_handoff_throttle_in_kb: 1024
> max_hints_delivery_threads: 2
> authenticator: AllowAllAuthenticator
> authorizer: AllowAllAuthorizer
> permissions_validity_in_ms: 2000
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> data_file_directories:
>     - /mnt/cassandra/data
> commitlog_directory: /mnt/cassandra/commitlog
> disk_failure_policy: stop
> key_cache_size_in_mb:
> key_cache_save_period: 14400
> row_cache_size_in_mb: 0
> row_cache_save_period: 0
> saved_caches_directory: /mnt/cassandra/saved_caches
> commitlog_sync: batch
> commitlog_sync_batch_window_in_ms: 50
> commitlog_segment_size_in_mb: 32
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "<NODE1>,<NODE2>,<NODE3>"
> concurrent_reads: 32
> concurrent_writes: 32
> memtable_total_space_in_mb: 512
> memtable_flush_queue_size: 4
> trickle_fsync: false
> trickle_fsync_interval_in_kb: 10240
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: <LOCALIP>
> start_native_transport: true
> native_transport_port: 9042
> start_rpc: true
> rpc_address: <LOCALIP>
> rpc_port: 9160
> rpc_keepalive: true
> rpc_server_type: hsha
> rpc_min_threads: 16
> rpc_max_threads: 256
> thrift_framed_transport_size_in_mb: 15
> incremental_backups: false
> snapshot_before_compaction: false
> auto_snapshot: true
> column_index_size_in_kb: 64
> in_memory_compaction_limit_in_mb: 64
> multithreaded_compaction: false
> compaction_throughput_mb_per_sec: 128
> compaction_preheat_key_cache: true
> read_request_timeout_in_ms: 10000
> range_request_timeout_in_ms: 10000
> write_request_timeout_in_ms: 10000
> truncate_request_timeout_in_ms: 60000
> request_timeout_in_ms: 10000
> cross_node_timeout: false
> phi_convict_threshold: 12
> endpoint_snitch: PropertyFileSnitch
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 600000
> dynamic_snitch_badness_threshold: 0.2
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
> index_interval: 512
> server_encryption_options:
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
> client_encryption_options:
>     enabled: false
>     keystore: conf/.keystore
>     keystore_password: cassandra
> internode_compression: all
> inter_dc_tcp_nodelay: true
> {code}
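
Editor's sketch (not part of the original report): the max_hint_window_in_ms value in this configuration is easier to reason about in days than in milliseconds. The short Python conversion below uses only the value from the yaml above. The comparison figure of 10800000 ms (3 hours) is the stock default as far as the editor is aware and should be treated as an assumption.

```python
from datetime import timedelta

# Value taken from the cassandra.yaml in this report.
MAX_HINT_WINDOW_IN_MS = 345600000

# Assumed stock default for comparison (3 hours); not stated in the report.
ASSUMED_DEFAULT_MS = 10800000

window = timedelta(milliseconds=MAX_HINT_WINDOW_IN_MS)
default_window = timedelta(milliseconds=ASSUMED_DEFAULT_MS)

print("configured hint window:", window)
print("assumed default window:", default_window)
print("ratio:", MAX_HINT_WINDOW_IN_MS // ASSUMED_DEFAULT_MS)
```

The configured window works out to 4 days, so this cluster keeps collecting hints for a down node far longer than a 3-hour window would.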
> I have stopped upgrading my other Cassandra clusters until the cause of this behavior is found.
> Please let me know if more information is needed.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)