Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2015/05/19 19:25:01 UTC

[jira] [Assigned] (CASSANDRA-9129) HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12

     [ https://issues.apache.org/jira/browse/CASSANDRA-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe reassigned CASSANDRA-9129:
------------------------------------------

    Assignee: Sam Tunnicliffe  (was: Aleksey Yeschenko)

> HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9129
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9129
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Ubuntu 12.04.5 LTS
> AWS (m3.xlarge)
> 15G RAM
> 4 core Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
> Cassandra 2.0.14
>            Reporter: Russ Lavoie
>            Assignee: Sam Tunnicliffe
>             Fix For: 2.0.x
>
>
> Upgrading from Cassandra 2.0.11 or 2.0.12 to 2.0.14, I am seeing a pending hinted handoff that never clears. New hinted handoffs that go into pending while waiting for a node to come up clear as expected, but one always remains.
> I went through the following steps:
> 1) stop cassandra
> 2) Upgrade cassandra to 2.0.14
> 3) Start cassandra
> 4) nodetool tpstats
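Step 4 is the check that exposes the problem. Below is a minimal Python sketch of repeating that check over time, so a transient hint backlog can be told apart from a Pending count that never clears. It assumes `nodetool` is on the PATH of the node being checked; the function names, the sample count and the 60-second interval are hypothetical choices, not part of the report.

```python
import subprocess
import time
from typing import Callable


def hinted_handoff_pending(tpstats_text: str) -> int:
    """Return the Pending column of the HintedHandoff row (0 if absent)."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Row layout: name, Active, Pending, Completed, Blocked, All time blocked
        if fields and fields[0] == "HintedHandoff" and len(fields) >= 3:
            return int(fields[2])
    return 0


def run_tpstats() -> str:
    """Run `nodetool tpstats` on the local node and return its stdout."""
    return subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    ).stdout


def pending_never_clears(
    fetch: Callable[[], str] = run_tpstats,
    samples: int = 5,
    interval_s: float = 60.0,
) -> bool:
    """True if every sample shows HintedHandoff Pending > 0."""
    for i in range(samples):
        if hinted_handoff_pending(fetch()) == 0:
            return False  # the queue drained at least once
        if i < samples - 1:
            time.sleep(interval_s)
    return True
```

Passing `fetch` as a parameter keeps the polling logic testable without a live node; by default it shells out to `nodetool tpstats`.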
> There are no errors in the logs to help with this issue. I ran a few nodetool commands to get some data and pasted the output below:
> Below is what is shown after running nodetool status on each node in the ring
> {code}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load       Tokens  Owns   Host ID   Rack
> UN  <NODE1>  279.8 MB   256     34.9%  <HOSTID>       rack1
> UN  <NODE2>  279.79 MB  256     33.0%  <HOSTID>       rack1
> UN  <NODE3>  279.87 MB  256     32.1%  <HOSTID>       rack1
> {code}
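A small sketch that reads the `nodetool status` output above and confirms every node is Up/Normal (`UN`). The row pattern is an assumption based only on the sample layout shown, and `parse_status` is a hypothetical helper. In this sample the three Owns percentages add up to 100%.

```python
import math
import re

# Copied from the report (addresses and host IDs were redacted there).
SAMPLE_STATUS = """\
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns   Host ID   Rack
UN  <NODE1>  279.8 MB   256     34.9%  <HOSTID>       rack1
UN  <NODE2>  279.79 MB  256     33.0%  <HOSTID>       rack1
UN  <NODE3>  279.87 MB  256     32.1%  <HOSTID>       rack1
"""

# A node row starts with a two-letter code: U/D (Up/Down), then N/L/J/M.
ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(text: str) -> list[tuple[str, str, float]]:
    """Return (address, state, owns_percent) for each node row."""
    nodes = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # legend and header lines
        owns = next(f for f in line.split() if f.endswith("%"))
        nodes.append((m.group(2), m.group(1), float(owns.rstrip("%"))))
    return nodes


nodes = parse_status(SAMPLE_STATUS)
all_up_normal = all(state == "UN" for _, state, _ in nodes)
total_owns = sum(owns for _, _, owns in nodes)
print(all_up_normal, math.isclose(total_owns, 100.0, abs_tol=0.05))  # True True
```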
> Below is what is shown after running nodetool tpstats on each node in the ring showing a single HintedHandoff in pending status that never clears
> {code}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0          14550         0                 0
> RequestResponseStage              0         0         113040         0                 0
> MutationStage                     0         0         168873         0                 0
> ReadRepairStage                   0         0           1147         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0         232112         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0              0         0                 0
> MemoryMeter                       0         0              6         0                 0
> FlushWriter                       0         0             38         0                 0
> ValidationExecutor                0         0              0         0                 0
> InternalResponseStage             0         0              0         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MemtablePostFlusher               0         0           1333         0                 0
> MiscStage                         0         0              0         0                 0
> PendingRangeCalculator            0         0              6         0                 0
> CompactionExecutor                0         0            178         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         1            133         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                         0
> MUTATION                     0
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {code}
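The same idea applied to the tpstats table: a sketch that lists every thread pool with a non-zero Pending column. It assumes the layout shown above (a pool name followed by five integer columns) and uses an abbreviated copy of the pasted output; `pending_pools` is a hypothetical name. On this sample only HintedHandoff is reported.

```python
# Abbreviated copy of the tpstats output pasted in the report.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0          14550         0                 0
MutationStage                     0         0         168873         0                 0
CompactionExecutor                0         0            178         0                 0
HintedHandoff                     0         1            133         0                 0
Message type           Dropped
MUTATION                     0
"""


def pending_pools(tpstats_text: str) -> dict[str, int]:
    """Return {pool_name: pending} for every pool with pending > 0."""
    result: dict[str, int] = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Pool rows are a name plus exactly five integer columns; the header
        # and the "Message type / Dropped" section do not match this shape.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, pending = fields[0], int(fields[2])
        if pending > 0:
            result[name] = pending
    return result


print(pending_pools(SAMPLE_TPSTATS))  # {'HintedHandoff': 1}
```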
> Below is what is shown after running nodetool cfstats system.hints on all 3 nodes.
> {code}
> Keyspace: system
> 	Read Count: 0
> 	Read Latency: NaN ms.
> 	Write Count: 0
> 	Write Latency: NaN ms.
> 	Pending Tasks: 0
> 		Table: hints
> 		SSTable count: 0
> 		Space used (live), bytes: 0
> 		Space used (total), bytes: 0
> 		Off heap memory used (total), bytes: 0
> 		SSTable Compression Ratio: 0.0
> 		Number of keys (estimate): 0
> 		Memtable cell count: 0
> 		Memtable data size, bytes: 0
> 		Memtable switch count: 0
> 		Local read count: 0
> 		Local read latency: 0.000 ms
> 		Local write count: 0
> 		Local write latency: 0.000 ms
> 		Pending tasks: 0
> 		Bloom filter false positives: 0
> 		Bloom filter false ratio: 0.00000
> 		Bloom filter space used, bytes: 0
> 		Bloom filter off heap memory used, bytes: 0
> 		Index summary off heap memory used, bytes: 0
> 		Compression metadata off heap memory used, bytes: 0
> 		Compacted partition minimum bytes: 0
> 		Compacted partition maximum bytes: 0
> 		Compacted partition mean bytes: 0
> 		Average live cells per slice (last five minutes): 0.0
> 		Average tombstones per slice (last five minutes): 0.0
> ----------------
> {code}
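The cfstats output is what makes the lingering task look odd: the hints table holds nothing on disk or in the memtable. A sketch that checks this from the text, assuming the indented `Key: value` layout shown above; the helper names and the choice of counters are hypothetical.

```python
# Abbreviated copy of the `nodetool cfstats system.hints` output above.
SAMPLE_CFSTATS = """\
Keyspace: system
\tRead Count: 0
\t\tTable: hints
\t\tSSTable count: 0
\t\tSpace used (live), bytes: 0
\t\tNumber of keys (estimate): 0
\t\tMemtable cell count: 0
\t\tMemtable data size, bytes: 0
"""

# Counters that would be non-zero if any hints were stored for delivery.
HINT_COUNTERS = ("SSTable count", "Number of keys (estimate)", "Memtable cell count")


def parse_table_stats(text: str) -> dict[str, str]:
    """Map each 'Key: value' line to its value (indentation ignored)."""
    stats: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            stats[key] = value
    return stats


def hints_table_is_empty(cfstats_text: str) -> bool:
    """True if every counter in HINT_COUNTERS is zero (KeyError if missing)."""
    stats = parse_table_stats(cfstats_text)
    return all(int(stats[k]) == 0 for k in HINT_COUNTERS)


print(hints_table_is_empty(SAMPLE_CFSTATS))  # True
```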
> Below is what is shown after running nodetool gossipinfo
> {code}
> /<NODE1>
>   generation:1428349617
>   heartbeat:238170
>   HOST_ID:<NODE1ID>
>   RELEASE_VERSION:2.0.14
>   DC:<DCNAME>
>   RPC_ADDRESS:<NODE1IP>
>   SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
>   STATUS:NORMAL,-1399780091502863826
>   RACK:rack1
>   SEVERITY:0.0
>   LOAD:2.93383711E8
>   NET_VERSION:7
> /<NODE2>
>   generation:1428349784
>   heartbeat:237665
>   HOST_ID:<NODE2ID>
>   RELEASE_VERSION:2.0.14
>   DC:app3-profiledata
>   RPC_ADDRESS:<NODE2>
>   SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
>   STATUS:NORMAL,-1019261967377984057
>   RACK:rack1
>   SEVERITY:0.0
>   LOAD:2.93393487E8
>   NET_VERSION:7
> /<NODE3>
>   generation:1428348889
>   heartbeat:240384
>   HOST_ID:<NODE3ID>
>   RELEASE_VERSION:2.0.14
>   DC:app3-profiledata
>   RPC_ADDRESS:<NODE3IP>
>   SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
>   STATUS:NORMAL,-1060333141359417961
>   RACK:rack1
>   SEVERITY:0.0
>   LOAD:2.9345286E8
>   NET_VERSION:7
> {code}
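To rule out a mixed-version or schema-disagreement state, here is a sketch that parses `nodetool gossipinfo` output and checks that every endpoint reports the same RELEASE_VERSION and SCHEMA. The parser assumes the `/endpoint` header plus indented `KEY:value` layout shown above; `parse_gossipinfo` and `cluster_agrees` are hypothetical names.

```python
# Abbreviated copy of the gossipinfo output above.
SAMPLE_GOSSIP = """\
/<NODE1>
  generation:1428349617
  RELEASE_VERSION:2.0.14
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1399780091502863826
/<NODE2>
  generation:1428349784
  RELEASE_VERSION:2.0.14
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1019261967377984057
/<NODE3>
  generation:1428348889
  RELEASE_VERSION:2.0.14
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1060333141359417961
"""


def parse_gossipinfo(text: str) -> dict[str, dict[str, str]]:
    """Return {endpoint: {KEY: value}} from gossipinfo text."""
    nodes: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("/"):
            current = line[1:].strip()
            nodes[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.strip().partition(":")
            nodes[current][key] = value
    return nodes


def cluster_agrees(nodes: dict[str, dict[str, str]], key: str) -> bool:
    """True if every endpoint reports the same value for key."""
    return len({state.get(key) for state in nodes.values()}) == 1


gossip = parse_gossipinfo(SAMPLE_GOSSIP)
print(cluster_agrees(gossip, "RELEASE_VERSION"), cluster_agrees(gossip, "SCHEMA"))  # True True
```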
>
> Below is cassandra.yaml
> {code}
> cluster_name: '<Cluster Name>'
> num_tokens: 256
> auto_bootstrap: true
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 345600000
> hinted_handoff_throttle_in_kb: 1024
> max_hints_delivery_threads: 2
> authenticator: AllowAllAuthenticator
> authorizer: AllowAllAuthorizer
> permissions_validity_in_ms: 2000
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> data_file_directories:
>     - /mnt/cassandra/data
> commitlog_directory: /mnt/cassandra/commitlog
> disk_failure_policy: stop
> key_cache_size_in_mb:
> key_cache_save_period: 14400
> row_cache_size_in_mb: 0
> row_cache_save_period: 0
> saved_caches_directory: /mnt/cassandra/saved_caches
> commitlog_sync: batch
> commitlog_sync_batch_window_in_ms: 50
> commitlog_segment_size_in_mb: 32
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "<NODE1>,<NODE2>,<NODE3>"
> concurrent_reads: 32
> concurrent_writes: 32
> memtable_total_space_in_mb: 512
> memtable_flush_queue_size: 4
> trickle_fsync: false
> trickle_fsync_interval_in_kb: 10240
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: <LOCALIP>
> start_native_transport: true
> native_transport_port: 9042
> start_rpc: true
> rpc_address: <LOCALIP>
> rpc_port: 9160
> rpc_keepalive: true
> rpc_server_type: hsha
> rpc_min_threads: 16
> rpc_max_threads: 256
> thrift_framed_transport_size_in_mb: 15
> incremental_backups: false
> snapshot_before_compaction: false
> auto_snapshot: true
> column_index_size_in_kb: 64
> in_memory_compaction_limit_in_mb: 64
> multithreaded_compaction: false
> compaction_throughput_mb_per_sec: 128
> compaction_preheat_key_cache: true
> read_request_timeout_in_ms: 10000
> range_request_timeout_in_ms: 10000
> write_request_timeout_in_ms: 10000
> truncate_request_timeout_in_ms: 60000
> request_timeout_in_ms: 10000
> cross_node_timeout: false
> phi_convict_threshold: 12
> endpoint_snitch: PropertyFileSnitch
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 600000
> dynamic_snitch_badness_threshold: 0.2
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
> index_interval: 512
> server_encryption_options:
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
> client_encryption_options:
>     enabled: false
>     keystore: conf/.keystore
>     keystore_password: cassandra
> internode_compression: all
> inter_dc_tcp_nodelay: true
> {code}
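For context on the hint settings: max_hint_window_in_ms is in milliseconds, so the configured 345600000 is 4 days, 32 times the stock cassandra.yaml default of 10800000 ms (3 hours). A quick check of that arithmetic:

```python
from datetime import timedelta

# Value from the cassandra.yaml above.
MAX_HINT_WINDOW_IN_MS = 345_600_000
# Stock cassandra.yaml default, for comparison.
DEFAULT_MAX_HINT_WINDOW_IN_MS = 10_800_000

window = timedelta(milliseconds=MAX_HINT_WINDOW_IN_MS)
print(window)                                                  # 4 days, 0:00:00
print(window / timedelta(hours=1))                             # 96.0
print(MAX_HINT_WINDOW_IN_MS // DEFAULT_MAX_HINT_WINDOW_IN_MS)  # 32
```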
> I have stopped upgrading my other Cassandra clusters until the cause of this behavior is found.
> Please let me know if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)