You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Peter Haggerty (JIRA)" <ji...@apache.org> on 2012/12/14 00:54:12 UTC

[jira] [Updated] (CASSANDRA-5068) CLONE - Once a host has been hinted to, log messages for it repeat every 10 mins even if no hints are delivered

     [ https://issues.apache.org/jira/browse/CASSANDRA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Haggerty updated CASSANDRA-5068:
--------------------------------------

        Fix Version/s:     (was: 0.8.10)
                           (was: 1.0.7)
             Reviewer:   (was: jbellis)
               Labels: hinted hintedhandoff phantom  (was: )
          Description: 
We have "0 row" hinted handoffs every 10 minutes like clockwork. This impacts our ability to monitor the cluster by adding persistent noise in the handoff metric.

Previous mentions of this issue are here:
http://www.mail-archive.com/user@cassandra.apache.org/msg25982.html

The hinted handoffs can be scrubbed away with
nodetool -h 127.0.0.1 scrub system HintsColumnFamily
but they return after anywhere from a few minutes to multiple hours later.

These started to appear after an upgrade to 1.1.6 and haven't gone away despite rolling cleanups, rolling restarts, multiple rounds of scrubbing, etc.

A few things we've noticed about the handoffs:
1. The phantom handoff endpoint changes after a non-zero handoff comes through

2. Sometimes a non-zero handoff will be immediately followed by an "off schedule" phantom handoff to the endpoint the phantom had been using before

3. The sstable2json output seems to include multiple sub-sections for each handoff with the same "deletedAt" information.



The phantom handoff endpoint changes after a non-zero handoff comes through:
 INFO [HintedHandoff:1] 2012-12-11 06:57:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
 INFO [HintedHandoff:1] 2012-12-11 07:07:35,092 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
 INFO [HintedHandoff:1] 2012-12-11 07:07:37,915 HintedHandOffManager.java (line 392) Finished hinted handoff of 1058 rows to endpoint /10.10.10.2
 INFO [HintedHandoff:1] 2012-12-11 07:17:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
 INFO [HintedHandoff:1] 2012-12-11 07:27:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2



Sometimes a non-zero handoff will be immediately followed by an "off schedule" phantom handoff to the endpoint the phantom had been using before:
 INFO [HintedHandoff:1] 2012-12-12 21:47:39,335 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
 INFO [HintedHandoff:1] 2012-12-12 21:57:39,335 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
 INFO [HintedHandoff:1] 2012-12-12 22:07:43,319 HintedHandOffManager.java (line 392) Finished hinted handoff of 1416 rows to endpoint /10.10.10.4
 INFO [HintedHandoff:1] 2012-12-12 22:07:43,320 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
 INFO [HintedHandoff:1] 2012-12-12 22:17:39,357 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
 INFO [HintedHandoff:1] 2012-12-12 22:27:39,337 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4



The first few entries from one of the json files:
{
    "0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa": {
        "ccf5dc203a2211e20000e154da71a9bb": {
            "deletedAt": -9223372036854775808, 
            "subColumns": []
        }, 
        "ccf603303a2211e20000e154da71a9bb": {
            "deletedAt": -9223372036854775808, 
            "subColumns": []
        }, 


  was:
{noformat}
 INFO 15:36:03,977 Started hinted handoff for token: 170141183460469231731687303715884105726 with IP: /10.179.111.137
 INFO 15:36:03,978 Finished hinted handoff of 0 rows to endpoint /10.179.111.137
 INFO 15:46:31,248 Started hinted handoff for token: 170141183460469231731687303715884105726 with IP: /10.179.111.137
 INFO 15:46:31,249 Finished hinted handoff of 0 rows to endpoint /10.179.111.137
 INFO 15:56:29,448 Started hinted handoff for token: 170141183460469231731687303715884105726 with IP: /10.179.111.137
 INFO 15:56:29,449 Finished hinted handoff of 0 rows to endpoint /10.179.111.137
 INFO 16:06:09,949 Started hinted handoff for token: 170141183460469231731687303715884105726 with IP: /10.179.111.137
 INFO 16:06:09,950 Finished hinted handoff of 0 rows to endpoint /10.179.111.137
 INFO 16:16:21,529 Started hinted handoff for token: 170141183460469231731687303715884105726 with IP: /10.179.111.137
 INFO 16:16:21,530 Finished hinted handoff of 0 rows to endpoint /10.179.111.137
{noformat}

Introduced by CASSANDRA-3554.  The problem is that until a compaction on hints occurs, tombstones are present causing the isEmpty() check to be false.

          Environment: 
cassandra 1.1.6
java 1.6.0_30
    Affects Version/s:     (was: 0.6)
                       1.1.6

Cloning CASSANDRA-3733 as it seems to be the same issue.
                
> CLONE - Once a host has been hinted to, log messages for it repeat every 10 mins even if no hints are delivered
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5068
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5068
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.6
>         Environment: cassandra 1.1.6
> java 1.6.0_30
>            Reporter: Peter Haggerty
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: hinted, hintedhandoff, phantom
>
> We have "0 row" hinted handoffs every 10 minutes like clockwork. This impacts our ability to monitor the cluster by adding persistent noise in the handoff metric.
> Previous mentions of this issue are here:
> http://www.mail-archive.com/user@cassandra.apache.org/msg25982.html
> The hinted handoffs can be scrubbed away with
> nodetool -h 127.0.0.1 scrub system HintsColumnFamily
> but they return after anywhere from a few minutes to multiple hours later.
> These started to appear after an upgrade to 1.1.6 and haven't gone away despite rolling cleanups, rolling restarts, multiple rounds of scrubbing, etc.
> A few things we've noticed about the handoffs:
> 1. The phantom handoff endpoint changes after a non-zero handoff comes through
> 2. Sometimes a non-zero handoff will be immediately followed by an "off schedule" phantom handoff to the endpoint the phantom had been using before
> 3. The sstable2json output seems to include multiple sub-sections for each handoff with the same "deletedAt" information.
> The phantom handoff endpoint changes after a non-zero handoff comes through:
>  INFO [HintedHandoff:1] 2012-12-11 06:57:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
>  INFO [HintedHandoff:1] 2012-12-11 07:07:35,092 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
>  INFO [HintedHandoff:1] 2012-12-11 07:07:37,915 HintedHandOffManager.java (line 392) Finished hinted handoff of 1058 rows to endpoint /10.10.10.2
>  INFO [HintedHandoff:1] 2012-12-11 07:17:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
>  INFO [HintedHandoff:1] 2012-12-11 07:27:35,093 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
> Sometimes a non-zero handoff will be immediately followed by an "off schedule" phantom handoff to the endpoint the phantom had been using before:
>  INFO [HintedHandoff:1] 2012-12-12 21:47:39,335 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
>  INFO [HintedHandoff:1] 2012-12-12 21:57:39,335 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
>  INFO [HintedHandoff:1] 2012-12-12 22:07:43,319 HintedHandOffManager.java (line 392) Finished hinted handoff of 1416 rows to endpoint /10.10.10.4
>  INFO [HintedHandoff:1] 2012-12-12 22:07:43,320 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
>  INFO [HintedHandoff:1] 2012-12-12 22:17:39,357 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
>  INFO [HintedHandoff:1] 2012-12-12 22:27:39,337 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
> The first few entries from one of the json files:
> {
>     "0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa": {
>         "ccf5dc203a2211e20000e154da71a9bb": {
>             "deletedAt": -9223372036854775808, 
>             "subColumns": []
>         }, 
>         "ccf603303a2211e20000e154da71a9bb": {
>             "deletedAt": -9223372036854775808, 
>             "subColumns": []
>         }, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira