
Posted to commits@cassandra.apache.org by "Melvin Wang (JIRA)" <ji...@apache.org> on 2011/07/23 02:53:09 UTC

[jira] [Created] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Expose number of rpc timeouts for individual hosts metric via jmx 
------------------------------------------------------------------

                 Key: CASSANDRA-2941
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2941
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Melvin Wang
            Assignee: Melvin Wang
            Priority: Minor


We have a total number of timeouts for each node. For monitoring, it would be better to break this total down into the number of timeouts per host that this node tried to connect to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment:     (was: c2941.patch)


[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment:     (was: twttr-cassandra-0.8-counts-resync-timeouts-metric.patch)


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071131#comment-13071131 ] 

Jonathan Ellis commented on CASSANDRA-2941:
-------------------------------------------

- The patch does not apply to 0.8 for me.
- I don't see anything to prevent dropping timeouts because of a race in timeoutreporter.apply. (Using a NonBlockingHashMap (NBHM) + replace would fix this.)
- If you did the recentTimeouts create first, then the timeouts put, you wouldn't have to special-case recent == null later.
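
A minimal sketch of the "replace" idea from the second point, not the code from the patch. It assumes a hypothetical class ReplaceLoopCounter and uses the JDK's ConcurrentHashMap as a stand-in for NonBlockingHashMap, since both implement ConcurrentMap. The compare-and-swap loop retries whenever another thread changed the value in between, so no increment is dropped.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReplaceLoopCounter {
    // Hypothetical per-host timeout map; ConcurrentHashMap stands in for
    // NonBlockingHashMap so the sketch needs only the JDK.
    private final ConcurrentMap<String, Long> timeouts = new ConcurrentHashMap<>();

    /** Atomically adds one timeout for the host and returns the new count. */
    public long recordTimeout(String host) {
        while (true) {
            Long current = timeouts.get(host);
            if (current == null) {
                // First timeout for this host: only one racing thread wins the insert.
                if (timeouts.putIfAbsent(host, 1L) == null)
                    return 1L;
            } else if (timeouts.replace(host, current, current + 1)) {
                // replace(key, old, new) succeeds only if the value is still "current".
                return current + 1;
            }
            // Lost the race to another thread: re-read and retry.
        }
    }

    /** Returns the current count for the host, or 0 if it never timed out. */
    public long get(String host) {
        Long v = timeouts.get(host);
        return v == null ? 0L : v;
    }
}
```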


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095836#comment-13095836 ] 

Hudson commented on CASSANDRA-2941:
-----------------------------------

Integrated in Cassandra-0.8 #310 (See [https://builds.apache.org/job/Cassandra-0.8/310/])
    Fix typo introduced by CASSANDRA-2941

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164068
Files : 
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java


> Expose number of rpc timeouts for individual hosts metric via jmx 
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2941
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2941
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Melvin Wang
>            Assignee: Melvin Wang
>            Priority: Minor
>             Fix For: 0.8.5
>
>         Attachments: c2941-v2.patch
>
>
> We have a total number of timeouts for each node. For monitoring, it would be better to break this total down into the number of timeouts per host that this node tried to connect to.


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078397#comment-13078397 ] 

Jonathan Ellis commented on CASSANDRA-2941:
-------------------------------------------

bq. Although get**** is called from multiple threads, only 'read' operations on the hashmap are performed, so we don't need a 'lock' here.

Actually, we still need to establish a happens-before for the read, or we have no guarantees that the JMX thread will ever see the updates made by the timeout reporter.  So we could either use a Map of AtomicLong or a ConcurrentMap of Long.
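
As an illustration of the visibility point, here is a hedged sketch that combines a ConcurrentMap with AtomicLong values. The class name VisibleTimeouts and its methods are hypothetical and not taken from the patch. Inserting into a ConcurrentHashMap happens-before a later read of the same key, and AtomicLong gives the same guarantee for each counter value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class VisibleTimeouts {
    // A concurrent map publishes new entries safely to other threads;
    // a plain HashMap gives a reader thread no such guarantee.
    private final ConcurrentMap<String, AtomicLong> timeoutsPerHost = new ConcurrentHashMap<>();

    /** Called by the (hypothetical) timeout reporter thread. */
    public void record(String ip) {
        timeoutsPerHost.computeIfAbsent(ip, k -> new AtomicLong()).incrementAndGet();
    }

    /** Called by a JMX thread; returns a weakly consistent copy of the counters. */
    public Map<String, Long> snapshot() {
        Map<String, Long> copy = new HashMap<>();
        for (Map.Entry<String, AtomicLong> e : timeoutsPerHost.entrySet())
            copy.put(e.getKey(), e.getValue().get());
        return copy;
    }
}
```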


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095288#comment-13095288 ] 

Sylvain Lebresne commented on CASSANDRA-2941:
---------------------------------------------

For the record, this patch had the following lines:
{noformat}
        AtomicLong c = timeoutsPerHost.get(ip);
        if (c == null)
            c = timeoutsPerHost.put(ip, new AtomicLong());
        c.incrementAndGet();
{noformat}
which are a guaranteed NPE on the first timeout for a host, since Map.put returns the previous value (null for a new key).
I've fixed that directly though (in r1164068).
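
One way to avoid the null reference is sketched below. This is an assumption about the general shape of such a fix, not the actual change in r1164068, and the class and method names are hypothetical. It keeps its own reference to the new counter instead of relying on the return value of put.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class PutReturnValue {
    /** Increments the counter for ip, creating it on first use; returns the new count. */
    public static long increment(Map<String, AtomicLong> timeoutsPerHost, String ip) {
        AtomicLong c = timeoutsPerHost.get(ip);
        if (c == null) {
            // Map.put returns the PREVIOUS value (null for a new key),
            // so hold on to the new counter ourselves.
            c = new AtomicLong();
            timeoutsPerHost.put(ip, c);
        }
        return c.incrementAndGet();
    }
}
```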


[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment: twttr-cassandra-0.8-counts-resync-timeouts-metric.patch

rebased my patch.


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073387#comment-13073387 ] 

Jonathan Ellis commented on CASSANDRA-2941:
-------------------------------------------

bq. timeoutreporter.apply is only called in one thread, right?

you're right, that should be fine.

bq. rebased my patch.

What did you rebase against?  There have been no commits in the meantime but it does not apply to 0.8 head.


[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment: c2941-v2.patch

Change timeoutsPerHost to use AtomicLong.
Restructure the code of getRecentTimeoutPerHost().


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078473#comment-13078473 ] 

Jonathan Ellis commented on CASSANDRA-2941:
-------------------------------------------

I'm talking about timeoutsPerHost, not recentTimeoutsPerHost. Since tPH is a plain HashMap, there is no happens-before relationship between the updates and the reads.


[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment: twttr-cassandra-0.8-counts-resync-timeouts-metric.diff

Expose the number of timeouts per host.
Expose the delta of this metric.
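
A rough sketch of what a total plus a delta ("recent") metric can look like for a single counter. The class RecentCounter and its method names are hypothetical and are not the patch's implementation. Each call to getRecent() reports the increments since the previous call.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RecentCounter {
    private final AtomicLong total = new AtomicLong();
    private long lastReported; // guarded by "this"

    /** Records one timeout. */
    public void increment() {
        total.incrementAndGet();
    }

    /** Returns the number of timeouts recorded since the counter was created. */
    public long getTotal() {
        return total.get();
    }

    /** Returns the number of timeouts recorded since the previous call to getRecent(). */
    public synchronized long getRecent() {
        long now = total.get();
        long delta = now - lastReported;
        lastReported = now;
        return delta;
    }
}
```

Synchronizing getRecent() keeps two concurrent readers from reporting the same increments twice. A per-host version would hold one such counter per IP address.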



[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment: c2941.patch

Sorry for the confusion. This patch worked against the current trunk.


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095534#comment-13095534 ] 

Melvin Wang commented on CASSANDRA-2941:
----------------------------------------

Ah, my bad. I was thinking of Python's dictionary while I did this :) Sorry.


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078469#comment-13078469 ] 

Melvin Wang commented on CASSANDRA-2941:
----------------------------------------

I am not clear on why we have no guarantee that the JMX thread will ever see the updates made by the timeout reporter. The current structure is that when a timeout happens, apply() is called for it, and if this is the first timeout for a given IP address, an AtomicLong is created for it. Since this is the first time this IP address has timed out, it is natural that no earlier updates are visible. From then on, it gets updated whenever JMX threads call getRecent***(). Maybe I am missing something here?


[jira] [Updated] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Melvin Wang updated CASSANDRA-2941:
-----------------------------------

    Attachment:     (was: twttr-cassandra-0.8-counts-resync-timeouts-metric.diff)


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Melvin Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073325#comment-13073325 ] 

Melvin Wang commented on CASSANDRA-2941:
----------------------------------------

The patches are old and need a rebase. I'll do it.

timeoutreporter.apply is only called in one thread, right? In expireMap, a timerTask is created to monitor the cache, yes or no?

If so, the reason I did it this way is that I only perform 'write' operations on the hashmap in one thread, so we will not corrupt the data structure. Although get**** is called from multiple threads, only 'read' operations on the hashmap are performed, so we don't need a 'lock' here. I think this is why I tried not to create an AtomicLong and insert it into the hashmap.


[jira] [Commented] (CASSANDRA-2941) Expose number of rpc timeouts for individual hosts metric via jmx

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089081#comment-13089081 ] 

Hudson commented on CASSANDRA-2941:
-----------------------------------

Integrated in Cassandra-0.8 #289 (See [https://builds.apache.org/job/Cassandra-0.8/289/])
    expose rpc timeouts per host in MessagingServiceMBean
patch by Melvin Wang; reviewed by jbellis for CASSANDRA-2941

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1160449
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingServiceMBean.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java

