Posted to commits@cassandra.apache.org by "Edward Ribeiro (JIRA)" <ji...@apache.org> on 2016/08/02 22:05:20 UTC

[jira] [Updated] (CASSANDRA-11823) Creating a table leads to a race with GraphiteReporter

     [ https://issues.apache.org/jira/browse/CASSANDRA-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Ribeiro updated CASSANDRA-11823:
---------------------------------------
    Attachment: CASSANDRA-11823.patch

Hi [~ostefano] and [~Stefania], 

I took a stab at this issue, and I guess I've found the root cause of the problem. I am providing a patch for the cassandra-3.0 branch.

*IMHO*, it looks like when a table is created, the metrics {{Set}} stored under a given key in {{TableMetrics.allTableMetrics}} is updated while that same {{Set}} is being iterated to compute a summarized value for {{GraphiteReporter}}, for example:

{code}
            public Long getValue()
            {
                long total = 0;
                for (Metric cfGauge : allTableMetrics.get(name))
                {
                    total = total + ((Gauge<? extends Number>) cfGauge).getValue().longValue();
                }
                return total;
            }
{code}

Even though {{allTableMetrics}} is a thread-safe {{ConcurrentMap}}, *the {{Set}} iterated in the for-loop above is not!* At first glance the {{ConcurrentModificationException}} stack trace seems to blame a {{Map}} rather than the {{Set}} inside the {{Map}} that is actually being iterated. I guess that is because a {{java.util.HashSet}} is backed by a {{HashMap}} internally, so its iterator shows up as {{HashMap$KeyIterator}}.
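
To illustrate the suspected failure mode, here is a minimal single-threaded sketch. It is not the Cassandra code: the class name, the map, and the key {{hypotheticalMetric}} are made up for illustration, and the concurrent update from another thread is simulated by adding to the inner set in the middle of the loop. It only shows that a {{ConcurrentHashMap}} does nothing to protect a plain {{HashSet}} stored as one of its values.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UnsafeInnerSetSketch
{
    // Hypothetical stand-in for TableMetrics.allTableMetrics:
    // the map is thread-safe, but its HashSet values are not.
    static final ConcurrentMap<String, Set<Long>> metrics = new ConcurrentHashMap<>();

    // Returns true if modifying the inner set during iteration fails fast.
    static boolean failsOnModificationDuringIteration()
    {
        Set<Long> gauges = new HashSet<>();
        gauges.add(1L);
        gauges.add(2L);
        gauges.add(3L);
        metrics.put("hypotheticalMetric", gauges);

        try
        {
            long total = 0;
            for (Long value : metrics.get("hypotheticalMetric"))
            {
                total += value;
                // Simulates a newly created table registering its gauge
                // while the reporter is still summing the existing ones.
                metrics.get("hypotheticalMetric").add(100L);
            }
            return false;
        }
        catch (ConcurrentModificationException e)
        {
            // Thrown by the HashMap key iterator that backs the HashSet.
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(failsOnModificationDuringIteration());
    }
}
```

In a real node the modification would come from a different thread, so the failure would be intermittent rather than deterministic, which would fit with it showing up on only some nodes.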

*If this is the case*, the solution is to use a thread-safe {{Set}}. {{Collections#synchronizedSet}} will not work (iteration over it must still be synchronized manually by the caller), but fortunately we can also create a thread-safe {{Set}} backed by a {{ConcurrentHashMap}}.
Before Java 8, this could be done with {{Collections#newSetFromMap}}: http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29

But since C* uses Java 8, it can be done directly with {{ConcurrentHashMap#newKeySet}}: http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#newKeySet--
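
A minimal sketch of the two ways to build such a set, under the same assumptions as the comment above. The class and method names are hypothetical and this is not the attached patch. Both variants return a {{Set}} whose iterator is weakly consistent, so adding an element during iteration does not throw {{ConcurrentModificationException}}; whether the iteration sees the newly added element is not guaranteed.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetSketch
{
    // Java 8 and later: a key-set view over a fresh ConcurrentHashMap.
    static Set<Long> java8Set()
    {
        return ConcurrentHashMap.newKeySet();
    }

    // Before Java 8: wrap an empty ConcurrentHashMap explicitly.
    static Set<Long> preJava8Set()
    {
        return Collections.newSetFromMap(new ConcurrentHashMap<Long, Boolean>());
    }

    // Adds to the set while iterating over it and returns the final size.
    // With a HashSet this loop would throw ConcurrentModificationException.
    static int sizeAfterAddingDuringIteration(Set<Long> gauges)
    {
        gauges.add(1L);
        gauges.add(2L);
        gauges.add(3L);

        long total = 0;
        for (Long value : gauges)
        {
            total += value;
            // Simulated concurrent registration of a new gauge.
            gauges.add(100L);
        }
        return gauges.size();
    }

    public static void main(String[] args)
    {
        System.out.println(sizeAfterAddingDuringIteration(java8Set()));
        System.out.println(sizeAfterAddingDuringIteration(preJava8Set()));
    }
}
```

Note that the summed total can still miss or include a gauge that is registered mid-iteration; the change only removes the exception, which seems acceptable for a periodically reported aggregate.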

Of course, I may be chasing my own tail (would not be the first time, lol) and the problem may have *nothing* to do with what I described above, so please let me know what you think. :)

> Creating a table leads to a race with GraphiteReporter
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11823
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefano Ortolani
>            Priority: Minor
>              Labels: lhf
>         Attachments: CASSANDRA-11823.patch
>
>
> Happened only on 3/4 nodes out of 13.
> {code:xml}
> INFO  [MigrationStage:1] 2016-05-18 00:34:11,566 ColumnFamilyStore.java:381 - Initializing schema.table
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-18 00:34:11,569 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.util.ConcurrentModificationException: null
> 	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) ~[na:1.8.0_91]
> 	at java.util.HashMap$KeyIterator.next(HashMap.java:1453) ~[na:1.8.0_91]
> 	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:690) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)