
Posted to commits@cassandra.apache.org by "Erik Onnen (JIRA)" <ji...@apache.org> on 2011/01/23 21:22:43 UTC

[jira] Created: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Unsafe Multimap Access in MessagingService
------------------------------------------

                 Key: CASSANDRA-2037
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2037
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Erik Onnen
            Priority: Critical


MessagingService is a system singleton with a static Multimap field named targets. Multimaps are not thread-safe, yet nothing synchronizes access to that field. The Multimap is ultimately backed by a standard java.util.HashMap, which is susceptible to a race condition in which threads get stuck during a get operation, leaving multiple threads with stacks similar to the following:

"pool-1-thread-6451" prio=10 tid=0x00007fa5242c9000 nid=0x10f4 runnable [0x00007fa52fde4000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.get(HashMap.java:303)
	at com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:205)
	at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
	at com.google.common.collect.AbstractListMultimap.put(AbstractListMultimap.java:72)
	at com.google.common.collect.ArrayListMultimap.put(ArrayListMultimap.java:60)
	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:303)
	at org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:353)
	at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
	at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:98)
	at org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:289)
	at org.apache.cassandra.thrift.Cassandra$Processor$get.process(Cassandra.java:2655)
	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
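A minimal sketch of one way to make a shared one-to-many map like this safe for concurrent callers, using only the JDK (Java 8 or later, for computeIfAbsent). It is not the change that was made for CASSANDRA-1959, and the names ConcurrentTargets, register, targetsFor and stressTest are hypothetical. ConcurrentHashMap keeps its internal structure consistent under concurrent get and put, so readers cannot spin on a bucket chain corrupted by an unsynchronized resize, and CopyOnWriteArrayList makes each per-key list safe to append to while other threads read it.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a shared "message id -> target endpoints" multimap.
public class ConcurrentTargets {
    private final ConcurrentMap<String, List<String>> targets = new ConcurrentHashMap<>();

    // Atomically creates the per-key list on first use, then appends to it.
    public void register(String messageId, String endpoint) {
        targets.computeIfAbsent(messageId, k -> new CopyOnWriteArrayList<>()).add(endpoint);
    }

    // Returns the endpoints recorded for a message id, or an empty list if none.
    public List<String> targetsFor(String messageId) {
        return targets.getOrDefault(messageId, Collections.emptyList());
    }

    // Calls register() from several threads and returns the total number of entries
    // stored; with a thread-safe map this always equals threads * perThread.
    static int stressTest(int threads, int perThread) {
        ConcurrentTargets t = new ConcurrentTargets();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int n = 0; n < threads; n++) {
            final String endpoint = "10.0.0." + n;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    t.register("msg-" + (i % 100), endpoint);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (List<String> endpoints : t.targets.values()) {
            total += endpoints.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(stressTest(8, 1000)); // prints 8000
    }
}
```

CopyOnWriteArrayList suits short lists that are read more often than they are written; for long or write-heavy lists, a synchronized list or a per-key lock would be the more usual choice.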

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986534#action_12986534 ] 

Thibaut commented on CASSANDRA-2037:
------------------------------------

Both. However, there can be a burst of requests during the first few seconds after I restart our application. The cluster has 20 nodes.

I disabled JNA, but that didn't help. I still see sudden spikes where Cassandra consumes an enormous amount of CPU (load average reported by uptime > 1000).

jstack no longer works:

-bash-4.1# jstack 27699 > /tmp/jstackerror
27699: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
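When jstack cannot attach, one hedged alternative is for the JVM to inspect its own threads, for example from a diagnostic hook that an operator adds. The sketch below uses only the standard Thread.getAllStackTraces() API to count RUNNABLE threads whose top frame matches a given method, such as the HashMap.get frame in the original report. StuckThreadProbe and countRunnableIn are hypothetical names, not part of Cassandra, and a process starved for CPU may be just as slow to run this probe as it is to answer jstack.

```java
import java.util.Map;

// Hypothetical in-process probe; not part of Cassandra.
public class StuckThreadProbe {
    // Counts threads that are RUNNABLE with className.methodName at the top of their stack.
    // State and stack are sampled separately, so this is a heuristic, not an atomic snapshot.
    static int countRunnableIn(String className, String methodName) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (e.getKey().getState() == Thread.State.RUNNABLE
                    && stack.length > 0
                    && stack[0].getClassName().equals(className)
                    && stack[0].getMethodName().equals(methodName)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int stuck = countRunnableIn("java.util.HashMap", "get");
        System.out.println("RUNNABLE threads in HashMap.get: " + stuck);
    }
}
```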

Also, my entire application comes to a halt because the node is still marked as up but does not respond (Cassandra is using all the CPU on the first node):

/software/cassandra/bin/nodetool -h localhost ring
Address         Status State   Load            Owns    Token                                                             
                                                       ffffffffffffffff                                                  
192.168.0.1     Up     Normal  3.48 GB         5.00%   0cc                                                               
192.168.0.2     Up     Normal  3.48 GB         5.00%   199                                                               
192.168.0.3     Up     Normal  3.67 GB         5.00%   266                                                               
192.168.0.4     Up     Normal  2.55 GB         5.00%   333                                                               
192.168.0.5     Up     Normal  2.58 GB         5.00%   400                                                               
192.168.0.6     Up     Normal  2.54 GB         5.00%   4cc                                                               
192.168.0.7     Up     Normal  2.59 GB         5.00%   599                                                               
192.168.0.8     Up     Normal  2.58 GB         5.00%   666                                                               
192.168.0.9     Up     Normal  2.33 GB         5.00%   733                                                               
192.168.0.10    Down   Normal  2.39 GB         5.00%   7ff                                                               
192.168.0.11    Up     Normal  2.4 GB          5.00%   8cc                                                               
192.168.0.12    Up     Normal  2.74 GB         5.00%   999                                                               
192.168.0.13    Up     Normal  3.17 GB         5.00%   a66                                                               
192.168.0.14    Up     Normal  3.25 GB         5.00%   b33                                                               
192.168.0.15    Up     Normal  3.01 GB         5.00%   c00                                                               
192.168.0.16    Up     Normal  2.48 GB         5.00%   ccc                                                               
192.168.0.17    Up     Normal  2.41 GB         5.00%   d99                                                               
192.168.0.18    Up     Normal  2.3 GB          5.00%   e66                                                               
192.168.0.19    Up     Normal  2.27 GB         5.00%   f33                                                               
192.168.0.20    Up     Normal  2.32 GB         5.00%   ffffffffffffffff  


Interestingly, after a while (seconds or minutes), I have seen Cassandra nodes return to a normal state on their own, without a restart. I have also never seen this happen on two nodes in the cluster at the same time. The affected node varies, but there seems to be a pattern: most of the time it is the first node.

In the case above, I restarted node 192.168.0.10 (shown as Down), and the first node returned to a normal state. (I don't know whether the two are correlated.)

I have attached a jstack of the troubled node, taken as soon as jstack could attach; I suspect it was captured after the node had already returned to normal.

[jira] Commented: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985658#action_12985658 ] 

Thibaut commented on CASSANDRA-2037:
------------------------------------

I tried https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.

After only a few seconds of running my application, one instance was already consuming all CPUs (uptime load average > 1000).

Unfortunately, I can't get a stack trace because jstack won't connect (the -F option has no effect either).

So there is still a bug somewhere...

[jira] Resolved: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2037.
---------------------------------------

    Resolution: Duplicate

Yes, fixed in CASSANDRA-1959.

[jira] Commented: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986713#action_12986713 ] 

Thibaut commented on CASSANDRA-2037:
------------------------------------

Created https://issues.apache.org/jira/browse/CASSANDRA-2054

[jira] Updated: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2037:
-------------------------------

    Attachment: jstackerror.txt

jstack output taken shortly after the node returned to a normal state.

[jira] Commented: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Erik Onnen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985416#action_12985416 ] 

Erik Onnen commented on CASSANDRA-2037:
---------------------------------------

Looks like this was fixed in trunk in revision 1057935.

[jira] Commented: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985702#action_12985702 ] 

Jonathan Ellis commented on CASSANDRA-2037:
-------------------------------------------

Thibaut, are you doing reads, writes, or both?

[jira] Commented: (CASSANDRA-2037) Unsafe Multimap Access in MessagingService

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986543#action_12986543 ] 

Jonathan Ellis commented on CASSANDRA-2037:
-------------------------------------------

Thibaut, can you create a new ticket for this? I don't think it's related to the original Multimap problem here.

(Next thing to check: is the CPU maxing out because of JVM garbage collection? Uncomment the verbose GC logging options in cassandra-env.sh.)
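Verbose GC logging is the check suggested here. As a complementary in-process sketch, the standard java.lang.management API reports cumulative collection counts and times for each collector; sampling the total twice and dividing the difference by the elapsed wall-clock time gives a rough share of the interval spent collecting. GcShare, totalGcMillis and gcFraction are hypothetical names, not part of Cassandra; for a remote node the same MXBeans are reachable over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; not part of Cassandra.
public class GcShare {
    // Sum of accumulated collection time (ms) across all collectors.
    // getCollectionTime() returns -1 when a collector does not support it, so skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Approximate fraction of a wall-clock interval spent in GC, from two samples of totalGcMillis().
    static double gcFraction(long gcBefore, long gcAfter, long wallMillis) {
        if (wallMillis <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (double) (gcAfter - gcBefore) / wallMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gc0 = totalGcMillis();
        long t0 = System.nanoTime();
        Thread.sleep(1000);
        long wall = (System.nanoTime() - t0) / 1_000_000;
        double share = gcFraction(gc0, totalGcMillis(), wall);
        System.out.printf("GC share over last %d ms: %.1f%%%n", wall, share * 100);
    }
}
```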