Posted to commits@cassandra.apache.org by "Sasha Dolgy (JIRA)" <ji...@apache.org> on 2011/09/10 23:25:08 UTC

[jira] [Created] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Ec2Snitch nodetool issue after upgrade to 0.8.5
-----------------------------------------------

                 Key: CASSANDRA-3175
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3175
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.5
         Environment: Amazon EC2 with 4 nodes in region ap-southeast: 2 in 1a, 2 in 1b
            Reporter: Sasha Dolgy


Upgraded one ring that has four nodes from 0.8.0 to 0.8.5 with only one minor problem.  It relates to Ec2Snitch when running 'nodetool ring' from two of the four nodes; the other two are working fine:

Address         DC          Rack        Status State   Load Owns    Token
       148362247927262972740864614603570725035
172.16.12.11    ap-southeast1a          Up     Normal  1.58 MB 24.02% 19095547144942516281182777765338228798
172.16.12.10    ap-southeast1a          Up     Normal  1.63 MB 22.11% 56713727820156410577229101238628035242
172.16.14.10    ap-southeast1b          Up     Normal  1.85 MB 33.33% 113427455640312821154458202477256070484
172.16.14.12    ap-southeast1b          Up     Normal  1.36 MB 20.53% 148362247927262972740864614603570725035

This works on the two nodes that happen to be on the 172.16.14.0/24 network.

The nodes where the error appears are on the 172.16.12.0/24 network, and this is what is shown when 'nodetool ring' is run:

Address         DC          Rack        Status State   Load Owns    Token
       148362247927262972740864614603570725035
172.16.12.11    ap-southeast1a          Up     Normal  1.58 MB 24.02% 19095547144942516281182777765338228798
172.16.12.10    ap-southeast1a          Up     Normal  1.62 MB 22.11% 56713727820156410577229101238628035242
Exception in thread "main" java.lang.NullPointerException
       at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:93)
       at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122)
       at org.apache.cassandra.locator.EndpointSnitchInfo.getDatacenter(EndpointSnitchInfo.java:49)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
       at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
       at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
       at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
       at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
       at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
       at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
       at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
       at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
       at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
       at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
       at sun.rmi.transport.Transport$1.run(Transport.java:159)
       at java.security.AccessController.doPrivileged(Native Method)
       at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
       at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)

I've stopped and started the node ... it still doesn't make a difference.  I've also shut down all nodes in the ring so that it was fully offline and then brought them all back up ... the issue still persists on two of the nodes.

There are no firewall rules restricting traffic between these nodes. For example, on a node where 'nodetool ring' throws the exception, I can still get netstats for the two hosts that don't show up:

nodetool -h 172.16.12.11 -p 9090 netstats 172.16.14.10
Mode: Normal
 Nothing streaming to /172.16.14.10
 Nothing streaming from /172.16.14.10
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              3
Responses                       n/a         1           1483

nodetool -h 172.16.12.11 -p 9090 netstats 172.16.14.12
Mode: Normal
 Nothing streaming to /172.16.14.12
 Nothing streaming from /172.16.14.12
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              3
Responses                       n/a         0           1784

The problem only appears via nodetool and doesn't seem to be impacting anything else.
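
For reference, the NullPointerException above is thrown from the datacenter lookup in Ec2Snitch.getDatacenter().  A minimal sketch of that kind of gossip-backed lookup, with illustrative names only (not the exact 0.8.5 source), shows where a missing gossip entry would blow up:

    // Sketch only: how a gossip-backed snitch typically resolves a remote endpoint's DC.
    // If the endpoint is unknown to gossip (for example a phantom address), the endpoint
    // state comes back null and the dereference below throws the NPE seen in the trace.
    public String getDatacenter(InetAddress endpoint)
    {
        if (endpoint.equals(FBUtilities.getLocalAddress()))
            return localRegion;   // local node: region cached from the EC2 metadata service
        EndpointState state = Gossiper.instance.getEndpointStateForEndpoint(endpoint);
        return state.getApplicationState(ApplicationState.DC).value;   // NPE when state is null
    }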

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102366#comment-13102366 ] 

Vijay commented on CASSANDRA-3175:
----------------------------------

I couldn't replicate this with us-east, maybe partly because I am not able to set CIDR notation... Can you please change listen_address to a real IP and see if the issue goes away?


[jira] [Assigned] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-3175:
-----------------------------------------

    Assignee: Brandon Williams


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103025#comment-13103025 ] 

Sasha Dolgy commented on CASSANDRA-3175:
----------------------------------------

I removed the issue that was causing this by running 'del LocationInfo[52696e67];' against the system keyspace and then shutting down each node.  Once all were down, I brought them all back up ...
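
For reference, 52696e67 is just the hex encoding of the row key "Ring" in the system LocationInfo column family, which appears to hold the node's cached view of the ring, so the delete clears that cached state.  A quick check:

    // 0x52 0x69 0x6e 0x67 decodes to the ASCII string "Ring"
    byte[] key = { 0x52, 0x69, 0x6e, 0x67 };
    System.out.println(new String(key, java.nio.charset.Charset.forName("US-ASCII")));   // prints "Ring"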


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102247#comment-13102247 ] 

Sasha Dolgy commented on CASSANDRA-3175:
----------------------------------------

executing command on 172.16.12.10: nodetool -h 172.16.12.10 -p 9090 ring (error message)
executing command on 172.16.12.11: nodetool -h 172.16.12.11 -p 9090 ring (error message)
executing command on 172.16.14.10: nodetool -h 172.16.14.10 -p 9090 ring
executing command on 172.16.14.12: nodetool -h 172.16.14.12 -p 9090 ring 

No errors in the system.log on any node.  

nodetool -h 172.16.12.11 -p 9090 version
ReleaseVersion: 0.8.5

nodetool -h 172.16.12.10 -p 9090 version
ReleaseVersion: 0.8.5

nodetool -h 172.16.14.10 -p 9090 version
ReleaseVersion: 0.8.5

nodetool -h 172.16.14.12 -p 9090 version
ReleaseVersion: 0.8.5

netstat -an | grep 7000 (on 172.16.14.12):

tcp        0      0 172.16.14.12:45116      172.16.14.10:7000       ESTABLISHED
tcp        0      0 172.16.14.12:57148      172.16.12.10:7000       ESTABLISHED
tcp        0      0 172.16.14.12:7000       172.16.12.11:42336      ESTABLISHED
tcp        0      0 172.16.14.12:42728      172.16.12.11:7000       ESTABLISHED
tcp        0      0 172.16.14.12:7000       172.16.12.10:52259      ESTABLISHED
tcp        0      0 172.16.14.12:43229      172.16.12.11:7000       ESTABLISHED
tcp        0      0 172.16.14.12:7000       172.16.14.10:57429      ESTABLISHED
tcp        0      0 172.16.14.12:48923      172.16.12.10:7000       ESTABLISHED
tcp        0      0 172.16.14.12:7000       172.16.14.10:52346      ESTABLISHED
tcp        0      0 172.16.14.12:34007      172.16.14.10:7000       ESTABLISHED
tcp        0      0 172.16.14.12:7000       172.16.12.10:57794      ESTABLISHED
tcp        0      0 172.16.14.12:7000       172.16.12.11:33248      ESTABLISHED

netstat -an | grep 7000 (on 172.16.12.11):

netstat -an | grep 7000
tcp        0      0 172.16.12.11:7000       0.0.0.0:*               LISTEN     
tcp        0      0 172.16.12.11:37427      172.16.14.10:7000       ESTABLISHED
tcp        0      0 172.16.12.11:7000       172.16.14.10:50314      ESTABLISHED
tcp        0      0 172.16.12.11:7000       172.16.12.10:56657      ESTABLISHED
tcp        0      0 172.16.12.11:7000       172.16.14.10:47700      ESTABLISHED
tcp        0      0 172.16.12.11:50901      172.16.12.10:7000       ESTABLISHED
tcp        0      0 172.16.12.11:33248      172.16.14.12:7000       ESTABLISHED
tcp        0      0 172.16.12.11:51640      172.16.14.10:7000       ESTABLISHED
tcp        0      0 172.16.12.11:7000       172.16.14.12:43229      ESTABLISHED
tcp        0      0 172.16.12.11:34244      172.16.12.10:7000       ESTABLISHED
tcp        0      0 172.16.12.11:7000       172.16.14.12:42728      ESTABLISHED
tcp        0      0 172.16.12.11:7000       172.16.12.10:55986      ESTABLISHED
tcp        0      0 172.16.12.11:42336      172.16.14.12:7000       ESTABLISHED
tcp        0      1 172.16.12.11:42521      10.130.185.136:7000     SYN_SENT   


cassandra.yaml configuration: 

storage_port: 7000
listen_address: Configured as the 172.16.0.0/16 IP of each server listed above
rpc_address: 0.0.0.0
rpc_port: 9160

On the nodes having the issue, the SYN_SENT connection appears and is directed towards the internal EC2 IP of one of the "good" nodes, which shouldn't be happening.  It should be using the private IP we have defined as the listen_address on each node.
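
One way to check which instance actually owns 10.130.185.136 is to query the standard EC2 instance metadata service on each node (assuming the usual metadata endpoint is reachable from the instances):

curl http://169.254.169.254/latest/meta-data/local-ipv4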


[jira] [Assigned] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-3175:
-------------------------------------------

    Assignee: Vijay  (was: Brandon Williams)


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102979#comment-13102979 ] 

Vijay commented on CASSANDRA-3175:
----------------------------------

Which node owns the 10.130.185.136 IP provided by EC2? I think you have to check your VPN settings...


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102861#comment-13102861 ] 

Sasha Dolgy commented on CASSANDRA-3175:
----------------------------------------

From a node that works: 
[default@system] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
[default@system] 

From a node that doesn't work:
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
	UNREACHABLE: [10.130.185.136]
[default@unknown] 

We do not use any internal EC2 IPs ... how did this appear?


[jira] [Resolved] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-3175.
-----------------------------------------

    Resolution: Invalid

We tracked this down to the issue described in CASSANDRA-3186, so closing this ticket in favor of that one.


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102893#comment-13102893 ] 

Brandon Williams commented on CASSANDRA-3175:
---------------------------------------------

It looks like at some point 10.130.185.136 was part of the cluster.  I would try shutting down any nodes that show it, at the same time, and then starting them back up.  If it comes back after that, it's being re-gossipped from other machines and you'll likely have to do a removetoken on it, but that's complicated by the fact that getting the token will be difficult.  A full ring restart will definitely solve it.
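
A minimal Java/JMX sketch of one way to check which nodes still carry the ghost entry before attempting a removetoken.  It assumes JMX is reachable on port 9090 (the port used in the nodetool commands quoted above) and that the StorageService MBean is registered as org.apache.cassandra.db:type=StorageService with an UnreachableNodes attribute; it is only an illustration, not part of this ticket.

    import java.util.List;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UnreachableNodesCheck
    {
        public static void main(String[] args) throws Exception
        {
            String host = args.length > 0 ? args[0] : "172.16.12.11";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":9090/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try
            {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
                // Endpoints this node knows about through gossip but considers down.
                @SuppressWarnings("unchecked")
                List<String> unreachable = (List<String>) mbs.getAttribute(ss, "UnreachableNodes");
                System.out.println("Unreachable nodes reported by " + host + ": " + unreachable);
            }
            finally
            {
                connector.close();
            }
        }
    }

Run against each of the four nodes in turn; a stale address such as 10.130.185.136 should only show up on the nodes that are still gossiping it, i.e. the ones to shut down together.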


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102548#comment-13102548 ] 

Sasha Dolgy commented on CASSANDRA-3175:
----------------------------------------

Hi Vijay,

We don't use the "real IP" for intra-node communication.  To circumvent the problems with Ec2Snitch between regions, we leverage VPN tunnels, as per:  http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/advice-for-EC2-deployment-td6294613i20.html .. "we use a combination of Vyatta & OpenVPN on the nodes that are EC2 and nodes that aren't Ec2 ... works a treat."  This statement is still accurate ... except that with 0.8.5, 'nodetool ring' throws an error.  Since the issue exists on only 2 of the 4 nodes, could you help me understand what is actually happening to cause this error?

Recently I've tried to put entries for the nodes in /etc/hosts on the nodes where the error occurs ... but that didn't fix it.  

How does Cassandra determine the NAME of the node of each Data Center?  Where does it get this information?  


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102891#comment-13102891 ] 

Vijay commented on CASSANDRA-3175:
----------------------------------

I am kind of in the dark on what happened.... Can you grep for 10.130.185.136 in the logs (system.log) and see if it was marked alive at some point in time? Which node owns 10.130.185.136?

Can you also try downgrading it to 0.8.2 to see if the issue goes away?


[jira] [Issue Comment Edited] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102861#comment-13102861 ] 

Sasha Dolgy edited comment on CASSANDRA-3175 at 9/12/11 6:10 PM:
-----------------------------------------------------------------

From the two nodes that work: 
[default@system] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
[default@system] 

From the two nodes that don't work:
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
	UNREACHABLE: [10.130.185.136]
[default@unknown] 

We do not use any internal EC2 IPs ... how did this appear?  Or better still, how can it get flushed?

      was (Author: sdolgy):
    From a node that works: 
[default@system] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
[default@system] 

From a node that doesn't work:
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
	UNREACHABLE: [10.130.185.136]
[default@unknown] 

We do not use any internal EC2 IP's ... how did this appear?  
  

[jira] [Updated] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3175:
--------------------------------------

         Priority: Minor  (was: Major)
    Fix Version/s: 0.8.6


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102188#comment-13102188 ] 

Vijay commented on CASSANDRA-3175:
----------------------------------

Do you see an error in the system.log? I couldn't reproduce the issue yet... All the nodes are on 0.8.5, right? Are you using nodetool -h localhost?
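
For what it's worth, nodetool ring asks the EndpointSnitchInfo MBean for the datacenter of every endpoint it knows about; that is the EndpointSnitchInfo.getDatacenter frame in the trace above.  A small sketch along these lines, assuming JMX on port 9090 and the MBean name org.apache.cassandra.db:type=EndpointSnitchInfo, can show which particular endpoint the snitch fails on:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SnitchProbe
    {
        public static void main(String[] args) throws Exception
        {
            String node = "172.16.12.11";                      // node whose snitch we query
            String[] endpoints = { "172.16.12.10", "172.16.12.11",
                                   "172.16.14.10", "172.16.14.12",
                                   "10.130.185.136" };         // include the ghost address
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + node + ":9090/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try
            {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName snitch = new ObjectName("org.apache.cassandra.db:type=EndpointSnitchInfo");
                for (String ep : endpoints)
                {
                    try
                    {
                        Object dc = mbs.invoke(snitch, "getDatacenter",
                                               new Object[]{ ep }, new String[]{ "java.lang.String" });
                        System.out.println(ep + " -> " + dc);
                    }
                    catch (Exception e)
                    {
                        // The endpoint whose gossip state is missing surfaces here as the NPE.
                        System.out.println(ep + " -> " + e);
                    }
                }
            }
            finally
            {
                connector.close();
            }
        }
    }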


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102749#comment-13102749 ] 

Vijay commented on CASSANDRA-3175:
----------------------------------

Sasha, this may be happening when the gossip from at least one node on the 172.16.14.x network has not been received.... Which means you might want to check whether the connections from all of the nodes on the 14.x network to the 12.x network are working as expected.

"How does Cassandra determine the NAME of the node of each Data Center? Where does it get this information?"
The way Ec2Snitch works is by adding the rack/DC information to the node's gossip (application state) and letting the nodes discover each other's rack/DC by gossiping; if we don't know any one node's information, then this exception will occur.
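
A rough sketch (not the actual Ec2Snitch source) of the failure mode described above: each node learns its own zone from the EC2 metadata service and advertises the resulting rack/DC through gossip application state, so a remote endpoint's datacenter is only known once that state has been received.  An endpoint that is present in gossip but whose snitch state never arrived (for example the stale 10.130.185.136 entry) gives a null lookup, and dereferencing it is exactly the NullPointerException nodetool ring reports.  The class and field names here are made up for illustration.

    import java.net.InetAddress;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class GossipSnitchSketch
    {
        // Stand-in for the per-endpoint application state distributed by the gossiper.
        static class VersionedValue
        {
            final String value;
            VersionedValue(String value) { this.value = value; }
        }

        private final Map<InetAddress, VersionedValue> dcState =
                new ConcurrentHashMap<InetAddress, VersionedValue>();

        // Called when another node's snitch state (its DC name) arrives via gossip.
        public void onGossip(InetAddress endpoint, String dc)
        {
            dcState.put(endpoint, new VersionedValue(dc));
        }

        public String getDatacenter(InetAddress endpoint)
        {
            // No null check: for an endpoint we never received snitch state from,
            // get() returns null and the .value dereference throws the NPE.
            return dcState.get(endpoint).value;
        }

        public static void main(String[] args) throws Exception
        {
            GossipSnitchSketch snitch = new GossipSnitchSketch();
            snitch.onGossip(InetAddress.getByName("172.16.14.10"), "ap-southeast");
            System.out.println(snitch.getDatacenter(InetAddress.getByName("172.16.14.10")));   // ap-southeast
            System.out.println(snitch.getDatacenter(InetAddress.getByName("10.130.185.136"))); // NullPointerException
        }
    }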


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102912#comment-13102912 ] 

Sasha Dolgy commented on CASSANDRA-3175:
----------------------------------------

I shut down both of the nodes having the problem and brought only one back online; it still exhibits the problem.

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
	UNREACHABLE: [10.130.185.136, 172.16.12.11]
	03893e00-dd6b-11e0-0000-564008fe649f: [172.16.12.10, 172.16.14.12, 172.16.14.10]
[default@unknown] 

A full ring restart didn't solve this ... I've already tried that the other day.  
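For reference, the NPE at Ec2Snitch.getDatacenter (Ec2Snitch.java:93) is consistent with the snitch looking up a remote endpoint's gossiped availability-zone value and dereferencing it without a null check, so an endpoint that gossip still tracks but that never published that state (such as the unreachable 10.130.185.136 above) would trip it.  A minimal, hypothetical sketch of that failure shape (not the actual Cassandra source; the class, map and parsing below are stand-ins):

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

public class SnitchNpeSketch
{
    // Stand-in for the gossip-derived endpoint -> availability-zone map.
    private static final Map<InetAddress, String> gossipZone = new HashMap<InetAddress, String>();

    // Stand-in for a getDatacenter(endpoint) that trusts the lookup to succeed.
    public static String getDatacenter(InetAddress endpoint)
    {
        String zone = gossipZone.get(endpoint);           // null for an endpoint that never published a zone
        return zone.substring(0, zone.lastIndexOf('-'));  // dereference -> NPE, analogous to Ec2Snitch.java:93
    }

    public static void main(String[] args) throws UnknownHostException
    {
        gossipZone.put(InetAddress.getByName("172.16.14.10"), "ap-southeast-1b");
        System.out.println(getDatacenter(InetAddress.getByName("172.16.14.10")));    // prints "ap-southeast"
        System.out.println(getDatacenter(InetAddress.getByName("10.130.185.136")));  // NullPointerException
    }
}

Whether that is exactly what line 93 does depends on the real implementation, but it would explain why only some nodes' nodetool ring output trips over the phantom endpoint.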


[jira] [Commented] (CASSANDRA-3175) Ec2Snitch nodetool issue after upgrade to 0.8.5

Posted by "Sasha Dolgy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102883#comment-13102883 ] 

Sasha Dolgy commented on CASSANDRA-3175:
----------------------------------------

/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.

This appears only in the logs of one of the nodes exhibiting the issue: 172.16.12.10.

Where do I find where AntiEntropyService.getNeighbors(tablename, range) pulls its information from?
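For what it's worth, getNeighbors is commonly described as taking the replicas the replication strategy reports for the range (which are built from gossip-populated token metadata) and removing the local node.  A simplified, hypothetical sketch of that shape (not the actual Cassandra source; the class and method names are stand-ins) shows why a stale 10.x entry left in that metadata would keep showing up as a dead repair neighbor:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

public class RepairNeighborsSketch
{
    // Stand-in for the replica set the replication strategy reports for a range;
    // in a live cluster this is derived from gossip-populated token metadata.
    static Set<InetAddress> replicasForRange() throws UnknownHostException
    {
        Set<InetAddress> replicas = new HashSet<InetAddress>();
        replicas.add(InetAddress.getByName("172.16.12.10"));
        replicas.add(InetAddress.getByName("172.16.14.10"));
        replicas.add(InetAddress.getByName("10.130.185.136"));  // stale entry that never went away
        return replicas;
    }

    // Simplified shape of getNeighbors: the range's replicas minus the local node.
    static Set<InetAddress> getNeighbors(InetAddress local) throws UnknownHostException
    {
        Set<InetAddress> neighbors = new HashSet<InetAddress>(replicasForRange());
        neighbors.remove(local);
        return neighbors;  // a dead or stale replica in this set makes the repair session bail out
    }

    public static void main(String[] args) throws UnknownHostException
    {
        // From 172.16.12.10's point of view the set still contains /10.130.185.136,
        // matching the "neighbor is dead" log lines above.
        System.out.println(getNeighbors(InetAddress.getByName("172.16.12.10")));
    }
}

If that is roughly what is happening here, the same stale endpoint would also explain the snitch NPE above, since it has no datacenter to report.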
