Posted to user@cassandra.apache.org by Yan Chunlu <sp...@gmail.com> on 2011/07/28 20:24:58 UTC

how to solve one node is in heavy load in unbalanced cluster

I have three nodes and RF=3. Here is the current ring:


Address Status State Load Owns Token

84944475733633104818662955375549269696
node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696


it is very unbalanced and I would like to re-balance it using
"nodetool move" asap. unfortunately I haven't run node repair for
a long time.

aaron suggested it's better to run node repair on every node and then re-balance it.


the problem is that node3 is under heavy load currently, and the entire
cluster slows down if I start doing node repair. I had to run
disablegossip and disablethrift to stop the repair.

only cassandra is running on that server and I have no idea what it was
doing. the cpu load is about 20+ currently. compactionstats and
netstats show it was not doing anything.

I have changed the client to not connect to node3, but it still seems
to be under heavy load and io utilization is 100%.


the log seems normal (although I'm not sure about the "Dropped read
message" thing):

 INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
2563726360 used; max is 4248829952
 WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
 INFO 13:21:38,560 Pool Name                    Active   Pending
 INFO 13:21:38,560 ReadStage                         8      7555
 INFO 13:21:38,561 RequestResponseStage              0         0
 INFO 13:21:38,561 ReadRepairStage                   0         0



is there any way to tell what node3 was doing? or at least, is there any
way to make it not slow down the whole cluster?

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
thanks for the confirmation aaron!

On Sun, Aug 7, 2011 at 4:01 PM, aaron morton <aa...@thelastpickle.com>wrote:

> move first removes the node from the cluster, then adds it back
> http://wiki.apache.org/cassandra/Operations#Moving_nodes
>
> If you have 3 nodes and rf 3, removing the node will result in the error
> you are seeing. There is not enough nodes in the cluster to implement the
> replication factor.
>
> You can drop the RF down to 2 temporarily and then put it back to 3 later,
> see http://wiki.apache.org/cassandra/Operations#Replication
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5 Aug 2011, at 03:39, Yan Chunlu wrote:
>
> hi, any  help? thanks!
>
> On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> forgot to mention I am using cassandra 0.7.4
>>
>>
>> On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu <sp...@gmail.com> wrote:
>>
>>> also nothing happens about the streaming:
>>>
>>> nodetool -h node3 netstats
>>> Mode: Normal
>>> Not sending any streams.
>>>  Nothing streaming from /10.28.53.11
>>> Pool Name                    Active   Pending      Completed
>>> Commands                        n/a         0      165086750
>>> Responses                       n/a         0       99372520
>>>
>>>
>>>
>>> On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu <sp...@gmail.com>wrote:
>>>
>>>> sorry the ring info should be this:
>>>>
>>>> nodetool -h node3 ring
>>>> Address         Status State   Load            Owns    Token
>>>>
>>>>
>>>>  84944475733633104818662955375549269696
>>>> node1      Up     Normal  13.18 GB        81.09%
>>>>  52773518586096316348543097376923124102
>>>> node2     Up     Normal  22.85 GB        10.48%
>>>>  70597222385644499881390884416714081360
>>>> node3      Up     Leaving 25.44 GB        8.43%
>>>> 84944475733633104818662955375549269696
>>>>
>>>>
>>>>
>>>> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu <sp...@gmail.com>wrote:
>>>>
>>>>> I have tried the nodetool move but get the following error....
>>>>>
>>>>> node3:~# nodetool -h node3 move 0
>>>>> Exception in thread "main" java.lang.IllegalStateException: replication
>>>>> factor (3) exceeds number of endpoints (2)
>>>>>  at
>>>>> org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>>>>> at
>>>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>>>>>  at
>>>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>>>>> at
>>>>> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>>>>>  at
>>>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
>>>>> at
>>>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>  at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>  at
>>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>>> at
>>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>>>  at
>>>>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>>> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>>>  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>>>> at
>>>>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>>>  at
>>>>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>>>> at
>>>>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>>>  at
>>>>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>>> at
>>>>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>>>  at
>>>>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>>> at
>>>>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>>>  at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
>>>>> at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>>>> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>>>  at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>>> at
>>>>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>>>  at
>>>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>>> at
>>>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>>>  at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>  at java.lang.Thread.run(Thread.java:662)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> then nodetool shows the node is leaving....
>>>>>
>>>>>
>>>>> nodetool -h node3 ring
>>>>> Address         Status State   Load            Owns    Token
>>>>>
>>>>>
>>>>>  84944475733633104818662955375549269696
>>>>> node1      Up     Normal  13.18 GB        81.09%
>>>>>  52773518586096316348543097376923124102
>>>>> node2     Up     Normal  22.85 GB        10.48%
>>>>>  70597222385644499881390884416714081360
>>>>>  node3      Up     Leaving 25.44 GB        8.43%
>>>>> 84944475733633104818662955375549269696
>>>>>
>>>>> the log didn't show any error message neither anything abnormal.  is
>>>>> there something wrong?
>>>>>
>>>>>
>>>>> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
>>>>>
>>>>>
>>>>> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com>wrote:
>>>>>
>>>>>> thanks a lot! I will try the "move".
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> springrider wrote:
>>>>>>> >
>>>>>>> > is that okay to do nodetool move before a completely repair?
>>>>>>> >
>>>>>>> > using this equation?
>>>>>>> > def tokens(nodes):
>>>>>>> >
>>>>>>> >    - for x in xrange(nodes):
>>>>>>> >       - print 2 ** 127 / nodes * x
>>>>>>> >
>>>>>>>
>>>>>>> Yes use that logic to get the tokens. I think it's safe to run move
>>>>>>> first
>>>>>>> and reair later. You are moving some nodes data as is so it's no
>>>>>>> worse than
>>>>>>> what you have right now.
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>>>>>>> Sent from the cassandra-user@incubator.apache.org mailing list
>>>>>>> archive at Nabble.com.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by aaron morton <aa...@thelastpickle.com>.
move first removes the node from the cluster, then adds it back http://wiki.apache.org/cassandra/Operations#Moving_nodes

If you have 3 nodes and RF 3, removing the node will result in the error you are seeing. There are not enough nodes in the cluster to satisfy the replication factor. 

You can drop the RF down to 2 temporarily and then put it back to 3 later, see http://wiki.apache.org/cassandra/Operations#Replication
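
Roughly, the cassandra-cli steps for that would be along these lines (the
keyspace name here is only a placeholder for whatever your keyspace is
called):

update keyspace MyKeyspace with replication_factor = 2;
... do the move ...
update keyspace MyKeyspace with replication_factor = 3;

Once the RF is back at 3, run nodetool repair on each node so the third
replica is rebuilt, as per the Operations page above.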

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Aug 2011, at 03:39, Yan Chunlu wrote:

> hi, any  help? thanks!
> 
> On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu <sp...@gmail.com> wrote:
> forgot to mention I am using cassandra 0.7.4
> 
> 
> On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu <sp...@gmail.com> wrote:
> also nothing happens about the streaming:
> 
> nodetool -h node3 netstats
> Mode: Normal
> Not sending any streams.
>  Nothing streaming from /10.28.53.11
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0      165086750
> Responses                       n/a         0       99372520
> 
> 
> 
> On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu <sp...@gmail.com> wrote:
> sorry the ring info should be this:
> 
> nodetool -h node3 ring
> Address         Status State   Load            Owns    Token                                       
>                                                        84944475733633104818662955375549269696      
> node1      Up     Normal  13.18 GB        81.09%  52773518586096316348543097376923124102      
> node2     Up     Normal  22.85 GB        10.48%  70597222385644499881390884416714081360      
> node3      Up     Leaving 25.44 GB        8.43%   84944475733633104818662955375549269696 
> 
> 
> 
> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu <sp...@gmail.com> wrote:
> I have tried the nodetool move but get the following error....
> 
> node3:~# nodetool -h node3 move 0
> Exception in thread "main" java.lang.IllegalStateException: replication factor (3) exceeds number of endpoints (2)
> 	at org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
> 	at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
> 	at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
> 	at org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
> 	at org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
> 	at org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
> 	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
> 	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
> 	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
> 	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
> 	at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
> 	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
> 	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
> 	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
> 	at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:159)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
> 	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> 
> 
> 
> 
> then nodetool shows the node is leaving....
> 
> 
> nodetool -h node3 ring
> Address         Status State   Load            Owns    Token                                       
>                                                        84944475733633104818662955375549269696      
> node1      Up     Normal  13.18 GB        81.09%  52773518586096316348543097376923124102      
> node2     Up     Normal  22.85 GB        10.48%  70597222385644499881390884416714081360      
> node3      Up     Leaving 25.44 GB        8.43%   84944475733633104818662955375549269696 
> 
> the log didn't show any error message neither anything abnormal.  is there something wrong?
> 
> 
> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
> 
> 
> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com> wrote:
> thanks a lot! I will try the "move".
> 
> 
> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com> wrote:
> 
> springrider wrote:
> >
> > is that okay to do nodetool move before a completely repair?
> >
> > using this equation?
> > def tokens(nodes):
> >
> >    - for x in xrange(nodes):
> >       - print 2 ** 127 / nodes * x
> >
> 
> Yes use that logic to get the tokens. I think it's safe to run move first
> and reair later. You are moving some nodes data as is so it's no worse than
> what you have right now.
> 
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
> 
> 
> 
> 
> 
> 


Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
hi, any  help? thanks!

On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu <sp...@gmail.com> wrote:

> forgot to mention I am using cassandra 0.7.4
>
>
> On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> also nothing happens about the streaming:
>>
>> nodetool -h node3 netstats
>> Mode: Normal
>> Not sending any streams.
>>  Nothing streaming from /10.28.53.11
>> Pool Name                    Active   Pending      Completed
>> Commands                        n/a         0      165086750
>> Responses                       n/a         0       99372520
>>
>>
>>
>> On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu <sp...@gmail.com> wrote:
>>
>>> sorry the ring info should be this:
>>>
>>> nodetool -h node3 ring
>>> Address         Status State   Load            Owns    Token
>>>
>>>
>>>  84944475733633104818662955375549269696
>>> node1      Up     Normal  13.18 GB        81.09%
>>>  52773518586096316348543097376923124102
>>> node2     Up     Normal  22.85 GB        10.48%
>>>  70597222385644499881390884416714081360
>>> node3      Up     Leaving 25.44 GB        8.43%
>>> 84944475733633104818662955375549269696
>>>
>>>
>>>
>>> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu <sp...@gmail.com>wrote:
>>>
>>>> I have tried the nodetool move but get the following error....
>>>>
>>>> node3:~# nodetool -h node3 move 0
>>>> Exception in thread "main" java.lang.IllegalStateException: replication
>>>> factor (3) exceeds number of endpoints (2)
>>>>  at
>>>> org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>>>> at
>>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>>>>  at
>>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>>>> at
>>>> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>>>>  at
>>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
>>>> at
>>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>  at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>  at
>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>> at
>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>>  at
>>>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>>  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>>> at
>>>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>>  at
>>>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>>> at
>>>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>>  at
>>>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>> at
>>>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>>  at
>>>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>> at
>>>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>>  at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>>> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>>  at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>> at
>>>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>>  at
>>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>> at
>>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>  at java.lang.Thread.run(Thread.java:662)
>>>>
>>>>
>>>>
>>>>
>>>> then nodetool shows the node is leaving....
>>>>
>>>>
>>>> nodetool -h node3 ring
>>>> Address         Status State   Load            Owns    Token
>>>>
>>>>
>>>>  84944475733633104818662955375549269696
>>>> node1      Up     Normal  13.18 GB        81.09%
>>>>  52773518586096316348543097376923124102
>>>> node2     Up     Normal  22.85 GB        10.48%
>>>>  70597222385644499881390884416714081360
>>>>  node3      Up     Leaving 25.44 GB        8.43%
>>>> 84944475733633104818662955375549269696
>>>>
>>>> the log didn't show any error message neither anything abnormal.  is
>>>> there something wrong?
>>>>
>>>>
>>>> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
>>>>
>>>>
>>>> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com>wrote:
>>>>
>>>>> thanks a lot! I will try the "move".
>>>>>
>>>>>
>>>>> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com>wrote:
>>>>>
>>>>>>
>>>>>> springrider wrote:
>>>>>> >
>>>>>> > is that okay to do nodetool move before a completely repair?
>>>>>> >
>>>>>> > using this equation?
>>>>>> > def tokens(nodes):
>>>>>> >
>>>>>> >    - for x in xrange(nodes):
>>>>>> >       - print 2 ** 127 / nodes * x
>>>>>> >
>>>>>>
>>>>>> Yes use that logic to get the tokens. I think it's safe to run move
>>>>>> first
>>>>>> and reair later. You are moving some nodes data as is so it's no worse
>>>>>> than
>>>>>> what you have right now.
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>>>>>> Sent from the cassandra-user@incubator.apache.org mailing list
>>>>>> archive at Nabble.com.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
forgot to mention I am using cassandra 0.7.4

On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu <sp...@gmail.com> wrote:

> also nothing happens about the streaming:
>
> nodetool -h node3 netstats
> Mode: Normal
> Not sending any streams.
>  Nothing streaming from /10.28.53.11
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0      165086750
> Responses                       n/a         0       99372520
>
>
>
> On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> sorry the ring info should be this:
>>
>> nodetool -h node3 ring
>> Address         Status State   Load            Owns    Token
>>
>>
>>  84944475733633104818662955375549269696
>> node1      Up     Normal  13.18 GB        81.09%
>>  52773518586096316348543097376923124102
>> node2     Up     Normal  22.85 GB        10.48%
>>  70597222385644499881390884416714081360
>> node3      Up     Leaving 25.44 GB        8.43%
>> 84944475733633104818662955375549269696
>>
>>
>>
>> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu <sp...@gmail.com> wrote:
>>
>>> I have tried the nodetool move but get the following error....
>>>
>>> node3:~# nodetool -h node3 move 0
>>> Exception in thread "main" java.lang.IllegalStateException: replication
>>> factor (3) exceeds number of endpoints (2)
>>>  at
>>> org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>>> at
>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>>>  at
>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>>> at
>>> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>>>  at
>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
>>> at
>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>  at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>  at
>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>> at
>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>  at
>>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>> at
>>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>  at
>>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>> at
>>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>  at
>>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>> at
>>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>  at
>>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>> at
>>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>  at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>  at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>> at
>>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>  at
>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>> at
>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>  at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>  at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>>
>>>
>>> then nodetool shows the node is leaving....
>>>
>>>
>>> nodetool -h node3 ring
>>> Address         Status State   Load            Owns    Token
>>>
>>>
>>>  84944475733633104818662955375549269696
>>> node1      Up     Normal  13.18 GB        81.09%
>>>  52773518586096316348543097376923124102
>>> node2     Up     Normal  22.85 GB        10.48%
>>>  70597222385644499881390884416714081360
>>>  node3      Up     Leaving 25.44 GB        8.43%
>>> 84944475733633104818662955375549269696
>>>
>>> the log didn't show any error message neither anything abnormal.  is
>>> there something wrong?
>>>
>>>
>>> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
>>>
>>>
>>> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com>wrote:
>>>
>>>> thanks a lot! I will try the "move".
>>>>
>>>>
>>>> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com>wrote:
>>>>
>>>>>
>>>>> springrider wrote:
>>>>> >
>>>>> > is that okay to do nodetool move before a completely repair?
>>>>> >
>>>>> > using this equation?
>>>>> > def tokens(nodes):
>>>>> >
>>>>> >    - for x in xrange(nodes):
>>>>> >       - print 2 ** 127 / nodes * x
>>>>> >
>>>>>
>>>>> Yes use that logic to get the tokens. I think it's safe to run move
>>>>> first
>>>>> and reair later. You are moving some nodes data as is so it's no worse
>>>>> than
>>>>> what you have right now.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>>>>> Sent from the cassandra-user@incubator.apache.org mailing list archive
>>>>> at Nabble.com.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
also nothing is happening with the streaming:

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
 Nothing streaming from /10.28.53.11
Pool Name                    Active   Pending      Completed
Commands                        n/a         0      165086750
Responses                       n/a         0       99372520



On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu <sp...@gmail.com> wrote:

> sorry the ring info should be this:
>
> nodetool -h node3 ring
> Address         Status State   Load            Owns    Token
>
>
>  84944475733633104818662955375549269696
> node1      Up     Normal  13.18 GB        81.09%
>  52773518586096316348543097376923124102
> node2     Up     Normal  22.85 GB        10.48%
>  70597222385644499881390884416714081360
> node3      Up     Leaving 25.44 GB        8.43%
> 84944475733633104818662955375549269696
>
>
>
> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> I have tried the nodetool move but get the following error....
>>
>> node3:~# nodetool -h node3 move 0
>> Exception in thread "main" java.lang.IllegalStateException: replication
>> factor (3) exceeds number of endpoints (2)
>>  at
>> org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>> at
>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>>  at
>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>> at
>> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>>  at
>> org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
>> at
>> org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>  at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>>  at
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>> at
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>  at
>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>> at
>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>  at
>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>  at
>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>  at
>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>  at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>  at java.lang.reflect.Method.invoke(Method.java:597)
>> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>  at sun.rmi.transport.Transport$1.run(Transport.java:159)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>> at
>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>  at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>> at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>  at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
>>
>> then nodetool shows the node is leaving....
>>
>>
>> nodetool -h node3 ring
>> Address         Status State   Load            Owns    Token
>>
>>
>>  84944475733633104818662955375549269696
>> node1      Up     Normal  13.18 GB        81.09%
>>  52773518586096316348543097376923124102
>> node2     Up     Normal  22.85 GB        10.48%
>>  70597222385644499881390884416714081360
>>  node3      Up     Leaving 25.44 GB        8.43%
>> 84944475733633104818662955375549269696
>>
>> the log didn't show any error message neither anything abnormal.  is there
>> something wrong?
>>
>>
>> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
>>
>>
>> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com>wrote:
>>
>>> thanks a lot! I will try the "move".
>>>
>>>
>>> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com>wrote:
>>>
>>>>
>>>> springrider wrote:
>>>> >
>>>> > is that okay to do nodetool move before a completely repair?
>>>> >
>>>> > using this equation?
>>>> > def tokens(nodes):
>>>> >
>>>> >    - for x in xrange(nodes):
>>>> >       - print 2 ** 127 / nodes * x
>>>> >
>>>>
>>>> Yes use that logic to get the tokens. I think it's safe to run move
>>>> first
>>>> and reair later. You are moving some nodes data as is so it's no worse
>>>> than
>>>> what you have right now.
>>>>
>>>> --
>>>> View this message in context:
>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>>>> Sent from the cassandra-user@incubator.apache.org mailing list archive
>>>> at Nabble.com.
>>>>
>>>
>>>
>>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
sorry the ring info should be this:

nodetool -h node3 ring
Address         Status State   Load            Owns    Token


 84944475733633104818662955375549269696
node1      Up     Normal  13.18 GB        81.09%
 52773518586096316348543097376923124102
node2     Up     Normal  22.85 GB        10.48%
 70597222385644499881390884416714081360
node3      Up     Leaving 25.44 GB        8.43%
84944475733633104818662955375549269696



On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu <sp...@gmail.com> wrote:

> I have tried the nodetool move but get the following error....
>
> node3:~# nodetool -h node3 move 0
> Exception in thread "main" java.lang.IllegalStateException: replication
> factor (3) exceeds number of endpoints (2)
>  at
> org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
> at
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>  at
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
> at
> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>  at
> org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
> at
> org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
>  at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>  at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>  at
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
> at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>  at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
> at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>  at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>  at sun.rmi.transport.Transport$1.run(Transport.java:159)
> at java.security.AccessController.doPrivileged(Native Method)
>  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
> at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>  at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>  at java.lang.Thread.run(Thread.java:662)
>
>
>
>
> then nodetool shows the node is leaving....
>
>
> nodetool -h node3 ring
> Address         Status State   Load            Owns    Token
>
>
>  84944475733633104818662955375549269696
> node1      Up     Normal  13.18 GB        81.09%
>  52773518586096316348543097376923124102
> node2     Up     Normal  22.85 GB        10.48%
>  70597222385644499881390884416714081360
> node3      Up     Leaving 25.44 GB        8.43%
> 84944475733633104818662955375549269696
>
> the log didn't show any error message neither anything abnormal.  is there
> something wrong?
>
>
> I used to have RF=2, and changed it to RF=3 using cassandra-cli.
>
>
> On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> thanks a lot! I will try the "move".
>>
>>
>> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com> wrote:
>>
>>>
>>> springrider wrote:
>>> >
>>> > is that okay to do nodetool move before a completely repair?
>>> >
>>> > using this equation?
>>> > def tokens(nodes):
>>> >
>>> >    - for x in xrange(nodes):
>>> >       - print 2 ** 127 / nodes * x
>>> >
>>>
>>> Yes use that logic to get the tokens. I think it's safe to run move first
>>> and reair later. You are moving some nodes data as is so it's no worse
>>> than
>>> what you have right now.
>>>
>>> --
>>> View this message in context:
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>>> Sent from the cassandra-user@incubator.apache.org mailing list archive
>>> at Nabble.com.
>>>
>>
>>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
I have tried the nodetool move but got the following error....

node3:~# nodetool -h node3 move 0
Exception in thread "main" java.lang.IllegalStateException: replication
factor (3) exceeds number of endpoints (2)
 at
org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
at
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
 at
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
at
org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
 at
org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
at
org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
 at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
 at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
 at
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
 at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
 at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
 at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
 at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)




then nodetool shows the node is leaving....


nodetool -h reagon ring
Address         Status State   Load            Owns    Token


 84944475733633104818662955375549269696
node3      Up     Normal  13.18 GB        81.09%
 52773518586096316348543097376923124102
node3     Up     Normal  22.85 GB        10.48%
 70597222385644499881390884416714081360
node3      Up     Leaving 25.44 GB        8.43%
84944475733633104818662955375549269696

the log didn't show any error message or anything abnormal.  is there
something wrong?


I used to have RF=2, and changed it to RF=3 using cassandra-cli.


On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu <sp...@gmail.com> wrote:

> thanks a lot! I will try the "move".
>
>
> On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com> wrote:
>
>>
>> springrider wrote:
>> >
>> > is that okay to do nodetool move before a completely repair?
>> >
>> > using this equation?
>> > def tokens(nodes):
>> >
>> >    - for x in xrange(nodes):
>> >       - print 2 ** 127 / nodes * x
>> >
>>
>> Yes use that logic to get the tokens. I think it's safe to run move first
>> and reair later. You are moving some nodes data as is so it's no worse
>> than
>> what you have right now.
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
>> Sent from the cassandra-user@incubator.apache.org mailing list archive at
>> Nabble.com.
>>
>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
thanks a lot! I will try the "move".

On Mon, Aug 1, 2011 at 7:07 AM, mcasandra <mo...@gmail.com> wrote:

>
> springrider wrote:
> >
> > is that okay to do nodetool move before a completely repair?
> >
> > using this equation?
> > def tokens(nodes):
> >
> >    - for x in xrange(nodes):
> >       - print 2 ** 127 / nodes * x
> >
>
> Yes use that logic to get the tokens. I think it's safe to run move first
> and reair later. You are moving some nodes data as is so it's no worse than
> what you have right now.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at
> Nabble.com.
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by mcasandra <mo...@gmail.com>.
springrider wrote:
> 
> is that okay to do nodetool move before a completely repair?
> 
> using this equation?
> def tokens(nodes):
> 
>    - for x in xrange(nodes):
>       - print 2 ** 127 / nodes * x
> 

Yes, use that logic to get the tokens. I think it's safe to run move first
and repair later. You are moving some nodes' data as-is, so it's no worse than
what you have right now.
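
For a three-node ring that works out as below (Python 2, same formula as
the snippet quoted above; the tokens(3) call is only there to show the
output):

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

tokens(3)
# prints:
# 0
# 56713727820156410577229101238628035242
# 113427455640312821154458202477256070484

Each of those values is then what you hand to nodetool move on one of
the nodes.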

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
okay, thanks Aaron!

On Mon, Aug 1, 2011 at 5:43 AM, aaron morton <aa...@thelastpickle.com>wrote:

> aaron suggested it's better to run node repair on every node then
> re-balance it.
>
>
> That's me been cautious with other peoples data.
>
> It looks like node 3 is overwhelmed. Try getting the move sorted.
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1 Aug 2011, at 05:48, Yan Chunlu wrote:
>
> is that okay to do nodetool move before a completely repair?
>
> using this equation?
> def tokens(nodes):
>
>    - for x in xrange(nodes):
>       - print 2 ** 127 / nodes * x
>
>
> On Mon, Aug 1, 2011 at 1:17 AM, mcasandra <mo...@gmail.com> wrote:
>
>> First run nodetool move and then you can run nodetool repair. Before you
>> run
>> nodetool move you will need to determine tokens that each node will be
>> responsible for. Then use that token to perform move.
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
>> Sent from the cassandra-user@incubator.apache.org mailing list archive at
>> Nabble.com.
>>
>
>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by aaron morton <aa...@thelastpickle.com>.
> aaron suggested it's better to run node repair on every node then re-balance it.

That's me being cautious with other people's data.

It looks like node 3 is overwhelmed. Try getting the move sorted. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1 Aug 2011, at 05:48, Yan Chunlu wrote:

> is that okay to do nodetool move before a completely repair?
> 
> using this equation?
> def tokens(nodes):
> for x in xrange(nodes):
> print 2 ** 127 / nodes * x
> 
> On Mon, Aug 1, 2011 at 1:17 AM, mcasandra <mo...@gmail.com> wrote:
> First run nodetool move and then you can run nodetool repair. Before you run
> nodetool move you will need to determine tokens that each node will be
> responsible for. Then use that token to perform move.
> 
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
> 


Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
is that okay to do nodetool move before a complete repair?

using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x


On Mon, Aug 1, 2011 at 1:17 AM, mcasandra <mo...@gmail.com> wrote:

> First run nodetool move and then you can run nodetool repair. Before you
> run
> nodetool move you will need to determine tokens that each node will be
> responsible for. Then use that token to perform move.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
> Sent from the cassandra-user@incubator.apache.org mailing list archive at
> Nabble.com.
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by mcasandra <mo...@gmail.com>.
First run nodetool move and then you can run nodetool repair. Before you run
nodetool move you will need to determine tokens that each node will be
responsible for. Then use that token to perform move.
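
As a rough sketch, with three nodes the balanced tokens from the
2 ** 127 / nodes * x formula discussed in this thread are 0,
56713727820156410577229101238628035242 and
113427455640312821154458202477256070484, so the moves would look
something like this (which node gets which token is up to you, and each
move is typically done one node at a time):

nodetool -h node1 move 0
nodetool -h node2 move 56713727820156410577229101238628035242
nodetool -h node3 move 113427455640312821154458202477256070484

followed by nodetool repair on each node.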

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
any help? thanks!

On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu <sp...@gmail.com> wrote:

> and by the way, my RF=3 and the other two nodes have much more capacity,
> why does they always routed the request to node3?
>
> coud I do a rebalance now? before node repair?
>
>
> On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <sp...@gmail.com>wrote:
>
>> add new nodes seems added more pressure  to the cluster?  how about your
>> data size?
>>
>>
>> On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <fr...@aimatch.com> wrote:
>>
>>> "Dropped read message" might be an indicator of capacity issue. We
>>> experienced the similar issue with 0.7.6.
>>>
>>> We ended up adding two extra nodes and physically rebooted the offending
>>> node(s).
>>>
>>> The entire cluster then calmed down.
>>>
>>> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <sp...@gmail.com>wrote:
>>>
>>>> I have three nodes and RF=3.here is the current ring:
>>>>
>>>>
>>>> Address Status State Load Owns Token
>>>>
>>>> 84944475733633104818662955375549269696
>>>> node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
>>>> node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
>>>> node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696
>>>>
>>>>
>>>> it is very un-balanced and I would like to re-balance it using
>>>> "nodetool move" asap. unfortunately I haven't been run node repair for
>>>> a long time.
>>>>
>>>> aaron suggested it's better to run node repair on every node then
>>>> re-balance it.
>>>>
>>>>
>>>> problem is the node3 is in heavy-load currently, and the entire
>>>> cluster slow down if I start doing node repair. I have to
>>>> disablegossip and disablethrift to stop the repair.
>>>>
>>>> only cassandra running on that server and I have no idea what it was
>>>> doing. the cpu load is about 20+ currently. compcationstats and
>>>> netstats shows it was not doing anything.
>>>>
>>>> I have change client to not to connect to node3, but still, it seems
>>>> in heavy load and io utils is 100%.
>>>>
>>>>
>>>> the log seems normal(although not sure what about the "Dropped read
>>>> message" thing):
>>>>
>>>>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
>>>> 2563726360 used; max is 4248829952
>>>>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>>>  INFO 13:21:38,560 Pool Name                    Active   Pending
>>>>  INFO 13:21:38,560 ReadStage                         8      7555
>>>>  INFO 13:21:38,561 RequestResponseStage              0         0
>>>>  INFO 13:21:38,561 ReadRepairStage                   0         0
>>>>
>>>>
>>>>
>>>> is there anyway to tell what node3 was doing? or at least is there any
>>>> way to make it not slowdown the whole cluster?
>>>>
>>>
>>>
>>>
>>> --
>>> Frank Duan
>>> aiMatch
>>> frank@aimatch.com
>>> c: 703.869.9951
>>> www.aiMatch.com
>>>
>>>
>>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
and by the way, my RF=3 and the other two nodes have much more capacity, so why
are requests always routed to node3?

could I do a rebalance now, before node repair?

On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu <sp...@gmail.com> wrote:

> add new nodes seems added more pressure  to the cluster?  how about your
> data size?
>
>
> On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <fr...@aimatch.com> wrote:
>
>> "Dropped read message" might be an indicator of capacity issue. We
>> experienced the similar issue with 0.7.6.
>>
>> We ended up adding two extra nodes and physically rebooted the offending
>> node(s).
>>
>> The entire cluster then calmed down.
>>
>> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <sp...@gmail.com>wrote:
>>
>>> I have three nodes and RF=3.here is the current ring:
>>>
>>>
>>> Address Status State Load Owns Token
>>>
>>> 84944475733633104818662955375549269696
>>> node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
>>> node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
>>> node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696
>>>
>>>
>>> it is very un-balanced and I would like to re-balance it using
>>> "nodetool move" asap. unfortunately I haven't been run node repair for
>>> a long time.
>>>
>>> aaron suggested it's better to run node repair on every node then
>>> re-balance it.
>>>
>>>
>>> problem is the node3 is in heavy-load currently, and the entire
>>> cluster slow down if I start doing node repair. I have to
>>> disablegossip and disablethrift to stop the repair.
>>>
>>> only cassandra running on that server and I have no idea what it was
>>> doing. the cpu load is about 20+ currently. compcationstats and
>>> netstats shows it was not doing anything.
>>>
>>> I have change client to not to connect to node3, but still, it seems
>>> in heavy load and io utils is 100%.
>>>
>>>
>>> the log seems normal(although not sure what about the "Dropped read
>>> message" thing):
>>>
>>>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
>>> 2563726360 used; max is 4248829952
>>>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>>  INFO 13:21:38,560 Pool Name                    Active   Pending
>>>  INFO 13:21:38,560 ReadStage                         8      7555
>>>  INFO 13:21:38,561 RequestResponseStage              0         0
>>>  INFO 13:21:38,561 ReadRepairStage                   0         0
>>>
>>>
>>>
>>> is there anyway to tell what node3 was doing? or at least is there any
>>> way to make it not slowdown the whole cluster?
>>>
>>
>>
>>
>> --
>> Frank Duan
>> aiMatch
>> frank@aimatch.com
>> c: 703.869.9951
>> www.aiMatch.com
>>
>>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Yan Chunlu <sp...@gmail.com>.
adding new nodes seems like it would add more pressure to the cluster?  how
about your data size?

On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan <fr...@aimatch.com> wrote:

> "Dropped read message" might be an indicator of capacity issue. We
> experienced the similar issue with 0.7.6.
>
> We ended up adding two extra nodes and physically rebooted the offending
> node(s).
>
> The entire cluster then calmed down.
>
> On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> I have three nodes and RF=3.here is the current ring:
>>
>>
>> Address Status State Load Owns Token
>>
>> 84944475733633104818662955375549269696
>> node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
>> node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
>> node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696
>>
>>
>> it is very un-balanced and I would like to re-balance it using
>> "nodetool move" asap. unfortunately I haven't been run node repair for
>> a long time.
>>
>> aaron suggested it's better to run node repair on every node then
>> re-balance it.
>>
>>
>> problem is the node3 is in heavy-load currently, and the entire
>> cluster slow down if I start doing node repair. I have to
>> disablegossip and disablethrift to stop the repair.
>>
>> only cassandra running on that server and I have no idea what it was
>> doing. the cpu load is about 20+ currently. compcationstats and
>> netstats shows it was not doing anything.
>>
>> I have change client to not to connect to node3, but still, it seems
>> in heavy load and io utils is 100%.
>>
>>
>> the log seems normal(although not sure what about the "Dropped read
>> message" thing):
>>
>>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
>> 2563726360 used; max is 4248829952
>>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>>  INFO 13:21:38,560 Pool Name                    Active   Pending
>>  INFO 13:21:38,560 ReadStage                         8      7555
>>  INFO 13:21:38,561 RequestResponseStage              0         0
>>  INFO 13:21:38,561 ReadRepairStage                   0         0
>>
>>
>>
>> is there anyway to tell what node3 was doing? or at least is there any
>> way to make it not slowdown the whole cluster?
>>
>
>
>
> --
> Frank Duan
> aiMatch
> frank@aimatch.com
> c: 703.869.9951
> www.aiMatch.com
>
>

Re: how to solve one node is in heavy load in unbalanced cluster

Posted by Frank Duan <fr...@aimatch.com>.
"Dropped read message" might be an indicator of capacity issue. We
experienced the similar issue with 0.7.6.

We ended up adding two extra nodes and physically rebooted the offending
node(s).

The entire cluster then calmed down.

On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu <sp...@gmail.com> wrote:

> I have three nodes and RF=3.here is the current ring:
>
>
> Address Status State Load Owns Token
>
> 84944475733633104818662955375549269696
> node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
> node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
> node3 Up Normal 56.1 GB 8.43% 84944475733633104818662955375549269696
>
>
> it is very un-balanced and I would like to re-balance it using
> "nodetool move" asap. unfortunately I haven't been run node repair for
> a long time.
>
> aaron suggested it's better to run node repair on every node then
> re-balance it.
>
>
> problem is the node3 is in heavy-load currently, and the entire
> cluster slow down if I start doing node repair. I have to
> disablegossip and disablethrift to stop the repair.
>
> only cassandra running on that server and I have no idea what it was
> doing. the cpu load is about 20+ currently. compcationstats and
> netstats shows it was not doing anything.
>
> I have change client to not to connect to node3, but still, it seems
> in heavy load and io utils is 100%.
>
>
> the log seems normal(although not sure what about the "Dropped read
> message" thing):
>
>  INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
> 2563726360 used; max is 4248829952
>  WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
>  INFO 13:21:38,560 Pool Name                    Active   Pending
>  INFO 13:21:38,560 ReadStage                         8      7555
>  INFO 13:21:38,561 RequestResponseStage              0         0
>  INFO 13:21:38,561 ReadRepairStage                   0         0
>
>
>
> is there anyway to tell what node3 was doing? or at least is there any
> way to make it not slowdown the whole cluster?
>



-- 
Frank Duan
aiMatch
frank@aimatch.com
c: 703.869.9951
www.aiMatch.com