Posted to user@cassandra.apache.org by David Koblas <ko...@extra.com> on 2010/05/07 21:08:05 UTC

Overfull node

I've got two (out of five) nodes on my Cassandra ring that somehow got 
too full (i.e. over 60% disk space utilization).  I've now gotten a few 
new machines added to the ring, but every time one of the overfull nodes 
attempts to stream its data it runs out of disk space.  I've tried half 
a dozen different bad ideas for getting things moving along a bit 
more smoothly, but am at a total loss at this point.

Are there any good tricks to get Cassandra to not need 2x the disk space 
to stream out, or is something else potentially going on that's causing 
me problems?

Thanks,

Re: Overfull node

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
I had this happen when I changed the seed node in a running cluster, and
then started and stopped various nodes.  I "fixed" it by restarting
the seed node(s) (and waiting for them to come fully back up), then restarting
all the other nodes.
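
For reference, that sequence can be sketched as a few shell commands; the host names and init-script path below are hypothetical (0.6-era packaging varies), not taken from this thread:

```shell
# Hypothetical hosts and service path -- adjust for your environment.
SEED=10.1.0.155
OTHERS="10.3.0.150 10.2.0.115 10.2.0.174"

# Restart the seed first...
ssh "$SEED" 'sudo /etc/init.d/cassandra restart'

# ...and wait until it answers JMX before touching anything else.
until nodetool -h "$SEED" info >/dev/null 2>&1; do sleep 5; done

# Then restart the remaining nodes one at a time.
for h in $OTHERS; do
  ssh "$h" 'sudo /etc/init.d/cassandra restart'
done
```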

-Anthony

On Fri, May 14, 2010 at 05:11:40PM -0700, David Koblas wrote:
> I've somehow now ended up in a very strange place...
> 
> If I ask '150' or '155' about the ring they report each other, but if I 
> ask the rest of the ring they have '155' but not '150' as members.  All 
> of the storage-conf files are basically clones of each other with the 
> same ring masters.
> 
> $ nodetool -h 10.1.0.155 ring
> Address       Status     Load          Range                                      Ring
>                                        99811300882272441299595351868344045866
> 10.3.0.150    Up         3.08 TB       6436333895300580402214871779965756352      |<--|
> 10.1.0.155    Up         3.08 TB       99811300882272441299595351868344045866     |-->|
> 
> $ nodetool -h 10.2.0.174 ring
> Address       Status     Load          Range                                      Ring
>                                        144951579690133260853298391132870993575
> 10.2.0.115    Up         2.7 TB        55758122058160717108501182340054262660     |<--|
> 10.1.0.155    Up         3.08 TB       99811300882272441299595351868344045866     |   ^
> 10.2.0.174    Up         1.32 TB       118283207506463595491596277948095451613    v   |
> 10.3.0.151    Up         414.51 GB     127520031787005730998588483181387651399    |   ^
> 10.3.0.152    Up         143.03 GB     132137258578258111824507171284723589567    v   |
> 10.3.0.153    Up         245.51 GB     134446064220575108370358944111505967571    |   ^
> 10.2.0.175    Up         3.16 TB       136754979922617117666448707835107404441    v   |
> 10.2.0.114    Up         1.41 TB       144951579690133260853298391132870993575    |-->|
> 
> $ nodetool -h 10.3.0.150 streams
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> 
> $ nodetool -h 10.1.0.155 streams
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> 
> $ nodetool -h 10.2.0.115 streams
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Overfull node

Posted by David Koblas <ko...@extra.com>.
I've somehow now ended up in a very strange place...

If I ask '150' or '155' about the ring they report each other, but if I 
ask the rest of the ring they have '155' but not '150' as members.  All 
of the storage-conf files are basically clones of each other with the 
same ring masters.

$ nodetool -h 10.1.0.155 ring
Address       Status     Load          Range                                      Ring
                                       99811300882272441299595351868344045866
10.3.0.150    Up         3.08 TB       6436333895300580402214871779965756352      |<--|
10.1.0.155    Up         3.08 TB       99811300882272441299595351868344045866     |-->|

$ nodetool -h 10.2.0.174 ring
Address       Status     Load          Range                                      Ring
                                       144951579690133260853298391132870993575
10.2.0.115    Up         2.7 TB        55758122058160717108501182340054262660     |<--|
10.1.0.155    Up         3.08 TB       99811300882272441299595351868344045866     |   ^
10.2.0.174    Up         1.32 TB       118283207506463595491596277948095451613    v   |
10.3.0.151    Up         414.51 GB     127520031787005730998588483181387651399    |   ^
10.3.0.152    Up         143.03 GB     132137258578258111824507171284723589567    v   |
10.3.0.153    Up         245.51 GB     134446064220575108370358944111505967571    |   ^
10.2.0.175    Up         3.16 TB       136754979922617117666448707835107404441    v   |
10.2.0.114    Up         1.41 TB       144951579690133260853298391132870993575    |-->|

$ nodetool -h 10.3.0.150 streams
Mode: Normal
Not sending any streams.
Not receiving any streams.

$ nodetool -h 10.1.0.155 streams
Mode: Normal
Not sending any streams.
Not receiving any streams.

$ nodetool -h 10.2.0.115 streams
Mode: Normal
Not sending any streams.
Not receiving any streams.
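
One way to make the disagreement obvious is to ask every node for its view of the ring and compare just the member addresses. The host list below comes from the ring output above; the awk handling of nodetool's column layout is an assumption about its output format:

```shell
# Print each node's view of ring membership side by side.
for h in 10.3.0.150 10.1.0.155 10.2.0.115 10.2.0.174; do
  echo "== $h sees =="
  # First column of each data row is the member address; NR > 1 skips the header.
  nodetool -h "$h" ring | awk 'NR > 1 { print $1 }'
done
```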



On 5/11/10 6:30 AM, Jonathan Ellis wrote:
> s/keyspace/token/ and you've got it.

Re: Overfull node

Posted by Jonathan Ellis <jb...@gmail.com>.
s/keyspace/token/ and you've got it.

On Mon, May 10, 2010 at 10:34 AM, David Koblas <ko...@extra.com> wrote:
> Sounds great, will give it a go.  However, just to make sure I understand
> getting the keyspace correct.
>
> Let's say I've got:
>    A -- Node before overfull node in keyspace order
>    O -- Overfull node
>    B -- Node after O in keyspace order
>    N -- New empty node
>
> I'm going to assume that I should make the following assignment:
>    keyspace(N) = keyspace(A) + ( keyspace(O) - keyspace(A) ) / 2
>
> Or did I miss something else about keyspace ranges?
> Thanks



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Overfull node

Posted by David Koblas <ko...@extra.com>.
Sounds great, will give it a go.  However, just to make sure I'm 
getting the keyspace correct.

Let's say I've got:
     A -- Node before overfull node in keyspace order
     O -- Overfull node
     B -- Node after O in keyspace order
     N -- New empty node

I'm going to assume that I should make the following assignment:
     keyspace(N) = keyspace(A) + ( keyspace(O) - keyspace(A) ) / 2

Or did I miss something else about keyspace ranges?
Thanks


On 5/7/10 1:25 PM, Jonathan Ellis wrote:
> If you're using RackUnawareStrategy (the default replication strategy)
> then you can "bootstrap" manually fairly easily -- copy all the data
> (not system) sstables from an overfull machine to a new machine,
> assign the new one a token that gives it about half of the old node's
> range, then start it with autobootstrap OFF.  Then run cleanup on both
> new and old nodes to remove the part of the data that belongs to the
> other.
>
> The downside vs real bootstrap is you can't do this safely while
> writes are coming in to the original node.  You can reduce your
> read-only period by doing an initial scp, then doing a flush + rsync
> when you're ready to take it read only.
>
> (https://issues.apache.org/jira/browse/CASSANDRA-579 will make this
> problem obsolete for 0.7 but that doesn't help you on 0.6, of course.)

Re: Overfull node

Posted by Jonathan Ellis <jb...@gmail.com>.
If you're using RackUnawareStrategy (the default replication strategy)
then you can "bootstrap" manually fairly easily -- copy all the data
sstables (not the system ones) from an overfull machine to a new machine,
assign the new one a token that gives it about half of the old node's
range, then start it with autobootstrap OFF.  Then run cleanup on both
new and old nodes to remove the part of the data that belongs to the
other.

The downside vs. real bootstrap is that you can't do this safely while
writes are coming in to the original node.  You can reduce your
read-only period by doing an initial scp, then doing a flush + rsync
when you're ready to take it read-only.

(https://issues.apache.org/jira/browse/CASSANDRA-579 will make this
problem obsolete for 0.7 but that doesn't help you on 0.6, of course.)
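
Condensing the steps above into a sketch; the host names, data directory, and keyspace name (Keyspace1) are placeholders rather than anything from this thread, and the storage-conf.xml changes in step 3 are done by hand:

```shell
# Placeholders: OLD is the overfull node, NEW the empty one; the data
# directory and "Keyspace1" are assumptions about your layout.
OLD=10.3.0.150
NEW=10.3.0.160
DATA=/var/lib/cassandra/data

# 1. Bulk copy while OLD is still serving traffic (skip the system keyspace).
ssh "$NEW" "scp -r $OLD:$DATA/Keyspace1 $DATA/"

# 2. Take OLD read-only, flush its memtables, then rsync the delta.
nodetool -h "$OLD" flush Keyspace1
ssh "$NEW" "rsync -av $OLD:$DATA/Keyspace1/ $DATA/Keyspace1/"

# 3. On NEW: set InitialToken to roughly the midpoint of OLD's range and
#    AutoBootstrap to false in storage-conf.xml, then start Cassandra.

# 4. Remove the data each node no longer owns.
nodetool -h "$NEW" cleanup
nodetool -h "$OLD" cleanup
```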

On Fri, May 7, 2010 at 2:08 PM, David Koblas <ko...@extra.com> wrote:
> I've got two (out of five) nodes on my Cassandra ring that somehow got too
> full (i.e. over 60% disk space utilization).  I've now gotten a few new
> machines added to the ring, but every time one of the overfull nodes attempts
> to stream its data it runs out of disk space.  I've tried half a dozen
> different bad ideas for getting things moving along a bit more smoothly, but am
> at a total loss at this point.
>
> Are there any good tricks to get Cassandra to not need 2x the disk space to
> stream out, or is something else potentially going on that's causing me
> problems?
>
> Thanks,
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com