Posted to user@cassandra.apache.org by Marcel Steinbach <ma...@chors.de> on 2012/01/17 16:34:59 UTC

Unbalanced cluster with RandomPartitioner

Hi,

we're using RP and have each node assigned the same amount of the token space. The cluster looks like this:

Address         Status State   Load            Owns    Token                                       
                                                       205648943402372032879374446248852460236     
1       Up     Normal  310.83 GB       12.50%  56775407874461455114148055497453867724      
2       Up     Normal  470.24 GB       12.50%  78043055807020109080608968461939380940      
3       Up     Normal  271.57 GB       12.50%  99310703739578763047069881426424894156      
4       Up     Normal  282.61 GB       12.50%  120578351672137417013530794390910407372     
5       Up     Normal  248.76 GB       12.50%  141845999604696070979991707355395920588     
6       Up     Normal  164.12 GB       12.50%  163113647537254724946452620319881433804     
7       Up     Normal  76.23 GB        12.50%  184381295469813378912913533284366947020     
8       Up     Normal  19.79 GB        12.50%  205648943402372032879374446248852460236     

I was under the impression the RP would distribute the load more evenly.
Our row sizes are 0.5-1 KB, so we don't store huge rows on a single node. Should we just move the nodes so that the load is more evenly distributed, or is there something off that needs to be fixed first?

Thanks
Marcel
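
For reference, a minimal Python sketch (not part of the original exchange) of how the canonical evenly spaced initial tokens for an 8-node RandomPartitioner ring can be computed and checked against the 0..2**127 range discussed later in the thread. The helper names and the RP_MAX bound are assumptions of this sketch, not Cassandra code.

    # Sketch: evenly spaced RP tokens for N nodes, plus a range check.
    RP_MAX = 2 ** 127

    def balanced_tokens(node_count):
        # i * 2**127 / node_count for i = 0 .. node_count - 1
        return [i * RP_MAX // node_count for i in range(node_count)]

    def in_rp_range(token):
        return 0 <= token < RP_MAX

    for i, t in enumerate(balanced_tokens(8), start=1):
        print(i, t, in_rp_range(t))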

Re: Unbalanced cluster with RandomPartitioner

Posted by aaron morton <aa...@thelastpickle.com>.
Setting a token outside of the partitioner's range sounds like a bug. It's mostly an issue with the RP, but I guess a custom partitioner may also want to validate that tokens are within its range.

Can you report it to https://issues.apache.org/jira/browse/CASSANDRA

Thanks


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
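
A minimal sketch of the kind of validation suggested above, in Python; validate_token, RP_MIN and RP_MAX are names made up for this sketch, not Cassandra's actual API.

    # Sketch: reject tokens outside the partitioner's assumed range.
    RP_MIN, RP_MAX = 0, 2 ** 127

    def validate_token(token):
        if not (RP_MIN <= token <= RP_MAX):
            raise ValueError("token %d is outside [%d, %d]" % (token, RP_MIN, RP_MAX))
        return token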



Re: Unbalanced cluster with RandomPartitioner

Posted by Marcel Steinbach <ma...@chors.de>.
I thought about our issue again and was thinking: maybe describeOwnership should take into account whether a token is outside the partitioner's maximum token range?

To recap our problem: our tokens were spaced apart by 12.5% of the token range 2**127, but we had added an offset to each token, which pushed the top of the cluster's token range above 2**127. That resulted in two nodes getting few or no primary replicas. 

Afaik, the partitioner itself describes the key ownership in the ring, but it didn't take into account that we had left its maximum key range. 

Of course, it is silly and not very likely that users make that mistake. However, we did, and it took me quite some time to figure it out (maybe also because it wasn't me who set up the cluster). 

To carry it to the extreme, you could construct a cluster of n nodes with all tokens greater than 2**127: the ownership description would show an ownership of 1/n each, but all data would go to the node with the lowest token (given RP and RF=1).

I think it is wrong to calculate the ownership by subtracting the previous token from the current token and dividing by the maximum token without acknowledging that we might already be "out of bounds". 

Cheers 
Marcel
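
A small Python sketch of the effect described above, using a simplified ownership calculation and RP-style routing (illustrative only, not Cassandra's actual describeOwnership code): with all tokens above 2**127, the naive calculation still reports 1/n per node while every key lands on the node with the lowest token.

    RP_MAX = 2 ** 127

    def naive_ownership(tokens):
        # (current token - previous token) / maximum token, wrapping the ring
        ordered = sorted(tokens)
        return [((t - ordered[i - 1]) % RP_MAX) / RP_MAX for i, t in enumerate(ordered)]

    def primary_node(key_token, tokens):
        # first node token >= key token, wrapping to the lowest token
        ordered = sorted(tokens)
        for t in ordered:
            if key_token <= t:
                return t
        return ordered[0]

    nodes = [RP_MAX + i * RP_MAX // 4 for i in range(4)]        # all tokens > 2**127
    print(naive_ownership(nodes))                               # reports 0.25 per node
    print({primary_node(k, nodes) for k in range(0, RP_MAX, RP_MAX // 8)})  # a single node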



Re: Unbalanced cluster with RandomPartitioner

Posted by Marcel Steinbach <ma...@chors.de>.
Thanks for all the responses!

I found our problem:
Using the RandomPartitioner, the token range is 0..2**127. When we added nodes, we generated the tokens and, out of convenience, added an offset to them because the move was easier that way.

However, we did not apply the modulo 2**127 to the last two tokens, so they were outside the RP's token range. 
Moving the last two tokens to their values mod 2**127 will resolve the problem.

Cheers,
Marcel
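
A minimal sketch reconstructing this from the ring output in the first post, assuming the offset was simply node 1's token (which reproduces the listed tokens): the tokens for nodes 7 and 8 exceed 2**127, and taking them mod 2**127 brings them back into range.

    RP_MAX = 2 ** 127
    NODES = 8
    OFFSET = 56775407874461455114148055497453867724   # node 1's token in the ring output

    for i in range(NODES):
        token = OFFSET + i * (RP_MAX // NODES)
        if token >= RP_MAX:
            print("node %d: %d out of range -> move to %d" % (i + 1, token, token % RP_MAX))
        else:
            print("node %d: %d ok" % (i + 1, token))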



Re: Unbalanced cluster with RandomPartitioner

Posted by Marcel Steinbach <ma...@chors.de>.
On 19.01.2012, at 20:15, Narendra Sharma wrote:
> I believe you need to move the nodes on the ring. What was the load on the nodes before you added 5 new nodes? It's just that you are getting data in certain token ranges more than others.
With three nodes, it was also imbalanced. 

What I don't understand is why the md5 sums would generate such massive hot spots. 

Most of our keys look like this: 
00013270494972450001234567
with the first 16 digits being a timestamp of one of our application servers' startup times, and the last 10 digits being sequentially generated per user. 

There may be a lot of keys that start with e.g. "0001327049497245" (or some other timestamp). But I was under the impression that md5 doesn't care about that and generates a uniform distribution?
But then again, I know next to nothing about md5. Maybe someone else has better insight into the algorithm?

However, we also use CFs with a date ("yyyymmdd") as the key, as well as CFs with UUIDs as keys. And those CFs are not balanced in themselves either. E.g. node 5 has 12 GB live space used in the CF with the UUID as key, and node 8 only 428 MB. 

Cheers,
Marcel
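
On the md5 question, a small Python sketch (not from the thread) that hashes keys shaped like the ones described above and counts how they fall into eight equal slices of the 0..2**127 range. The token calculation is only an approximation of RandomPartitioner, but it shows md5 spreading a fixed prefix plus a sequential suffix roughly evenly.

    import hashlib

    RP_MAX = 2 ** 127
    counts = [0] * 8
    for seq in range(100000):
        key = "0001327049497245" + "%010d" % seq            # fixed timestamp prefix + sequence
        token = int(hashlib.md5(key.encode()).hexdigest(), 16) % RP_MAX
        counts[token * 8 // RP_MAX] += 1
    print(counts)   # roughly 12500 keys per slice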


Re: Unbalanced cluster with RandomPartitioner

Posted by Narendra Sharma <na...@gmail.com>.
I believe you need to move the nodes on the ring. What was the load on the
nodes before you added 5 new nodes? It's just that you are getting data in
certain token ranges more than others.

-Naren



-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/

Re: Unbalanced cluster with RandomPartitioner

Posted by Marcel Steinbach <ma...@chors.de>.
On 18.01.2012, at 02:19, Maki Watanabe wrote:
> Are there any significant differences in the number of sstables on each node?
No, no significant difference there. Actually, node 8 is among those with more sstables but with the least load (20 GB).

On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
> Are you deleting data or using TTLs?  Expired/deleted data won't go away until the sstable holding it is compacted.  So if compaction has happened on some nodes, but not on others, you will see this.  The disparity is pretty big, 400GB to 20GB, so this probably isn't the issue, but with our data using TTLs, if I run major compactions a couple times on that column family it can shrink ~30%-40%.
Yes, we do delete data. But I agree, the disparity is too big to blame only the deletions. 

Also, initially, we started out with 3 nodes and upgraded to 8 a few weeks ago. After adding the nodes, we did
compactions and cleanups and didn't have a balanced cluster. So that should have removed outdated data, right?



Re: Unbalanced cluster with RandomPartitioner

Posted by aaron morton <aa...@thelastpickle.com>.
Load reported by nodetool ring is the live load, which means SSTables that the server has open and will read from during a request. This will include tombstones, expired and overwritten data. 

nodetool cfstats also includes "dead" load, which is SSTables that are no longer in use but are still on disk. 


Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Unbalanced cluster with RandomPartitioner

Posted by Marcel Steinbach <ms...@gmail.com>.
2012/1/19 aaron morton <aa...@thelastpickle.com>:
> If you have performed any token moves the data will not be deleted until you
> run nodetool cleanup.
We did that after adding nodes to the cluster. And then, the cluster
wasn't balanced either.
Also, does the "Load" really account for "dead" data, or is it just live data?

> To get a baseline I would run nodetool compact to do major compaction and
> purge any tombstones as others have said.
We will do that, but I doubt we have 450 GB of tombstones on node 2...

>

Re: Unbalanced cluster with RandomPartitioner

Posted by aaron morton <aa...@thelastpickle.com>.
If you have performed any token moves the data will not be deleted until you run nodetool cleanup. 

To get a baseline I would run nodetool compact to do major compaction and purge any tombstones as others have said. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: Unbalanced cluster with RandomPartitioner

Posted by Maki Watanabe <wa...@gmail.com>.
Are there any significant differences in the number of sstables on each node?




-- 
w3m

Re: Unbalanced cluster with RandomPartitioner

Posted by Jeremiah Jordan <je...@morningstar.com>.
Are you deleting data or using TTLs?  Expired/deleted data won't go 
away until the sstable holding it is compacted.  So if compaction has 
happened on some nodes, but not on others, you will see this.  The 
disparity is pretty big, 400GB to 20GB, so this probably isn't the issue, 
but with our data using TTLs, if I run major compactions a couple times 
on that column family it can shrink ~30%-40%.

-Jeremiah

On 01/17/2012 12:51 PM, Marcel Steinbach wrote:
> We are running regular repairs, so I don't think that's the problem.
> And the data dir sizes roughly match the load reported by nodetool.
> Thanks for the advice, though.
>
> Our keys are digits only, and they all contain a few zeros at the same
> offsets. I'm not that familiar with the MD5 algorithm, but I doubt it
> would generate 'hotspots' for those kinds of keys, right?
>
> On 17.01.2012, at 17:34, Mohit Anchlia wrote:
>
>> Have you tried running repair first on each node? Also, verify using
>> df -h on the data dirs

Re: Unbalanced cluster with RandomPartitioner

Posted by Marcel Steinbach <ma...@chors.de>.
We are running regular repairs, so I don't think that's the problem.
And the data dir sizes roughly match the load reported by nodetool.
Thanks for the advice, though.

Our keys are digits only, and they all contain a few zeros at the same offsets. I'm not that familiar with the MD5 algorithm, but I doubt it would generate 'hotspots' for those kinds of keys, right?
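
Out of curiosity, here is a rough sketch (Python) that approximates how the
RandomPartitioner turns a key into a token (MD5 of the key, taken as the
absolute value of a signed 128-bit integer, i.e. 0..2**127) and buckets a
bunch of made-up digit-only keys into eight equal slices of the ring:

    #!/usr/bin/env python
    # Approximation of RandomPartitioner's key -> token mapping, used to check
    # whether digit-only keys with zeros at fixed offsets cluster anywhere.
    # The key format below is made up, just similar in shape to ours.
    import hashlib
    from collections import Counter

    RING = 2 ** 127
    NUM_KEYS = 100000

    def token(key):
        digest = hashlib.md5(key.encode()).digest()
        return abs(int.from_bytes(digest, "big", signed=True))

    counts = Counter()
    for i in range(NUM_KEYS):
        key = "00%010d00" % i                        # e.g. "00000000004200"
        counts[min(7, token(key) * 8 // RING)] += 1  # which of 8 equal slices

    for s in sorted(counts):
        print("slice %d: %6d keys (%.1f%%)"
              % (s, counts[s], 100.0 * counts[s] / NUM_KEYS))

Each slice comes out at roughly 12.5%, so the key pattern by itself shouldn't
produce hotspots.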

On 17.01.2012, at 17:34, Mohit Anchlia wrote:

> Have you tried running repair first on each node? Also, verify using
> df -h on the data dirs


Re: Unbalanced cluster with RandomPartitioner

Posted by Mohit Anchlia <mo...@gmail.com>.
Have you tried running repair first on each node? Also, verify using
df -h on the data dirs
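
In case it helps, a small sketch (Python) that does both in one pass; the
host names and data directory are placeholders, adjust them to your setup:

    #!/usr/bin/env python
    # Sketch: repair each node, then check the data directory usage on it, so
    # the df output can be compared against the Load column from nodetool ring.
    import subprocess

    NODES = ["cass1", "cass2", "cass3"]      # placeholder host names
    DATA_DIR = "/var/lib/cassandra/data"     # adjust to your data dir

    for host in NODES:
        subprocess.check_call(["nodetool", "-h", host, "repair"])
        subprocess.check_call(["ssh", host, "df", "-h", DATA_DIR])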
