Posted to user@cassandra.apache.org by Karl Hiramoto <ka...@hiramoto.org> on 2011/01/16 17:45:02 UTC

balancing load

Hi,

I have a keyspace with  Replication Factor: 2,
and it seems that most of my data goes to one node.


What am I missing to have Cassandra balance more evenly?

./nodetool  -h host1 ring
Address         Status State   Load            Owns   
Token                                       
                                                      
82740373310283352874863875878673027619      
10.1.4.14     Up     Normal  17.45 GB        77.48% 
44427918469925720421829352515848570517      
10.1.4.12     Up     Normal  8.1 GB          8.12%  
58247356085106932369828800153350419939      
10.1.4.13     Up     Normal  49.51 KB        1.66%  
61078635599166706937511052402724559481      
10.1.4.15     Up     Normal  54.48 KB        6.37%  
71909504454725029906187464140698793550      
10.1.4.10     Up     Normal  44.38 KB        6.37%  
82740373310283352874863875878673027619


I use phpcassa as a client, and it should randomly choose a host to
connect to.

--
Karl

Re: balancing load

Posted by Peter Schuller <pe...@infidyne.com>.
> So for a fully balanced cluster, is it required to invoke nodetool move
> sequentially over all tokens?

For a new cluster, the recommended method is to pre-calculate the
tokens and bring nodes up with appropriate tokens.

For existing clusters, it depends. E.g., if you're doubling the number
of nodes you can just add them. If you're looking to add, say, 25% more
nodes, that requires moving nodes around. Be aware that moving a node
implies decommission + bootstrap, so the node will temporarily not be
contributing to cluster capacity during the move.

-- 
/ Peter Schuller

Re: balancing load

Posted by Karl Hiramoto <ka...@hiramoto.org>.
On 17/01/2011 19:27, Edward Capriolo wrote:
> cfstats is reporting you have an 8GB Row! I think you could be writing
> all your data to a few keys.
You're right, my noob mistake: I was writing everything to one key. The
problem was I had  Offer['id'][$UID] = value,
which made it easy before to do a "count Offer['id']" in the CLI.
I was doing that for months, but only recently going from one node to
five and adding a million times more data exposed the issue.

Now everything balances well.

Thanks all.
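To illustrate the fix, here is a toy sketch (a hypothetical 5-node ring with MD5-style tokens, not the actual phpcassa calls): writing every column under one shared row key sends all data to that key's replicas, while keying rows by UID spreads them around the ring.

```python
import hashlib
import random

def token(key):
    # RandomPartitioner-style token: MD5 of the row key, folded into 2**127 space
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)

def primary_node(key, ring):
    # First node whose token is >= the key's token, wrapping around the ring
    t = token(key)
    for node_token, node in ring:
        if t <= node_token:
            return node
    return ring[0][1]

# A balanced 5-node ring (tokens evenly spaced, sorted ascending)
ring = [((2 ** 127 // 5) * i, "node%d" % i) for i in range(5)]

# Old layout: every column written under the single row key 'id'
# -> all data lands on the replicas of that one key
print(primary_node("id", ring))

# New layout: one row per 12-character UID -> rows spread over the ring
rnd = random.Random(42)
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
uids = ["".join(rnd.choice(alphabet) for _ in range(12)) for _ in range(1000)]
nodes_hit = {primary_node(uid, ring) for uid in uids}
print(sorted(nodes_hit))
```

With RF=2 each row also lands on a second replica, but the point stands: one hot row key cannot be balanced by moving tokens.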


Re: balancing load

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jan 17, 2011 at 1:20 PM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> On 01/17/11 15:54, Edward Capriolo wrote:
>> Just to head off the next possible problem: if you run 'nodetool cleanup'
>> on each node and some of your nodes still have more data than others,
>> it probably means you are writing the majority of your data to a few
>> keys. (You probably do not want to do that.)
>>
>> If that happens, you can run nodetool cfstats on each node and check
>> that the 'Compacted row maximum size' is roughly the same on all nodes.
>> One or two really big rows could explain your imbalance.
>
>
> Well, I did a lengthy repair/cleanup on each node, but I still have data
> mainly on two nodes (I have RF=2):
>  ./apache-cassandra-0.7.0/bin/nodetool --host host3 ring
> Address         Status State   Load            Owns
> Token
>
> 119098828422328462212181112601118874007
> 10.1.4.10     Up     Normal  347.11 MB       30.00%
> 0
> 10.1.4.12     Up     Normal  49.41 KB        20.00%
> 34028236692093846346337460743176821145
> 10.1.4.13     Up     Normal  54.35 KB        20.00%
> 68056473384187692692674921486353642290
> 10.1.4.15     Up     Normal  19.09 GB        16.21%
> 95643579558861158157614918209686336369
> 10.1.4.14     Up     Normal  15.62 GB        13.79%
> 119098828422328462212181112601118874007
>
>
> in "cfstats" i see:
> Compacted row minimum size: 1131752
> Compacted row maximum size: 8582860529
> Compacted row mean size: 1402350749
>
> on the lowest used node i see:
> Compacted row minimum size: 0
> Compacted row maximum size: 0
> Compacted row mean size: 0
>
> I basically have  MyKeyspace.Offer[UID] = value.  My "value" is at most
> 500 bytes.
>
> For the UID I just use 12 random alphanumeric characters [A-Z0-9].
>
> Should i try and adjust my tokens to fix the imbalance or something else?
>
> I'm using Redhat EL  5.5
>
> java -version
> java version "1.6.0_17"
> OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64)
> OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
>
> I have some errors in my logs:
>
> [java.lang.AssertionError stack traces trimmed; quoted in full in the original message]
>
>
>
> Thanks,
>
> Karl
>
>


cfstats is reporting you have an 8GB Row! I think you could be writing
all your data to a few keys.

Re: balancing load

Posted by Karl Hiramoto <ka...@hiramoto.org>.
On 01/17/11 15:54, Edward Capriolo wrote:
> Just to head off the next possible problem: if you run 'nodetool cleanup'
> on each node and some of your nodes still have more data than others,
> it probably means you are writing the majority of your data to a few
> keys. (You probably do not want to do that.)
>
> If that happens, you can run nodetool cfstats on each node and check
> that the 'Compacted row maximum size' is roughly the same on all nodes.
> One or two really big rows could explain your imbalance.


Well, I did a lengthy repair/cleanup on each node, but I still have data
mainly on two nodes (I have RF=2):
 ./apache-cassandra-0.7.0/bin/nodetool --host host3 ring
Address         Status State   Load            Owns   
Token                                      
                                                      
119098828422328462212181112601118874007    
10.1.4.10     Up     Normal  347.11 MB       30.00% 
0                                          
10.1.4.12     Up     Normal  49.41 KB        20.00% 
34028236692093846346337460743176821145     
10.1.4.13     Up     Normal  54.35 KB        20.00% 
68056473384187692692674921486353642290     
10.1.4.15     Up     Normal  19.09 GB        16.21% 
95643579558861158157614918209686336369     
10.1.4.14     Up     Normal  15.62 GB        13.79% 
119098828422328462212181112601118874007


In "cfstats" I see:
Compacted row minimum size: 1131752
Compacted row maximum size: 8582860529
Compacted row mean size: 1402350749

On the lowest-used node I see:
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

I basically have  MyKeyspace.Offer[UID] = value.  My "value" is at most
500 bytes.

For the UID I just use 12 random alphanumeric characters [A-Z0-9].

Should I try to adjust my tokens to fix the imbalance, or is it something else?

I'm using Red Hat EL 5.5

java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

I have some errors in my logs:

ERROR [ReadStage:1747] 2011-01-17 18:13:53,988
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.AssertionError
        at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.readIndexedColumns(SSTableNamesIterator.java:178)
        at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:132)
        at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:70)
        at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1215)
        at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
ERROR [ReadStage:1747] 2011-01-17 18:13:53,989
AbstractCassandraDaemon.java (line 91) Fatal exception in thread
Thread[ReadStage:1747,5,main]
java.lang.AssertionError
        at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.readIndexedColumns(SSTableNamesIterator.java:178)
        at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:132)
        at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:70)
        at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1215)
        at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)



Thanks,

Karl


Re: balancing load

Posted by Peter Schuller <pe...@infidyne.com>.
> @Peter Isn't cleanup a special case of compaction? I.e., it works as a
> major compaction plus removing data not belonging to the node?

Yes, sorry. Brain lapse. Ignore me.

-- 
/ Peter Schuller

Re: balancing load

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jan 17, 2011 at 10:51 AM, Peter Schuller
<pe...@infidyne.com> wrote:
>> Just to head off the next possible problem: if you run 'nodetool cleanup'
>> on each node and some of your nodes still have more data than others,
>> it probably means you are writing the majority of your data to a few
>> keys. (You probably do not want to do that.)
>
> It may also be that a compaction is needed, if the discrepancies are
> within the variation expected during normal operation due to
> compaction (this assumes overwrites/deletions in the write traffic).
>
> --
> / Peter Schuller
>

@Peter Isn't cleanup a special case of compaction? I.e., it works as a
major compaction plus removing data not belonging to the node?

Re: balancing load

Posted by Peter Schuller <pe...@infidyne.com>.
> Just to head off the next possible problem: if you run 'nodetool cleanup'
> on each node and some of your nodes still have more data than others,
> it probably means you are writing the majority of your data to a few
> keys. (You probably do not want to do that.)

It may also be that a compaction is needed, if the discrepancies are
within the variation expected during normal operation due to
compaction (this assumes overwrites/deletions in the write traffic).

-- 
/ Peter Schuller

Re: balancing load

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jan 17, 2011 at 2:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
> The nodes will not automatically delete stale data, to do that you need to run nodetool cleanup.
>
> See step 3 in the Range Changes > Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes
>
> If you are feeling paranoid before hand, you could run nodetool repair on each node in turn to make sure they have the correct data. http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
>
> You may also have some tombstones in there, they will not be deleted until after GCGraceSeconds
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> Hope that helps.
> Aaron
>
> On 17 Jan 2011, at 20:34, Karl Hiramoto wrote:
>
>> Thanks for the help.  I used "nodetool move", so now each node owns 20%
>> of the space, but it seems that the data load is still mostly on 2 nodes.
>>
>>
>> nodetool  --host slave4 ring
>> Address         Status State   Load            Owns
>> Token
>>
>>      136112946768375385385349842972707284580
>> 10.1.4.10     Up     Normal  335.9 MB        20.00%
>> 0
>> 10.1.4.12     Up     Normal  54.42 KB        20.00%
>> 34028236692093846346337460743176821145
>> 10.1.4.13     Up     Normal  59.32 KB        20.00%
>> 68056473384187692692674921486353642290
>> 10.1.4.14     Up     Normal  6.33 GB         20.00%
>> 102084710076281539039012382229530463435
>> 10.1.4.15     Up     Normal  6.36 GB         20.00%
>> 136112946768375385385349842972707284580
>>
>>
>>
>>
>> --
>> Karl
>
>

Just to head off the next possible problem: if you run 'nodetool cleanup'
on each node and some of your nodes still have more data than others,
it probably means you are writing the majority of your data to a few
keys. (You probably do not want to do that.)

If that happens, you can run nodetool cfstats on each node and check
that the 'Compacted row maximum size' is roughly the same on all nodes.
One or two really big rows could explain your imbalance.
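A small sketch (a hypothetical helper, not part of nodetool) that pulls the 'Compacted row maximum size' figure out of nodetool cfstats output so it can be compared across nodes:

```python
import re

def max_compacted_row_size(cfstats_output):
    """Extract 'Compacted row maximum size' (in bytes) from nodetool cfstats text."""
    m = re.search(r"Compacted row maximum size:\s*(\d+)", cfstats_output)
    return int(m.group(1)) if m else None

# Sample cfstats output, as reported earlier in this thread
sample = """
Compacted row minimum size: 1131752
Compacted row maximum size: 8582860529
Compacted row mean size: 1402350749
"""
print(max_compacted_row_size(sample))
```

Run it against each node's cfstats output; a node reporting a value orders of magnitude larger than its peers points at one or two giant rows.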

Re: balancing load

Posted by aaron morton <aa...@thelastpickle.com>.
The nodes will not automatically delete stale data, to do that you need to run nodetool cleanup. 

See step 3 in the Range Changes > Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes

If you are feeling paranoid before hand, you could run nodetool repair on each node in turn to make sure they have the correct data. http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data

You may also have some tombstones in there, they will not be deleted until after GCGraceSeconds 
http://wiki.apache.org/cassandra/DistributedDeletes

Hope that helps. 
Aaron

On 17 Jan 2011, at 20:34, Karl Hiramoto wrote:

> Thanks for the help.  I used "nodetool move", so now each node owns 20%
> of the space, but it seems that the data load is still mostly on 2 nodes.
> 
> 
> nodetool  --host slave4 ring
> Address         Status State   Load            Owns   
> Token                                      
> 
>      136112946768375385385349842972707284580    
> 10.1.4.10     Up     Normal  335.9 MB        20.00% 
> 0                                          
> 10.1.4.12     Up     Normal  54.42 KB        20.00% 
> 34028236692093846346337460743176821145     
> 10.1.4.13     Up     Normal  59.32 KB        20.00% 
> 68056473384187692692674921486353642290     
> 10.1.4.14     Up     Normal  6.33 GB         20.00% 
> 102084710076281539039012382229530463435    
> 10.1.4.15     Up     Normal  6.36 GB         20.00% 
> 136112946768375385385349842972707284580  
> 
> 
> 
> 
> --
> Karl


RE: balancing load

Posted by "raoyixuan (Shandy)" <ra...@huawei.com>.
You can run nodetool cleanup to remove the stale data on the old nodes.

-----Original Message-----
From: Karl Hiramoto [mailto:karl@hiramoto.org] 
Sent: Monday, January 17, 2011 3:34 PM
To: user@cassandra.apache.org
Subject: Re: balancing load

Thanks for the help.  I used "nodetool move", so now each node owns 20%
of the space, but it seems that the data load is still mostly on 2 nodes.


nodetool  --host slave4 ring
Address         Status State   Load            Owns   
Token                                      
                                                                        
      136112946768375385385349842972707284580    
10.1.4.10     Up     Normal  335.9 MB        20.00% 
0                                          
10.1.4.12     Up     Normal  54.42 KB        20.00% 
34028236692093846346337460743176821145     
10.1.4.13     Up     Normal  59.32 KB        20.00% 
68056473384187692692674921486353642290     
10.1.4.14     Up     Normal  6.33 GB         20.00% 
102084710076281539039012382229530463435    
10.1.4.15     Up     Normal  6.36 GB         20.00% 
136112946768375385385349842972707284580  




--
Karl

Re: balancing load

Posted by Karl Hiramoto <ka...@hiramoto.org>.
Thanks for the help.  I used "nodetool move", so now each node owns 20%
of the space, but it seems that the data load is still mostly on 2 nodes.


nodetool  --host slave4 ring
Address         Status State   Load            Owns   
Token                                      
                                                                        
      136112946768375385385349842972707284580    
10.1.4.10     Up     Normal  335.9 MB        20.00% 
0                                          
10.1.4.12     Up     Normal  54.42 KB        20.00% 
34028236692093846346337460743176821145     
10.1.4.13     Up     Normal  59.32 KB        20.00% 
68056473384187692692674921486353642290     
10.1.4.14     Up     Normal  6.33 GB         20.00% 
102084710076281539039012382229530463435    
10.1.4.15     Up     Normal  6.36 GB         20.00% 
136112946768375385385349842972707284580  




--
Karl

Re: balancing load

Posted by ruslan usifov <ru...@gmail.com>.
2011/1/16 Edward Capriolo <ed...@gmail.com>

> On Sun, Jan 16, 2011 at 11:45 AM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> > Hi,
> >
> > I have a keyspace with  Replication Factor: 2
> > and it seems that most of my data goes to one node.
> >
> >
> > What am I missing to have Cassandra balance more evenly?
> >
> > ./nodetool  -h host1 ring
> > Address         Status State   Load            Owns
> > Token
> >
> > 82740373310283352874863875878673027619
> > 10.1.4.14     Up     Normal  17.45 GB        77.48%
> > 44427918469925720421829352515848570517
> > 10.1.4.12     Up     Normal  8.1 GB          8.12%
> > 58247356085106932369828800153350419939
> > 10.1.4.13     Up     Normal  49.51 KB        1.66%
> > 61078635599166706937511052402724559481
> > 10.1.4.15     Up     Normal  54.48 KB        6.37%
> > 71909504454725029906187464140698793550
> > 10.1.4.10     Up     Normal  44.38 KB        6.37%
> > 82740373310283352874863875878673027619
> >
> >
> > I use phpcassa as a client and it should randomly choose a host to
> > connect to.
> >
> > --
> > Karl
> >
>
> For a 5-node cluster your initial tokens should be:
>
> tokens=5 ant -DclassToRun=hpcas.c01.InitialTokens run
> run:
>     [java] 0
>     [java] 34028236692093846346337460743176821145
>     [java] 68056473384187692692674921486353642290
>     [java] 102084710076281539039012382229530463435
>     [java] 136112946768375385385349842972707284580
>
> To see how these numbers were calculated :
> http://wiki.apache.org/cassandra/Operations#Token_selection
>
> Use nodetool move and nodetool cleanup to correct the imbalance of your
> cluster.
>

So for a fully balanced cluster, is it required to invoke nodetool move
sequentially over all tokens?

Re: cass0.7: Creating colum family & Sorting

Posted by Victor Kabdebon <vi...@gmail.com>.
The comparator sorts only the columns inside a key.
Key sorting is done by your partitioner.


Best regards,
Victor Kabdebon
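A quick sketch of the point above (assuming an MD5-based token, as RandomPartitioner uses): row keys come back in token order, not in their natural order, so 'Afghanistan', 'Germany', and 'Zimbabwe' list in hash order rather than alphabetically.

```python
import hashlib

def token(key):
    # RandomPartitioner derives the token from the MD5 of the row key,
    # so row ordering follows the hash, not the key's natural ordering
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)

keys = ["Afghanistan", "Germany", "Zimbabwe"]
for k in sorted(keys, key=token):
    print(k, token(k))
```

If you need rows ordered by key, that is what OrderPreservingPartitioner is for, at the cost of the load-balancing properties discussed elsewhere in this thread.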

2011/1/16 kh jo <jo...@yahoo.com>

> I am having some problems with creating column families and sorting them,
>
> I want to create a countries column family where I can get a sorted list of
> countries(by country's name)
>
> the following command fails:
>
> create column family Countries with comparator=LongType
> and column_metadata=[
>     {column_name: cid, validation_class: LongType, index_type: KEYS},
>     {column_name: cname, validation_class: UTF8Type},
>     {column_name: code, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> IT SHOWS: 'id' could not be translated into a LongType.
>
>
> the following works:
>
> create column family Countries with comparator=UTF8Type
> and column_metadata=[
>     {column_name: cid, validation_class: LongType, index_type: KEYS},
>     {column_name: cname, validation_class: UTF8Type},
>     {column_name: code, validation_class: UTF8Type, index_type: KEYS}
> ];
>
>
> but when I insert some columns, they are not sorted as I want
>
> $countries = new ColumnFamily(Cassandra::con(), 'Countries');
> $countries->insert('Afghanistan', array('cid'=> '1', 'cname' =>
> 'Afghanistan', 'code' => 'AF'));
> $countries->insert('Germany', array('cid'=> '2', 'cname' => 'Germany',
> 'code' =>'DE'));
> $countries->insert('Zimbabwe', array('cid'=> '3', 'cname' => 'Zimbabwe',
> 'code' =>'ZM'));
>
> now:
> list Countries;
>
> shows:
> -------------------
> RowKey: Germany
> => (column=cid, value=2, timestamp=1295211346716047)
> => (column=cname, value=Germany, timestamp=1295211346716047)
> => (column=code, value=DE, timestamp=1295211346716047)
> -------------------
> RowKey: Zimbabwe
> => (column=cid, value=3, timestamp=1295211346713570)
> => (column=cname, value=Zimbabwe, timestamp=1295211346713570)
> => (column=code, value=ZM, timestamp=1295211346713570)
> -------------------
> RowKey: Afghanistan
> => (column=cid, value=1, timestamp=1295211346709448)
> => (column=cname, value=Afghanistan, timestamp=1295211346709448)
> => (column=code, value=AF, timestamp=1295211346709448)
>
>
> I don't see any sorting here?!
>
>

cass0.7: Creating colum family & Sorting

Posted by kh jo <jo...@yahoo.com>.
I am having some problems with creating column families and sorting them.

I want to create a countries column family where I can get a sorted list of countries (by country name).

The following command fails:

create column family Countries with comparator=LongType
and column_metadata=[
    {column_name: cid, validation_class: LongType, index_type: KEYS},
    {column_name: cname, validation_class: UTF8Type},
    {column_name: code, validation_class: UTF8Type, index_type: KEYS}
];

IT SHOWS: 'id' could not be translated into a LongType.


The following works:

create column family Countries with comparator=UTF8Type
and column_metadata=[
    {column_name: cid, validation_class: LongType, index_type: KEYS},
    {column_name: cname, validation_class: UTF8Type},
    {column_name: code, validation_class: UTF8Type, index_type: KEYS}
];


But when I insert some columns, they are not sorted as I want:

$countries = new ColumnFamily(Cassandra::con(), 'Countries');
$countries->insert('Afghanistan', array('cid'=> '1', 'cname' => 'Afghanistan', 'code' => 'AF'));
$countries->insert('Germany', array('cid'=> '2', 'cname' => 'Germany',  'code' =>'DE'));
$countries->insert('Zimbabwe', array('cid'=> '3', 'cname' => 'Zimbabwe',  'code' =>'ZM'));

now:
list Countries;

shows:
-------------------
RowKey: Germany
=> (column=cid, value=2, timestamp=1295211346716047)
=> (column=cname, value=Germany, timestamp=1295211346716047)
=> (column=code, value=DE, timestamp=1295211346716047)
-------------------
RowKey: Zimbabwe
=> (column=cid, value=3, timestamp=1295211346713570)
=> (column=cname, value=Zimbabwe, timestamp=1295211346713570)
=> (column=code, value=ZM, timestamp=1295211346713570)
-------------------
RowKey: Afghanistan
=> (column=cid, value=1, timestamp=1295211346709448)
=> (column=cname, value=Afghanistan, timestamp=1295211346709448)
=> (column=code, value=AF, timestamp=1295211346709448)


I don't see any sorting here?!



      

Re: balancing load

Posted by Edward Capriolo <ed...@gmail.com>.
On Sun, Jan 16, 2011 at 11:45 AM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> Hi,
>
> I have a keyspace with  Replication Factor: 2
> and it seems that most of my data goes to one node.
>
>
> What am I missing to have Cassandra balance more evenly?
>
> ./nodetool  -h host1 ring
> Address         Status State   Load            Owns
> Token
>
> 82740373310283352874863875878673027619
> 10.1.4.14     Up     Normal  17.45 GB        77.48%
> 44427918469925720421829352515848570517
> 10.1.4.12     Up     Normal  8.1 GB          8.12%
> 58247356085106932369828800153350419939
> 10.1.4.13     Up     Normal  49.51 KB        1.66%
> 61078635599166706937511052402724559481
> 10.1.4.15     Up     Normal  54.48 KB        6.37%
> 71909504454725029906187464140698793550
> 10.1.4.10     Up     Normal  44.38 KB        6.37%
> 82740373310283352874863875878673027619
>
>
> I use phpcassa as a client and it should randomly choose a host to
> connect to.
>
> --
> Karl
>

For a 5-node cluster your initial tokens should be:

tokens=5 ant -DclassToRun=hpcas.c01.InitialTokens run
run:
     [java] 0
     [java] 34028236692093846346337460743176821145
     [java] 68056473384187692692674921486353642290
     [java] 102084710076281539039012382229530463435
     [java] 136112946768375385385349842972707284580

To see how these numbers were calculated :
http://wiki.apache.org/cassandra/Operations#Token_selection

Use nodetool move and nodetool cleanup to correct the imbalance of your cluster.
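The same five tokens can be reproduced with a short Python sketch (assuming RandomPartitioner's 0..2**127 token space, per the wiki page above):

```python
def initial_tokens(nodes):
    # Space tokens evenly over RandomPartitioner's 0 .. 2**127 range
    step = 2 ** 127 // nodes
    return [step * i for i in range(nodes)]

for t in initial_tokens(5):
    print(t)
```

Assign each node one of these via initial_token (or nodetool move), then run nodetool cleanup so the moved nodes drop data they no longer own.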

Re: balancing load

Posted by Mark Zitnik <ma...@gmail.com>.
Hi,

If you are starting the cluster all at once, rather than adding nodes to an
existing cluster, try to calculate the tokens up front.

Here is a Python script to calculate the tokens:

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x

Also read the Operations section in the Cassandra wiki:
http://wiki.apache.org/cassandra/Operations

Thanks,
Mark

On Sun, Jan 16, 2011 at 6:45 PM, Karl Hiramoto <ka...@hiramoto.org> wrote:

> Hi,
>
> I have a keyspace with  Replication Factor: 2
> and it seems that most of my data goes to one node.
>
>
> What am I missing to have Cassandra balance more evenly?
>
> ./nodetool  -h host1 ring
> Address         Status State   Load            Owns
> Token
>
> 82740373310283352874863875878673027619
> 10.1.4.14     Up     Normal  17.45 GB        77.48%
> 44427918469925720421829352515848570517
> 10.1.4.12     Up     Normal  8.1 GB          8.12%
> 58247356085106932369828800153350419939
> 10.1.4.13     Up     Normal  49.51 KB        1.66%
> 61078635599166706937511052402724559481
> 10.1.4.15     Up     Normal  54.48 KB        6.37%
> 71909504454725029906187464140698793550
> 10.1.4.10     Up     Normal  44.38 KB        6.37%
> 82740373310283352874863875878673027619
>
>
> I use phpcassa as a client and it should randomly choose a host to
> connect to.
>
> --
> Karl
>