Posted to user@cassandra.apache.org by Karl Hiramoto <ka...@hiramoto.org> on 2011/01/16 17:45:02 UTC
balancing load
Hi,
I have a keyspace with Replication Factor: 2,
and it seems that most of my data goes to one node.
What am I missing to have Cassandra balance more evenly?
./nodetool -h host1 ring
Address    Status State   Load      Owns    Token
                                            82740373310283352874863875878673027619
10.1.4.14  Up     Normal  17.45 GB  77.48%  44427918469925720421829352515848570517
10.1.4.12  Up     Normal  8.1 GB    8.12%   58247356085106932369828800153350419939
10.1.4.13  Up     Normal  49.51 KB  1.66%   61078635599166706937511052402724559481
10.1.4.15  Up     Normal  54.48 KB  6.37%   71909504454725029906187464140698793550
10.1.4.10  Up     Normal  44.38 KB  6.37%   82740373310283352874863875878673027619
I use phpcassa as a client, and it should randomly choose a host to
connect to.
--
Karl
Re: balancing load
Posted by Peter Schuller <pe...@infidyne.com>.
> So for full cluster balance is it required to invoke nodetool move
> sequentially over all tokens?
For a new cluster, the recommended method is to pre-calculate the
tokens and bring nodes up with appropriate tokens.
For existing clusters, it depends. E.g. if you're doubling the number
of nodes you can just add them. If you're looking to add e.g. 25% more
nodes, that requires moving nodes around. Be aware that moving a node
implies decommission + bootstrap, so the node will temporarily not be
contributing to cluster capacity during the move.
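Pre-calculating tokens for the RandomPartitioner amounts to evenly spacing points on the 0..2**127 ring; a minimal sketch (my own Python illustration, not from the thread):

```python
def initial_tokens(node_count):
    # Evenly spaced RandomPartitioner tokens on the 0..2**127 ring
    return [2 ** 127 // node_count * i for i in range(node_count)]

for t in initial_tokens(5):
    print(t)
```

For five nodes this reproduces the token list quoted elsewhere in the thread (0, 34028236692093846346337460743176821145, ...).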
--
/ Peter Schuller
Re: balancing load
Posted by Karl Hiramoto <ka...@hiramoto.org>.
On 17/01/2011 19:27, Edward Capriolo wrote:
> cfstats is reporting you have an 8GB Row! I think you could be writing
> all your data to a few keys.
You're right, my n00b fault: I was writing everything to one key. The
problem was I had Offer['id'][$UID] = value.
It made it easy before to do a "count Offer['id']" in the CLI.
I was doing that for months, but only recently going from one node to
five and adding a million times more data exposed the issue.
Now everything balances well.
Thanks all.
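The effect Karl describes, everything under one row key versus one row per UID, can be illustrated with a toy model of the RandomPartitioner (helper names are made up; the real partitioner likewise derives tokens from an MD5 of the row key):

```python
import hashlib

def token(key):
    # Toy RandomPartitioner token: MD5 of the row key on the 0..2**127 ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)

def node_for(key, ring):
    # ring is a sorted list of (token, node); a key belongs to the first
    # node whose token is >= the key's token, wrapping to the first node
    t = token(key)
    for node_token, node in ring:
        if t <= node_token:
            return node
    return ring[0][1]

ring = [(2 ** 127 // 5 * i, "node%d" % i) for i in range(5)]

# Anti-pattern: every value written under one row key -> one node owns it all
hot_nodes = {node_for("Offer:id", ring)}

# Fix: one row per UID -> rows hash all over the ring
spread_nodes = {node_for("Offer:" + str(i), ring) for i in range(1000)}

print(len(hot_nodes), len(spread_nodes))
```

One giant row always lands on a single replica set, no matter how many nodes join the ring; per-UID rows spread naturally.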
Re: balancing load
Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jan 17, 2011 at 1:20 PM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> On 01/17/11 15:54, Edward Capriolo wrote:
>> Just to head off the next possible problem: if you run 'nodetool cleanup'
>> on each node and some of your nodes still have more data than others,
>> then it probably means you are writing the majority of data to a few
>> keys. (You probably do not want to do that.)
>>
>> If that happens, you can use nodetool cfstats on each node and ensure
>> that the 'max row compacted size' is roughly the same on all nodes. If
>> you have one or two really big rows that could explain your imbalance.
>
>
> Well, I did a lengthy repair/cleanup on each node, but I still have data
> mainly on two nodes (I have RF=2).
> ./apache-cassandra-0.7.0/bin/nodetool --host host3 ring
> Address    Status State   Load       Owns    Token
>                                              119098828422328462212181112601118874007
> 10.1.4.10  Up     Normal  347.11 MB  30.00%  0
> 10.1.4.12  Up     Normal  49.41 KB   20.00%  34028236692093846346337460743176821145
> 10.1.4.13  Up     Normal  54.35 KB   20.00%  68056473384187692692674921486353642290
> 10.1.4.15  Up     Normal  19.09 GB   16.21%  95643579558861158157614918209686336369
> 10.1.4.14  Up     Normal  15.62 GB   13.79%  119098828422328462212181112601118874007
>
>
> in "cfstats" i see:
> Compacted row minimum size: 1131752
> Compacted row maximum size: 8582860529
> Compacted row mean size: 1402350749
>
> on the lowest used node i see:
> Compacted row minimum size: 0
> Compacted row maximum size: 0
> Compacted row mean size: 0
>
> I basically have MyKeyspace.Offer[UID] = value; my "value" is at most
> 500 bytes.
>
> For the UID I just use 12 random alphanumeric characters [A-Z][0-9].
>
> Should I try to adjust my tokens to fix the imbalance, or something else?
>
> I'm using Redhat EL 5.5
>
> java -version
> java version "1.6.0_17"
> OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64)
> OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
>
> I have some errors in my logs:
>
> [two identical java.lang.AssertionError stack traces snipped; they appear
> in full in Karl's original message below]
>
> Thanks,
>
> Karl
>
>
cfstats is reporting you have an 8GB Row! I think you could be writing
all your data to a few keys.
Re: balancing load
Posted by Karl Hiramoto <ka...@hiramoto.org>.
On 01/17/11 15:54, Edward Capriolo wrote:
> Just to head off the next possible problem: if you run 'nodetool cleanup'
> on each node and some of your nodes still have more data than others,
> then it probably means you are writing the majority of data to a few
> keys. (You probably do not want to do that.)
>
> If that happens, you can use nodetool cfstats on each node and ensure
> that the 'max row compacted size' is roughly the same on all nodes. If
> you have one or two really big rows that could explain your imbalance.
Well, I did a lengthy repair/cleanup on each node, but I still have data
mainly on two nodes (I have RF=2).
./apache-cassandra-0.7.0/bin/nodetool --host host3 ring
Address    Status State   Load       Owns    Token
                                             119098828422328462212181112601118874007
10.1.4.10  Up     Normal  347.11 MB  30.00%  0
10.1.4.12  Up     Normal  49.41 KB   20.00%  34028236692093846346337460743176821145
10.1.4.13  Up     Normal  54.35 KB   20.00%  68056473384187692692674921486353642290
10.1.4.15  Up     Normal  19.09 GB   16.21%  95643579558861158157614918209686336369
10.1.4.14  Up     Normal  15.62 GB   13.79%  119098828422328462212181112601118874007
in "cfstats" i see:
Compacted row minimum size: 1131752
Compacted row maximum size: 8582860529
Compacted row mean size: 1402350749
on the lowest used node i see:
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
I basically have MyKeyspace.Offer[UID] = value; my "value" is at most
500 bytes.
For the UID I just use 12 random alphanumeric characters [A-Z][0-9].
Should I try to adjust my tokens to fix the imbalance, or something else?
I'm using Redhat EL 5.5
java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
I have some errors in my logs:
ERROR [ReadStage:1747] 2011-01-17 18:13:53,988 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.AssertionError
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.readIndexedColumns(SSTableNamesIterator.java:178)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:132)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:70)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1215)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
ERROR [ReadStage:1747] 2011-01-17 18:13:53,989 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:1747,5,main]
java.lang.AssertionError
        (same stack trace as above)
Thanks,
Karl
Re: balancing load
Posted by Peter Schuller <pe...@infidyne.com>.
> @Peter Isn't cleanup a special case of compaction? I.e. it works as a
> major compaction + removes data not belonging to the node?
Yes, sorry. Brain lapse. Ignore me.
--
/ Peter Schuller
Re: balancing load
Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jan 17, 2011 at 10:51 AM, Peter Schuller
<pe...@infidyne.com> wrote:
>> Just to head off the next possible problem: if you run 'nodetool cleanup'
>> on each node and some of your nodes still have more data than others,
>> then it probably means you are writing the majority of data to a few
>> keys. (You probably do not want to do that.)
>
> It may also be that a compact is needed if the discrepancies are
> within the variation expected during normal operation due to
> compaction (this assumes overwrites/deletions in write traffic).
>
> --
> / Peter Schuller
>
@Peter Isn't cleanup a special case of compaction? I.e. it works as a
major compaction + removes data not belonging to the node?
Re: balancing load
Posted by Peter Schuller <pe...@infidyne.com>.
> Just to head off the next possible problem: if you run 'nodetool cleanup'
> on each node and some of your nodes still have more data than others,
> then it probably means you are writing the majority of data to a few
> keys. (You probably do not want to do that.)
It may also be that a compact is needed if the discrepancies are
within the variation expected during normal operation due to
compaction (this assumes overwrites/deletions in write traffic).
--
/ Peter Schuller
Re: balancing load
Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Jan 17, 2011 at 2:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
> The nodes will not automatically delete stale data, to do that you need to run nodetool cleanup.
>
> See step 3 in the Range Changes > Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes
>
> If you are feeling paranoid before hand, you could run nodetool repair on each node in turn to make sure they have the correct data. http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
>
> You may also have some tombstones in there, they will not be deleted until after GCGraceSeconds
> http://wiki.apache.org/cassandra/DistributedDeletes
>
> Hope that helps.
> Aaron
>
> On 17 Jan 2011, at 20:34, Karl Hiramoto wrote:
>
>> Thanks for the help. I used "nodetool move", so now each node owns 20%
>> of the space, but it seems that the data load is still mostly on 2 nodes.
>>
>>
>> nodetool --host slave4 ring
>> Address    Status State   Load      Owns    Token
>>                                             136112946768375385385349842972707284580
>> 10.1.4.10  Up     Normal  335.9 MB  20.00%  0
>> 10.1.4.12  Up     Normal  54.42 KB  20.00%  34028236692093846346337460743176821145
>> 10.1.4.13  Up     Normal  59.32 KB  20.00%  68056473384187692692674921486353642290
>> 10.1.4.14  Up     Normal  6.33 GB   20.00%  102084710076281539039012382229530463435
>> 10.1.4.15  Up     Normal  6.36 GB   20.00%  136112946768375385385349842972707284580
>>
>>
>>
>>
>> --
>> Karl
>
>
Just to head off the next possible problem: if you run 'nodetool cleanup'
on each node and some of your nodes still have more data than others,
then it probably means you are writing the majority of data to a few
keys. (You probably do not want to do that.)
If that happens, you can use nodetool cfstats on each node and ensure
that the 'max row compacted size' is roughly the same on all nodes. If
you have one or two really big rows that could explain your imbalance.
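Comparing that figure across nodes can be scripted by parsing `nodetool cfstats` output; a small sketch (the sample text reuses the numbers Karl posted):

```python
import re

# Sample fragment of `nodetool cfstats` output (Karl's numbers from above)
SAMPLE = """\
Compacted row minimum size: 1131752
Compacted row maximum size: 8582860529
Compacted row mean size: 1402350749
"""

def max_row_size(cfstats_text):
    # Extract the 'Compacted row maximum size' value (bytes) from cfstats output
    m = re.search(r"Compacted row maximum size:\s*(\d+)", cfstats_text)
    return int(m.group(1)) if m else None

print(max_row_size(SAMPLE))  # Karl's largest row, roughly 8 GB
```

Running this against each node's cfstats output and diffing the results makes a single oversized row stand out immediately.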
Re: balancing load
Posted by aaron morton <aa...@thelastpickle.com>.
The nodes will not automatically delete stale data, to do that you need to run nodetool cleanup.
See step 3 in the Range Changes > Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes
If you are feeling paranoid before hand, you could run nodetool repair on each node in turn to make sure they have the correct data. http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
You may also have some tombstones in there, they will not be deleted until after GCGraceSeconds
http://wiki.apache.org/cassandra/DistributedDeletes
Hope that helps.
Aaron
On 17 Jan 2011, at 20:34, Karl Hiramoto wrote:
> Thanks for the help. I used "nodetool move", so now each node owns 20%
> of the space, but it seems that the data load is still mostly on 2 nodes.
>
>
> nodetool --host slave4 ring
> Address    Status State   Load      Owns    Token
>                                             136112946768375385385349842972707284580
> 10.1.4.10  Up     Normal  335.9 MB  20.00%  0
> 10.1.4.12  Up     Normal  54.42 KB  20.00%  34028236692093846346337460743176821145
> 10.1.4.13  Up     Normal  59.32 KB  20.00%  68056473384187692692674921486353642290
> 10.1.4.14  Up     Normal  6.33 GB   20.00%  102084710076281539039012382229530463435
> 10.1.4.15  Up     Normal  6.36 GB   20.00%  136112946768375385385349842972707284580
>
>
>
>
> --
> Karl
RE: balancing load
Posted by "raoyixuan (Shandy)" <ra...@huawei.com>.
You can issue the nodetool cleanup to clean up the data in old nodes.
-----Original Message-----
From: Karl Hiramoto [mailto:karl@hiramoto.org]
Sent: Monday, January 17, 2011 3:34 PM
To: user@cassandra.apache.org
Subject: Re: balancing load
Thanks for the help. I used "nodetool move", so now each node owns 20%
of the space, but it seems that the data load is still mostly on 2 nodes.
nodetool --host slave4 ring
Address    Status State   Load      Owns    Token
                                            136112946768375385385349842972707284580
10.1.4.10  Up     Normal  335.9 MB  20.00%  0
10.1.4.12  Up     Normal  54.42 KB  20.00%  34028236692093846346337460743176821145
10.1.4.13  Up     Normal  59.32 KB  20.00%  68056473384187692692674921486353642290
10.1.4.14  Up     Normal  6.33 GB   20.00%  102084710076281539039012382229530463435
10.1.4.15  Up     Normal  6.36 GB   20.00%  136112946768375385385349842972707284580
--
Karl
Re: balancing load
Posted by Karl Hiramoto <ka...@hiramoto.org>.
Thanks for the help. I used "nodetool move", so now each node owns 20%
of the space, but it seems that the data load is still mostly on 2 nodes.
nodetool --host slave4 ring
Address    Status State   Load      Owns    Token
                                            136112946768375385385349842972707284580
10.1.4.10  Up     Normal  335.9 MB  20.00%  0
10.1.4.12  Up     Normal  54.42 KB  20.00%  34028236692093846346337460743176821145
10.1.4.13  Up     Normal  59.32 KB  20.00%  68056473384187692692674921486353642290
10.1.4.14  Up     Normal  6.33 GB   20.00%  102084710076281539039012382229530463435
10.1.4.15  Up     Normal  6.36 GB   20.00%  136112946768375385385349842972707284580
--
Karl
Re: balancing load
Posted by ruslan usifov <ru...@gmail.com>.
2011/1/16 Edward Capriolo <ed...@gmail.com>
> On Sun, Jan 16, 2011 at 11:45 AM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> > Hi,
> >
> > I have a keyspace with Replication Factor: 2,
> > and it seems that most of my data goes to one node.
> >
> >
> > What am I missing to have Cassandra balance more evenly?
> >
> > ./nodetool -h host1 ring
> > Address    Status State   Load      Owns    Token
> >                                             82740373310283352874863875878673027619
> > 10.1.4.14  Up     Normal  17.45 GB  77.48%  44427918469925720421829352515848570517
> > 10.1.4.12  Up     Normal  8.1 GB    8.12%   58247356085106932369828800153350419939
> > 10.1.4.13  Up     Normal  49.51 KB  1.66%   61078635599166706937511052402724559481
> > 10.1.4.15  Up     Normal  54.48 KB  6.37%   71909504454725029906187464140698793550
> > 10.1.4.10  Up     Normal  44.38 KB  6.37%   82740373310283352874863875878673027619
> >
> >
> > I use phpcassa as a client, and it should randomly choose a host to
> > connect to.
> >
> > --
> > Karl
> >
>
> For a 5 node cluster your initial Tokens should be:
>
> tokens=5 ant -DclassToRun=hpcas.c01.InitialTokens run
> run:
> [java] 0
> [java] 34028236692093846346337460743176821145
> [java] 68056473384187692692674921486353642290
> [java] 102084710076281539039012382229530463435
> [java] 136112946768375385385349842972707284580
>
> To see how these numbers were calculated :
> http://wiki.apache.org/cassandra/Operations#Token_selection
>
> Use nodetool move and nodetool cleanup to correct the imbalance of your
> cluster.
>
So for full cluster balance is it required to invoke nodetool move
sequentially over all tokens?
Re: cass0.7: Creating column family & Sorting
Posted by Victor Kabdebon <vi...@gmail.com>.
The comparator sorts only the columns inside a key.
Key sorting is done by your partitioner.
Best regards,
Victor Kabdebon
2011/1/16 kh jo <jo...@yahoo.com>
> I am having some problems with creating column families and sorting them.
>
> I want to create a Countries column family where I can get a sorted list of
> countries (by country name).
>
> the following command fails:
>
> create column family Countries with comparator=LongType
> and column_metadata=[
> {column_name: cid, validation_class: LongType, index_type: KEYS},
> {column_name: cname, validation_class: UTF8Type},
> {column_name: code, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> IT SHOWS: 'id' could not be translated into a LongType.
>
>
> the following works:
>
> create column family Countries with comparator=UTF8Type
> and column_metadata=[
> {column_name: cid, validation_class: LongType, index_type: KEYS},
> {column_name: cname, validation_class: UTF8Type},
> {column_name: code, validation_class: UTF8Type, index_type: KEYS}
> ];
>
>
> but when I insert some columns, they are not sorted as I want
>
> $countries = new ColumnFamily(Cassandra::con(), 'Countries');
> $countries->insert('Afghanistan', array('cid'=> '1', 'cname' =>
> 'Afghanistan', 'code' => 'AF'));
> $countries->insert('Germany', array('cid'=> '2', 'cname' => 'Germany',
> 'code' =>'DE'));
> $countries->insert('Zimbabwe', array('cid'=> '3', 'cname' => 'Zimbabwe',
> 'code' =>'ZM'));
>
> now:
> list Countries;
>
> shows:
> -------------------
> RowKey: Germany
> => (column=cid, value=2, timestamp=1295211346716047)
> => (column=cname, value=Germany, timestamp=1295211346716047)
> => (column=code, value=DE, timestamp=1295211346716047)
> -------------------
> RowKey: Zimbabwe
> => (column=cid, value=3, timestamp=1295211346713570)
> => (column=cname, value=Zimbabwe, timestamp=1295211346713570)
> => (column=code, value=ZM, timestamp=1295211346713570)
> -------------------
> RowKey: Afghanistan
> => (column=cid, value=1, timestamp=1295211346709448)
> => (column=cname, value=Afghanistan, timestamp=1295211346709448)
> => (column=code, value=AF, timestamp=1295211346709448)
>
>
> I don't see any sorting here?!
>
>
cass0.7: Creating column family & Sorting
Posted by kh jo <jo...@yahoo.com>.
I am having some problems with creating column families and sorting them.
I want to create a Countries column family where I can get a sorted list of countries (by country name).
the following command fails:
create column family Countries with comparator=LongType
and column_metadata=[
{column_name: cid, validation_class: LongType, index_type: KEYS},
{column_name: cname, validation_class: UTF8Type},
{column_name: code, validation_class: UTF8Type, index_type: KEYS}
];
IT SHOWS: 'id' could not be translated into a LongType.
the following works:
create column family Countries with comparator=UTF8Type
and column_metadata=[
{column_name: cid, validation_class: LongType, index_type: KEYS},
{column_name: cname, validation_class: UTF8Type},
{column_name: code, validation_class: UTF8Type, index_type: KEYS}
];
but when I insert some columns, they are not sorted as I want
$countries = new ColumnFamily(Cassandra::con(), 'Countries');
$countries->insert('Afghanistan', array('cid'=> '1', 'cname' => 'Afghanistan', 'code' => 'AF'));
$countries->insert('Germany', array('cid'=> '2', 'cname' => 'Germany', 'code' =>'DE'));
$countries->insert('Zimbabwe', array('cid'=> '3', 'cname' => 'Zimbabwe', 'code' =>'ZM'));
now:
list Countries;
shows:
-------------------
RowKey: Germany
=> (column=cid, value=2, timestamp=1295211346716047)
=> (column=cname, value=Germany, timestamp=1295211346716047)
=> (column=code, value=DE, timestamp=1295211346716047)
-------------------
RowKey: Zimbabwe
=> (column=cid, value=3, timestamp=1295211346713570)
=> (column=cname, value=Zimbabwe, timestamp=1295211346713570)
=> (column=code, value=ZM, timestamp=1295211346713570)
-------------------
RowKey: Afghanistan
=> (column=cid, value=1, timestamp=1295211346709448)
=> (column=cname, value=Afghanistan, timestamp=1295211346709448)
=> (column=code, value=AF, timestamp=1295211346709448)
I don't see any sorting here?!
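The behaviour jo reports is expected with the RandomPartitioner: row keys come back in token order, not name order, and the comparator only sorts the columns inside each row. One common workaround is to store all countries as columns of a single index row, where the UTF8Type comparator keeps them ordered by name. A plain-Python sketch of that column ordering (schema and names are illustrative, not from the thread):

```python
# A single index row whose column names are country names; under a
# UTF8Type comparator the columns come back sorted by name, which
# plain string ordering approximates here
countries_row = {
    "Germany": "DE",
    "Zimbabwe": "ZM",
    "Afghanistan": "AF",
}

# Reading a slice of the row returns columns in comparator order
for name in sorted(countries_row):
    print(name, countries_row[name])
```

With this layout a single get_slice over the index row yields the alphabetical country list, regardless of how the partitioner orders row keys.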
Re: balancing load
Posted by Edward Capriolo <ed...@gmail.com>.
On Sun, Jan 16, 2011 at 11:45 AM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> Hi,
>
> I have a keyspace with Replication Factor: 2,
> and it seems that most of my data goes to one node.
>
>
> What am I missing to have Cassandra balance more evenly?
>
> ./nodetool -h host1 ring
> Address    Status State   Load      Owns    Token
>                                             82740373310283352874863875878673027619
> 10.1.4.14  Up     Normal  17.45 GB  77.48%  44427918469925720421829352515848570517
> 10.1.4.12  Up     Normal  8.1 GB    8.12%   58247356085106932369828800153350419939
> 10.1.4.13  Up     Normal  49.51 KB  1.66%   61078635599166706937511052402724559481
> 10.1.4.15  Up     Normal  54.48 KB  6.37%   71909504454725029906187464140698793550
> 10.1.4.10  Up     Normal  44.38 KB  6.37%   82740373310283352874863875878673027619
>
>
> I use phpcassa as a client, and it should randomly choose a host to
> connect to.
>
> --
> Karl
>
For a 5 node cluster your initial Tokens should be:
tokens=5 ant -DclassToRun=hpcas.c01.InitialTokens run
run:
[java] 0
[java] 34028236692093846346337460743176821145
[java] 68056473384187692692674921486353642290
[java] 102084710076281539039012382229530463435
[java] 136112946768375385385349842972707284580
To see how these numbers were calculated :
http://wiki.apache.org/cassandra/Operations#Token_selection
Use nodetool move and nodetool cleanup to correct the imbalance of your cluster.
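The move step can be scripted against the same token math; a sketch (the host list and helper name are illustrative, and each move should complete, followed by `nodetool cleanup`, before the next one starts):

```python
def move_commands(hosts):
    # One evenly spaced RandomPartitioner token per host, in ring order
    n = len(hosts)
    return ["nodetool -h %s move %d" % (host, 2 ** 127 // n * i)
            for i, host in enumerate(hosts)]

for cmd in move_commands(["10.1.4.10", "10.1.4.12", "10.1.4.13",
                          "10.1.4.14", "10.1.4.15"]):
    print(cmd)
```

This only prints the commands; running them is left to the operator, since a move implies decommission + bootstrap on the moving node.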
Re: balancing load
Posted by Mark Zitnik <ma...@gmail.com>.
Hi,
If you are starting the cluster all at once, rather than adding nodes to an
existing cluster, try calculating the tokens up front.
Here is a Python script to calc the tokens:
def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x
Also read the operations section in the Cassandra wiki:
http://wiki.apache.org/cassandra/Operations
Thanks,
Mark
On Sun, Jan 16, 2011 at 6:45 PM, Karl Hiramoto <ka...@hiramoto.org> wrote:
> Hi,
>
> I have a keyspace with Replication Factor: 2,
> and it seems that most of my data goes to one node.
>
>
> What am I missing to have Cassandra balance more evenly?
>
> ./nodetool -h host1 ring
> Address    Status State   Load      Owns    Token
>                                             82740373310283352874863875878673027619
> 10.1.4.14  Up     Normal  17.45 GB  77.48%  44427918469925720421829352515848570517
> 10.1.4.12  Up     Normal  8.1 GB    8.12%   58247356085106932369828800153350419939
> 10.1.4.13  Up     Normal  49.51 KB  1.66%   61078635599166706937511052402724559481
> 10.1.4.15  Up     Normal  54.48 KB  6.37%   71909504454725029906187464140698793550
> 10.1.4.10  Up     Normal  44.38 KB  6.37%   82740373310283352874863875878673027619
>
>
> I use phpcassa as a client, and it should randomly choose a host to
> connect to.
>
> --
> Karl
>