Posted to user@cassandra.apache.org by Maxim Kramarenko <ma...@trackstudio.com> on 2010/05/27 22:35:21 UTC

Cassandra CF sharding

Hello!

We have a mail archive with one large CF for mail bodies. In our case, it's
easy to shard the data into 5-10 CFs by customer id. We would like to do this
because:

1) We get more manageable instances, because we have many small CFs
instead of one multi-TB CF on each node.

2) Better disk space usage (we only need to reserve 50% of the largest
shard for compaction)

3) We can manage node load not only by token, but also by defining which
shards are available on each node.

Are my assumptions correct? Are there any negative side effects?
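
For illustration, here is a minimal sketch of points 1) and 2): it maps a
customer id to one of a fixed number of CFs and compares the compaction
headroom for one large CF against several shards. The shard count, the
`MailBody_N` naming scheme, the 4 TB size, and the evenly sized shards are
all assumptions made for the example; only the 50% reserve comes from
point 2).

```python
import zlib

NUM_SHARDS = 8  # hypothetical; the post only says 5-10 CFs


def cf_for_customer(customer_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Map a customer id to a CF name (hypothetical naming scheme)."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    shard = zlib.crc32(customer_id.encode("utf-8")) % num_shards
    return f"MailBody_{shard}"


def compaction_headroom_tb(cf_sizes_tb, reserve_fraction=0.5):
    """Free space to reserve: a fraction of the largest CF on the node."""
    return reserve_fraction * max(cf_sizes_tb)


one_cf = [4.0]                             # one 4 TB CF per node (assumed)
sharded = [4.0 / NUM_SHARDS] * NUM_SHARDS  # same data split evenly (assumed)

print(cf_for_customer("customer-42"))
print(compaction_headroom_tb(one_cf))   # 2.0
print(compaction_headroom_tb(sharded))  # 0.25
```

In practice customers differ in size, so the largest shard (and with it the
reserve) would be bigger than this even split suggests.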

Re: Cassandra CF sharding

Posted by anand_s <me...@gmail.com>.
Hi Maxim,

Curious to know how your experimentation with CF sharding went. We have
similar limitations and are trying to tackle the exact same problem.

Does anybody else have suggestions, or experiments they have tried around
this?

Thanks
Anand

Re: Cassandra CF sharding

Posted by Maxim Kramarenko <ma...@trackstudio.com>.
Hello!

Thank you.

In 1) I hope that processing smaller files will be easier to
monitor. Also, if we have a disk failure, we can delete just one file and
repair, for example. Actually, a CF per customer would be best (easy to
delete or back up a specific customer's data only, and customers are totally
independent), but Cassandra likely doesn't support 15000 CFs per keyspace.

Regarding 3) - yes, I understand.

One related question: if we can choose, should we prefer
5 nodes, 16 cores/16 GB/8 TB disk space each
or
10 nodes, 8 cores/8 GB/4 TB disk space each?
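
As a quick check of the two options, the arithmetic can be sketched as
follows. The aggregate resources come out identical; what differs is each
node's share of the cluster. The `Option` type and its field names are made
up for this example, and the per-node share assumes an evenly balanced ring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    nodes: int
    cores: int    # per node
    ram_gb: int   # per node
    disk_tb: int  # per node

    def totals(self):
        """Cluster-wide (cores, RAM in GB, disk in TB)."""
        return (self.nodes * self.cores,
                self.nodes * self.ram_gb,
                self.nodes * self.disk_tb)

    def share_per_node(self):
        """Fraction of the cluster held by one node, assuming even balance."""
        return 1 / self.nodes


a = Option(nodes=5, cores=16, ram_gb=16, disk_tb=8)
b = Option(nodes=10, cores=8, ram_gb=8, disk_tb=4)

print(a.totals())  # (80, 80, 40)
print(b.totals())  # (80, 80, 40)
print(a.share_per_node(), b.share_per_node())  # 0.2 0.1
```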

When is it worth using multiple Cassandra instances per node? We now run 6
instances on 3 nodes, and it works much better than 3 instances on the
same 3 nodes. Is this the rule or an exception?




On 28.05.2010 07:11, Jonathan Ellis wrote:
> 2) is correct, but for 1) I'm not sure what manageability improvements
> you anticipate from dealing with multiple entities instead of one.
> I'm not sure what you're thinking of for 3) but routing is done by key
> only.

Re: Cassandra CF sharding

Posted by Jonathan Ellis <jb...@gmail.com>.
2) is correct, but for 1) I'm not sure what manageability improvements
you anticipate from dealing with multiple entities instead of one.
I'm not sure what you're thinking of for 3) but routing is done by key
only.
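
To illustrate "routing is done by key only", here is a rough sketch that
assumes an MD5-based partitioner in the style of RandomPartitioner. The ring
size, node tokens, and function names are invented for the example, and the
token derivation is simplified compared with what Cassandra actually does.
The point is only that the CF name never enters the calculation, so the same
key lands on the same node regardless of which CF it is written to.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(row_key: bytes) -> int:
    """Derive a ring token from the row key alone (simplified)."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big") % RING_SIZE


def node_for(row_key: bytes, node_tokens: list) -> int:
    """Index of the first node whose token is >= the key's token, wrapping."""
    idx = bisect_left(node_tokens, token_for(row_key))
    return idx % len(node_tokens)


def route(cf_name: str, row_key: bytes, node_tokens: list) -> int:
    # cf_name is deliberately unused: placement depends on the key only.
    return node_for(row_key, node_tokens)


# Three hypothetical nodes with evenly spaced tokens (sorted ascending).
node_tokens = [i * RING_SIZE // 3 for i in range(1, 4)]

same_node = (route("MailBody_0", b"customer-42:msg-1", node_tokens)
             == route("MailBody_7", b"customer-42:msg-1", node_tokens))
print(same_node)  # True
```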




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com