Posted to user@cassandra.apache.org by Renato Bacelar da Silveira <re...@indabamobile.co.za> on 2011/08/16 14:53:52 UTC
Cassandra adding 500K + Super Column Family
I am wondering about a certain volume situation.
I currently load a Keyspace with a certain number of SCFs.
Each SCF (Super Column Family) represents an entity.
Each Entity may have up to 6000 values.
I am planning to have 500,000 Entities (SCF) with
6000 Columns each (within Super Columns - number of Super Columns
unknown), and am wondering how many resources something
like this would require.
I am struggling even with 10,000 SCFs of 30 Columns each (within Super Columns):
I get very large files and reach the 4 GB heap space limit very quickly on
a single node. I use Garbage Collection where needed.
Is there some secret to loading 500,000 Super Column Families?
Regards.
--
Renato da Silveira
Senior Developer
Re: Cassandra adding 500K + Super Column Family
Posted by Yi Yang <yy...@me.com>.
Sounds like a case similar to mine. The files are definitely extremely big; a 10x space overhead would be a good case if you are just putting values into it.
I'm currently testing CASSANDRA-674 and hope the better SSTable format can solve the space overhead problem. Please follow my e-mails today; I'll keep working on it.
If your values are integers and floats, with column names of ~4 characters, then estimating from my case it will cost you 1~2TB of disk space.
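For scale, the 1~2TB figure can be turned into a per-column cost. This is only a rough sketch: it assumes decimal terabytes and that every entity really holds the full 6000 columns, and the function names are made up for illustration.

```python
# Back-of-envelope sizing for the workload described in this thread.
ENTITIES = 500_000          # from the original post
COLUMNS_PER_ENTITY = 6_000  # from the original post (upper bound per entity)

TB = 10**12  # assumption: decimal terabytes; the estimate did not say TB vs TiB


def total_columns(entities: int, columns_per_entity: int) -> int:
    """Total number of columns stored across all entities."""
    return entities * columns_per_entity


def implied_bytes_per_column(disk_bytes: int, columns: int) -> float:
    """Average on-disk bytes per column implied by a total disk estimate."""
    return disk_bytes / columns


columns = total_columns(ENTITIES, COLUMNS_PER_ENTITY)
low = implied_bytes_per_column(1 * TB, columns)
high = implied_bytes_per_column(2 * TB, columns)
print(f"{columns:,} columns; about {low:.0f} to {high:.0f} bytes per column on disk")
```

In other words, the estimate works out to a few hundred bytes on disk for each small value, which is worth checking against a test load before committing to hardware.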
Best,
Steve
Re: Cassandra adding 500K + Super Column Family
Posted by aaron morton <aa...@thelastpickle.com>.
Are you planning to create 500,000 Super Column Families, or 500,000 rows in a single Super Column Family?
The former is somewhat crazy. Cassandra schemas typically have up to a few tens of Column Families. Each column family involves a certain amount of memory overhead; this is now automatically managed in Cassandra 0.8 (see http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/).
If I understand correctly, you have 500K entities with 6K columns each. A simple first approach to modelling this would be to use a Standard CF with a row for each entity. However, the best model is the one that serves your read requests best.
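As a rough illustration of the "one row per entity" layout, here is a sketch using plain Python dicts as stand-ins for column families. Nothing here is a Cassandra client API; the entity data, the `flatten` helper and the `:` separator are assumptions made up for the example.

```python
from typing import Dict

# Hypothetical in-memory stand-ins for the two layouts; not a Cassandra client API.
SuperRow = Dict[str, Dict[str, float]]  # super column -> sub column -> value
StandardRow = Dict[str, float]          # column name -> value

SEP = ":"  # assumed separator; it must not appear in super column names


def flatten(super_row: SuperRow) -> StandardRow:
    """Turn one entity's super columns into a single wide standard row."""
    row: StandardRow = {}
    for super_name, sub_columns in super_row.items():
        for sub_name, value in sub_columns.items():
            row[f"{super_name}{SEP}{sub_name}"] = value
    return row


# One row per entity, keyed by entity id, all in a single Standard CF.
entity = {"readings": {"temp": 21.5, "hum": 0.4}, "limits": {"temp": 30.0}}
standard_cf = {"entity-0001": flatten(entity)}
print(standard_cf["entity-0001"])
```

With this shape, 500K entities become 500K rows in one column family rather than 500K column families, and the column name prefix keeps the grouping that the super columns provided.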
Also, for background, the sub columns in a super column are not indexed; see http://wiki.apache.org/cassandra/CassandraLimitations . You would probably run into this problem if you had 6000 sub columns in a super column.
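To make the cost concrete, here is a toy model in Python. It assumes only what is stated above, that a super column has no index over its sub columns, so fetching one sub column means reading all of them. The function names are hypothetical and the model ignores everything else about the read path.

```python
def sub_columns_deserialized(sub_columns_in_super_column: int) -> int:
    """Sub columns read in order to fetch ONE sub column by name.

    Toy assumption: no index inside a super column, so all of it is read.
    """
    return sub_columns_in_super_column


def cost_when_split(total_sub_columns: int, super_columns: int) -> int:
    """Same lookup when the values are spread evenly over several super columns."""
    per_super = -(-total_sub_columns // super_columns)  # ceiling division
    return sub_columns_deserialized(per_super)


print(cost_when_split(6000, 1))    # all 6000 values in one super column
print(cost_when_split(6000, 100))  # 60 values per super column
```

Under this model the work per lookup grows with the size of the super column, which is why many small super columns (or a standard row) behave better than one super column holding 6000 values.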
Hope that helps.
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com