
Posted to user@cassandra.apache.org by mcasandra <mo...@gmail.com> on 2011/02/12 08:24:01 UTC

Indexes and hard disk

Are indexes supported in Cassandra? If so, what kind? And if they are
supported, please point me to a place that gives more information about
them.

Is any particular kind of hard disk recommended for Cassandra? We
generally get only 500GB hard disks on our virtual machines, but I read that
it's better to have a 2TB HDD.
-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Indexes-and-hard-disk-tp6018234p6018234.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Indexes and hard disk

Posted by Filip Nguyen <ng...@gmail.com>.

On 12.2.2011 16:15, Bill de hÓra wrote:
> (under the hood, a 0.7 index is a 'private' CF).
Thanks for the answer, that sounds very clever and neat!

Re: Indexes and hard disk

Posted by Dave Revell <da...@meebo-inc.com>.

Indexes have another important advantage over multiple denormalized column
families. If you make the copies yourself, eventually the copies will
diverge from the base "true" column family due to routine occasional
failures. You'll probably want to find and fix these inconsistencies.

If you're using built-in indexes, you won't have this problem and you can
save some effort.
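
To make the divergence concrete, here is a minimal sketch in Python. It does not talk to Cassandra at all: `FlakyStore`, `write_user`, and `find_missing_index_entries` are hypothetical names, the "column families" are in-memory dictionaries, and the failure is simulated. It only illustrates why a hand-maintained copy needs a periodic find-and-fix pass.

```python
class FlakyStore:
    """In-memory stand-in for a column family (not a Cassandra client)."""

    def __init__(self, fail_on=()):
        self.rows = {}
        self.fail_on = set(fail_on)  # write numbers that should fail
        self.writes = 0

    def insert(self, key, value):
        self.writes += 1
        if self.writes in self.fail_on:
            raise IOError("simulated write failure")
        self.rows[key] = value


def write_user(base, by_birth_year, user_id, birth_year):
    """Write the base row, then the hand-maintained index row."""
    base.insert(user_id, birth_year)
    try:
        by_birth_year.insert((birth_year, user_id), "")
    except IOError:
        # A real application would log or retry; swallowing the error
        # here is what lets the two copies drift apart.
        pass


def find_missing_index_entries(base, by_birth_year):
    """Return index entries that the base data implies but the index lacks."""
    return sorted(
        (year, uid)
        for uid, year in base.rows.items()
        if (year, uid) not in by_birth_year.rows
    )


base = FlakyStore()
index = FlakyStore(fail_on={2})  # the second index write fails
for uid, year in [("alice", 1980), ("bob", 1975), ("carol", 1980)]:
    write_user(base, index, uid, year)

missing = find_missing_index_entries(base, index)
print(missing)  # [(1975, 'bob')]
```

The repair scan is the extra effort referred to above: with a hand-rolled index, something like `find_missing_index_entries` has to be written and run by the application.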

-Dave
On Feb 12, 2011 7:16 AM, "Bill de hÓra" <bi...@dehora.net> wrote:
> On Sat, 2011-02-12 at 14:03 +0100, Filip Nguyen wrote:
>
>
>> Why are secondary indexes even present in Cassandra? I thought the
>> point is that development in Cassandra is query driven, so that when you
>> want to search and fetch, for example, by birth date, you should create
>> a new ColumnFamily...
>
>
>
> Yes and no. Systems like Cassandra are designed such that you should
> write the data out as you want to read it in (because writes are cheap).
> However, most systems will want to access data via a few other criteria.
> For example, a blogging system that supports tags will need to list your
> blog entries by date and by tag equally efficiently. As you say, you
> can spin up a new ColumnFamily for that, but it's such a common need
> that Cassandra 0.7 supports it directly and saves developers having to
> manage indexes by hand (under the hood, a 0.7 index is a 'private' CF).
> This for me is one of the features that really sets Cassandra apart -
> scaling and indexing data at the same time is hard, and very few systems
> do both well.
>
> Bill

Re: Indexes and hard disk

Posted by Bill de hÓra <bi...@dehora.net>.

On Sat, 2011-02-12 at 14:03 +0100, Filip Nguyen wrote:

> Why are secondary indexes even present in Cassandra? I thought the
> point is that development in Cassandra is query driven, so that when you
> want to search and fetch, for example, by birth date, you should create
> a new ColumnFamily...

Yes and no. Systems like Cassandra are designed such that you should
write the data out as you want to read it in (because writes are cheap).
However, most systems will want to access data via a few other criteria.
For example, a blogging system that supports tags will need to list your
blog entries by date and by tag equally efficiently. As you say, you
can spin up a new ColumnFamily for that, but it's such a common need
that Cassandra 0.7 supports it directly and saves developers having to
manage indexes by hand (under the hood, a 0.7 index is a 'private' CF).
This for me is one of the features that really sets Cassandra apart -
scaling and indexing data at the same time is hard, and very few systems
do both well.

Bill
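
As a rough illustration of the "private CF" idea, here is a toy model in Python. It is an assumption-laden sketch, not Cassandra's implementation or API: `IndexedColumnFamily`, `insert`, and `get_by_index` are hypothetical names, and everything lives in memory. The point it shows is that the store, rather than the application, keeps a hidden value-to-row-keys mapping up to date on every write.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Toy column family with one indexed column (illustrative only)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                   # row key -> {column: value}
        self._index = defaultdict(set)   # hidden "private CF": value -> row keys

    def insert(self, key, columns):
        # Remember the previously indexed value so a stale entry can be removed.
        old = self.rows.get(key, {}).get(self.indexed_column)
        row = self.rows.setdefault(key, {})
        row.update(columns)
        new = row.get(self.indexed_column)
        if old != new:
            if old is not None:
                self._index[old].discard(key)
            if new is not None:
                self._index[new].add(key)

    def get_by_index(self, value):
        """Return the row keys whose indexed column equals `value`."""
        return sorted(self._index.get(value, ()))


posts = IndexedColumnFamily(indexed_column="tag")
posts.insert("post1", {"tag": "cassandra", "date": "2011-02-12"})
posts.insert("post2", {"tag": "hardware", "date": "2011-02-12"})
posts.insert("post3", {"tag": "cassandra", "date": "2011-02-13"})
posts.insert("post2", {"tag": "cassandra"})  # retagging moves the index entry

print(posts.get_by_index("cassandra"))  # ['post1', 'post2', 'post3']
print(posts.get_by_index("hardware"))   # []
```

The caller only ever writes to `posts`; the bookkeeping that a hand-made second ColumnFamily would require happens inside `insert`.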

Re: Indexes and hard disk

Posted by Filip Nguyen <ng...@gmail.com>.

On 12.2.2011 13:56, Bill de hÓra wrote:
> On Fri, 2011-02-11 at 23:24 -0800, mcasandra wrote:
>> Are indexes supported in Cassandra? If so, what kind? And if they are
>> supported, please point me to a place that gives more information about
>> them.
> Secondary Indexes:
>
> <http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes>
> <http://pycassa.github.com/pycassa/tutorial.html#indexes>
>
Why are secondary indexes even present in Cassandra? I thought the point is
that development in Cassandra is query driven, so that when you want to search
and fetch, for example, by birth date, you should create a new ColumnFamily...

Re: Indexes and hard disk

Posted by Bill de hÓra <bi...@dehora.net>.

On Fri, 2011-02-11 at 23:24 -0800, mcasandra wrote:

> Are indexes supported in Cassandra? If so, what kind? And if they are
> supported, please point me to a place that gives more information about
> them.

Secondary Indexes:

<http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes>
<http://pycassa.github.com/pycassa/tutorial.html#indexes>

> Is any particular kind of hard disk recommended for Cassandra? We
> generally get only 500GB hard disks on our virtual machines, but I read that
> it's better to have a 2TB HDD.

Hardware:

<http://wiki.apache.org/cassandra/CassandraHardware>
<http://www.mail-archive.com/user@cassandra.apache.org/msg06838.html>
<http://www.bitplumber.net/2010/10/a-cassandra-hardware-stack-dell-c1100s-ocz-vertex-2-ssds-with-sandforce-arista-7048s/>
<http://blog.kosmix.com/2011/01/21/cassandra-on-ssd/>

There's no replacement for understanding your application's workload and
the properties of the underlying hardware/VM and JVM you're being
supplied with (assuming you have a say in hardware selection). Cassandra
can run on pretty much anything, but the anything you get might not be
optimal for your app, which means you will have to tune the app and
Cassandra, or get different kit.

Bill

Re: Indexes and hard disk

Posted by Javier Canillas <ja...@gmail.com>.

Well, we had some problems on version 0.6.3 when a compaction kicked in
while the system was under heavy load. This made some nodes respond
slowly and caused some application instability.

What we did, playing with the compaction strategy, was to move the major
compaction to a fixed hour of the day when we know there won't be heavy
traffic. But all this was possible only because we had enough disk space to
keep storing new SSTables in the meantime without problems.

I hope this information throws some light on my recommendation.

On Sat, Feb 12, 2011 at 3:47 PM, mcasandra <mo...@gmail.com> wrote:

>
>
> Javier Canillas wrote:
> >
> >
> > On the other hand, if your application will use Cassandra heavily, I
> > recommend calculating the data volume it will handle and
> > using a partition that is actually 3 to 5 times bigger. Doing so, you
> > will be able to play around with the compaction strategy to avoid
> > latency when your application is under heavy load.
> >
> >
>
> Can you please explain with an example? I just started reading about
> Cassandra; all I know is that Cassandra partitions based on the hash space
> and divides it equally among all the nodes. The tokens (keys) are then
> distributed equally across N replica nodes.

Re: Indexes and hard disk

Posted by mcasandra <mo...@gmail.com>.


Javier Canillas wrote:
> 
> 
> On the other hand, if your application will use Cassandra heavily, I
> recommend calculating the data volume it will handle and
> using a partition that is actually 3 to 5 times bigger. Doing so, you
> will be able to play around with the compaction strategy to avoid
> latency when your application is under heavy load.
> 
> 

Can you please explain with an example? I just started reading about
Cassandra; all I know is that Cassandra partitions based on the hash space
and divides it equally among all the nodes. The tokens (keys) are then
distributed equally across N replica nodes.
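
For reference, that understanding can be sketched in Python roughly as follows. This is a simplified model under stated assumptions, not Cassandra's code: it assumes an MD5-based partitioner with a token space of 2**127, evenly spaced node tokens, and replicas placed on the next nodes around the ring. The names `initial_tokens`, `key_token`, and `replicas_for` are made up for the illustration.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # assumed token space of an MD5-based partitioner


def initial_tokens(node_count):
    """Evenly spaced tokens, one per node, dividing the ring equally."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def key_token(key):
    """Hash a row key onto the ring (simplified: MD5 reduced modulo the ring)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key, tokens, replication_factor):
    """Node indexes holding the key: the first node whose token is >= the
    key's token (wrapping around the ring), then the next nodes in order."""
    start = bisect_left(tokens, key_token(key)) % len(tokens)
    return [(start + i) % len(tokens) for i in range(replication_factor)]


tokens = initial_tokens(4)
for key in ["alice", "bob", "carol"]:
    print(key, replicas_for(key, tokens, replication_factor=3))
```

In this model each key is stored on `replication_factor` nodes out of the whole cluster, so with 4 nodes and a replication factor of 3 every key lives on 3 of the 4 nodes.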


Re: Indexes and hard disk

Posted by Javier Canillas <ja...@gmail.com>.

As for indexes, Cassandra supports indexes on columns (there is a
property, called some sort of "type", that can have the value "KEYS").
Start your search in that direction. This is a feature from 0.7; you
set it when you dynamically create column families and declare
their columns.

On the other hand, if your application will use Cassandra heavily, I
recommend calculating the data volume it will handle and
using a partition that is actually 3 to 5 times bigger. Doing so, you
will be able to play around with the compaction strategy to avoid
latency when your application is under heavy load.
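
The arithmetic behind that rule of thumb can be sketched in Python as below. The 3x to 5x factors come from the recommendation above and are a rule of thumb, not an official figure; the function names are hypothetical.

```python
def recommended_partition_gb(expected_data_gb, low_factor=3, high_factor=5):
    """Partition size range for a given expected data volume per node."""
    return expected_data_gb * low_factor, expected_data_gb * high_factor


def max_data_for_disk_gb(disk_gb, low_factor=3, high_factor=5):
    """Inverse view: how much data a disk of a given size leaves room for.
    Returns (conservative, optimistic) data volumes in GB."""
    return disk_gb / high_factor, disk_gb / low_factor


# Expecting about 100 GB of data per node:
print(recommended_partition_gb(100))  # (300, 500)

# Working backwards from 500GB and 2TB disks:
for disk in (500, 2000):
    conservative, optimistic = max_data_for_disk_gb(disk)
    print(disk, round(conservative), round(optimistic))
```

By this estimate a 500GB disk leaves comfortable room for roughly 100 to 167 GB of data per node, and a 2TB disk for roughly 400 to 667 GB; the rest is headroom for compaction and for new SSTables written while a compaction is postponed.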

I hope this information is useful to you.

Regards,

Javier Canillas

On 12/02/2011, at 04:24, mcasandra <mo...@gmail.com> wrote:

>
> Are indexes supported in Cassandra? If so, what kind? And if they are
> supported, please point me to a place that gives more information about
> them.
>
> Is any particular kind of hard disk recommended for Cassandra? We
> generally get only 500GB hard disks on our virtual machines, but I read that
> it's better to have a 2TB HDD.