Posted to user@cassandra.apache.org by Drew Kutcharian <dr...@venarc.com> on 2011/04/03 21:28:09 UTC

Secondary Indexes

Hi Everyone,

I posted the following email a couple of days ago and didn't get any responses. It makes me wonder: does anyone on this list know or use Secondary Indexes? They seem like a pretty big feature, and it's a bit disappointing not to be able to find any documentation on them.

The only thing I could find on the Wiki was at the end of http://wiki.apache.org/cassandra/StorageConfiguration, and it pointed to a nonexistent page, http://wiki.apache.org/cassandra/SecondaryIndexes . I also checked the JIRA issue CASSANDRA-749, but there's so much back and forth that I couldn't figure out what the conclusion was. What gives?

I think the Cassandra committers are doing a heck of a job adding all this cool functionality, but the documentation isn't keeping up. Jonathan Ellis's blog post on Secondary Indexes only scratches the surface of the topic, and considering that the whole point of using Cassandra is scalability, there isn't a single mention of how Secondary Indexes scale! (The same applies to Counters.)

I'm not trying to be a complainer, but as someone new to this community, I hope you'll take my comments as constructive criticism.

Thanks,

Drew


[ORIGINAL POST]

I just read Jonathan Ellis' great post on Secondary Indexes (http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes) and I was wondering where I can find a bit more info on them. I would like to know:

1) Are there any limitations besides the hash properties (no range queries)? Like size or memory, etc.?

2) Are they distributed? If so, how does that work? How are they stored on the nodes?

3) When you write a new row, when and how does the index get updated? What I would like to know is the atomicity of the operation: is the "index write" part of the "row write"?

4) Is there a difference between creating a secondary index and creating an "index" CF manually, such as "users_by_country"?


Re: Secondary Indexes

Posted by Drew Kutcharian <dr...@venarc.com>.
I just added a new page to the wiki: http://wiki.apache.org/cassandra/SecondaryIndexes




Re: Secondary Indexes

Posted by Drew Kutcharian <dr...@venarc.com>.
Yeah, I know. I just didn't know anyone could update it.


On Apr 3, 2011, at 1:26 PM, Joe Stump wrote:

> 
> On Apr 3, 2011, at 2:22 PM, Drew Kutcharian wrote:
> 
>> Thanks Tyler. Can you update the wiki with these answers so they are stored there for others to see too?
> 
> Dude, it's a wiki.


Re: Secondary Indexes

Posted by Joe Stump <jo...@joestump.net>.
On Apr 3, 2011, at 2:22 PM, Drew Kutcharian wrote:

> Thanks Tyler. Can you update the wiki with these answers so they are stored there for others to see too?

Dude, it's a wiki. 

Re: Secondary Indexes

Posted by Drew Kutcharian <dr...@venarc.com>.
Thanks Tyler. Can you update the wiki with these answers so they are stored there for others to see too?


Re: Secondary Indexes

Posted by Tyler Hobbs <ty...@datastax.com>.
I'm not familiar with some of the details, but I'll try to answer your
questions in general.  Secondary indexes are implemented as a slightly
special separate column family with the indexed value serving as the key;
most of the properties of secondary indexes follow from that.
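
To make that description concrete, here is a toy Python model, a sketch under stated assumptions rather than Cassandra's storage code: a node keeps a data column family plus a hidden index column family whose row key is the indexed column's value and whose entries are the matching data row keys. The class and method names (IndexedNode, insert, query_eq) are hypothetical.

```python
from collections import defaultdict


class IndexedNode:
    """Toy model of one node's data CF plus its hidden index CF.

    Hypothetical sketch: the index CF maps an indexed value (its row key)
    to the set of data row keys whose indexed column has that value.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}                 # data CF: row key -> {column: value}
        self.index = defaultdict(set)  # index CF: indexed value -> row keys

    def insert(self, row_key, columns):
        row = self.data.setdefault(row_key, {})
        old = row.get(self.indexed_column)
        row.update(columns)
        new = row.get(self.indexed_column)
        if old != new:
            if old is not None:
                # The row moved to a new value: drop its stale index entry.
                self.index[old].discard(row_key)
            if new is not None:
                self.index[new].add(row_key)

    def query_eq(self, value):
        # An equality lookup is a single read of the index row keyed by value.
        return sorted(self.index.get(value, ()))


if __name__ == "__main__":
    node = IndexedNode("country")
    node.insert("alice", {"country": "US"})
    node.insert("bob", {"country": "FR"})
    node.insert("carol", {"country": "US"})
    node.insert("alice", {"country": "FR"})  # alice moves; her "US" entry is removed
    print(node.query_eq("US"))  # ['carol']
    print(node.query_eq("FR"))  # ['alice', 'bob']
```

Keying the index by value is what makes an equality lookup cheap in this model: it is one read of one index row.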

On Sun, Apr 3, 2011 at 2:28 PM, Drew Kutcharian <dr...@venarc.com> wrote:

> Hi Everyone,
>
> I posted the following email a couple of days ago and didn't get any
> responses. It makes me wonder: does anyone on this list know or use
> Secondary Indexes? They seem like a pretty big feature, and it's a bit
> disappointing not to be able to find any documentation on them.
>
> The only thing I could find on the Wiki was at the end of
> http://wiki.apache.org/cassandra/StorageConfiguration, and it pointed to
> a nonexistent page, http://wiki.apache.org/cassandra/SecondaryIndexes .
> I also checked the JIRA issue CASSANDRA-749, but there's so much back and
> forth that I couldn't figure out what the conclusion was. What gives?
>
> I think the Cassandra committers are doing a heck of a job adding all this
> cool functionality, but the documentation isn't keeping up. Jonathan
> Ellis's blog post on Secondary Indexes only scratches the surface of the
> topic, and considering that the whole point of using Cassandra is
> scalability, there isn't a single mention of how Secondary Indexes scale!
> (The same applies to Counters.)
>
> I'm not trying to be a complainer, but as someone new to this community, I
> hope you'll take my comments as constructive criticism.
>
> Thanks,
>
> Drew
>
>
> [ORIGINAL POST]
>
> I just read Jonathan Ellis' great post on Secondary Indexes
> (http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes)
> and I was wondering where I can find a bit more info on them. I would
> like to know:
>
> 1) Are there any limitations besides the hash properties (no range
> queries)? Like size or memory, etc.?
>

No.


> 2) Are they distributed? If so, how does that work? How are they stored
> on the nodes?
>

Each node only indexes data that it holds locally.
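
One consequence of purely local indexing is that an indexed query cannot be answered by a single node: any node may hold matching rows, so the request has to reach enough nodes to cover the whole ring (every node, in this replication-free model) and the answers have to be merged. The sketch below models that scatter-gather pattern. The md5-modulo placement, the absence of replication, and the names owner and LocalIndexCluster are assumptions standing in for Cassandra's real token ring and replica selection.

```python
import hashlib
from collections import defaultdict


def owner(row_key, num_nodes):
    """Hypothetical placement: md5 of the row key modulo the node count.

    Stands in for Cassandra's token ring; replication is not modeled.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


class LocalIndexCluster:
    def __init__(self, num_nodes):
        # Per node: the rows it owns and a local index over only those rows.
        self.rows = [{} for _ in range(num_nodes)]
        self.indexes = [defaultdict(set) for _ in range(num_nodes)]

    def insert(self, row_key, value):
        n = owner(row_key, len(self.rows))
        old = self.rows[n].get(row_key)
        if old is not None:
            self.indexes[n][old].discard(row_key)
        self.rows[n][row_key] = value
        # The index entry is written on the same node that holds the row.
        self.indexes[n][value].add(row_key)

    def query_eq(self, value):
        """Scatter to every node, gather local matches; returns (keys, nodes asked)."""
        matches = []
        for local_index in self.indexes:
            matches.extend(local_index.get(value, ()))
        return sorted(matches), len(self.indexes)


if __name__ == "__main__":
    cluster = LocalIndexCluster(4)
    for i in range(10):
        cluster.insert(f"user{i}", "US" if i % 2 == 0 else "FR")
    keys, asked = cluster.query_eq("US")
    print(keys, asked)
```

In this model, adding nodes spreads the rows out, but every indexed query still touches every node.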


> 3) When you write a new row, when and how does the index get updated? What
> I would like to know is the atomicity of the operation: is the "index
> write" part of the "row write"?
>

The row and index updates are one atomic operation.
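
Here is a toy illustration of what "one atomic operation" buys you. The row change and both index changes (removing the stale entry, adding the new one) happen inside one critical section, so concurrent writers can never leave the index disagreeing with the data. The threading.Lock is an assumption of this sketch, not a description of how Cassandra achieves atomicity internally, and AtomicIndexedNode, is_consistent and hammer are hypothetical names.

```python
import threading
from collections import defaultdict


class AtomicIndexedNode:
    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> row keys
        self._lock = threading.Lock()  # one lock guards the row and its index entries

    def insert(self, row_key, columns):
        # Row write and index maintenance form a single critical section.
        with self._lock:
            row = self.data.setdefault(row_key, {})
            old = row.get(self.indexed_column)
            row.update(columns)
            new = row.get(self.indexed_column)
            if old != new:
                if old is not None:
                    self.index[old].discard(row_key)
                if new is not None:
                    self.index[new].add(row_key)

    def is_consistent(self):
        """True if the index holds exactly the entries implied by the data."""
        with self._lock:
            expected = defaultdict(set)
            for key, row in self.data.items():
                value = row.get(self.indexed_column)
                if value is not None:
                    expected[value].add(key)
            actual = {v: keys for v, keys in self.index.items() if keys}
            return actual == dict(expected)


def hammer(node, worker_id, rounds=1000):
    # Several workers keep moving the same 50 users between countries.
    countries = ["US", "FR", "DE"]
    for i in range(rounds):
        node.insert(f"user{i % 50}", {"country": countries[(i + worker_id) % 3]})


if __name__ == "__main__":
    node = AtomicIndexedNode("country")
    workers = [threading.Thread(target=hammer, args=(node, w)) for w in range(4)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(node.is_consistent())  # True
```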


> 4) Is there a difference between creating a secondary index and creating
> an "index" CF manually, such as "users_by_country"?
>

Yes.  First, with your own index CF, the index row for a value may live on
a different node than the data rows it points to.  Second, updates to the
index and the data are not atomic.
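
For contrast, here is a sketch of the do-it-yourself approach with a hypothetical "users_by_country" CF. The client issues two independent writes: the data row is placed by the user's key, and the index row is placed by the country value. So the index row can live on a different node from the rows it lists, and a client failure between the two writes leaves the index stale. The md5-modulo placement, the fail_before_index flag and the class name ManualIndexCluster are illustrative assumptions.

```python
import hashlib


def owner(key, num_nodes):
    """Hypothetical placement (md5 modulo node count) standing in for the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % num_nodes


class ManualIndexCluster:
    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.users = [{} for _ in range(num_nodes)]             # "users" CF, per node
        self.users_by_country = [{} for _ in range(num_nodes)]  # hand-built index CF

    def add_user(self, user_id, country, fail_before_index=False):
        # Write 1: the data row, placed by the user's row key.
        self.users[owner(user_id, self.num_nodes)][user_id] = {"country": country}
        if fail_before_index:
            # Simulated client crash between the two independent writes.
            raise ConnectionError("client died before writing the index row")
        # Write 2: the index row, placed by the country value, so it may
        # land on a different node than the user's row.
        # (Changing a user's country would also require deleting the old
        # index entry, which this sketch does not model.)
        node = owner(country, self.num_nodes)
        self.users_by_country[node].setdefault(country, set()).add(user_id)

    def users_in(self, country):
        # A single read from the node that owns this country's index row.
        node = owner(country, self.num_nodes)
        return sorted(self.users_by_country[node].get(country, set()))


if __name__ == "__main__":
    cluster = ManualIndexCluster(4)
    cluster.add_user("alice", "US")
    cluster.add_user("bob", "US")
    try:
        cluster.add_user("carol", "US", fail_before_index=True)
    except ConnectionError:
        pass
    # carol's row exists, but the index never heard about her.
    print(cluster.users_in("US"))  # ['alice', 'bob']
```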

Your feedback is certainly helpful and hopefully we can get some of these
details into the documentation!

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library