Posted to user@cassandra.apache.org by Schubert Zhang <zs...@gmail.com> on 2010/04/26 16:22:46 UTC

Re: newbie question on how columns names are indexed/lucene limitations?

The column index within a row is a sorted, blocked index (similar to a
B-tree), much like the one in Bigtable.
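
To make the idea concrete, here is a minimal Python sketch of a sorted, blocked
index over the columns of one row. It is a toy model under stated assumptions,
not Cassandra's actual implementation: the class name BlockedColumnIndex, the
block_size parameter, and the get/slice methods are all hypothetical, loosely
mirroring the "column names" and "slice range" lookups from the question below.

```python
import bisect


class BlockedColumnIndex:
    """Toy per-row column index: sorted columns split into fixed-size blocks.

    Hypothetical illustration only; not Cassandra's on-disk format.
    """

    def __init__(self, columns, block_size=4):
        # columns: iterable of (name, value) pairs, sorted here by name.
        cols = sorted(columns)
        self.blocks = [cols[i:i + block_size]
                       for i in range(0, len(cols), block_size)]
        # One index entry per block: the first column name in that block.
        self.first_names = [block[0][0] for block in self.blocks]

    def _block_for(self, name):
        # Rightmost block whose first name is <= name (block 0 as a floor).
        return max(bisect.bisect_right(self.first_names, name) - 1, 0)

    def get(self, name):
        """Look up one column by name: binary search, then scan one block."""
        if not self.blocks:
            return None
        for col_name, value in self.blocks[self._block_for(name)]:
            if col_name == name:
                return value
        return None

    def slice(self, start, finish, count=100):
        """Return up to `count` columns with start <= name <= finish."""
        out = []
        if not self.blocks:
            return out
        for block in self.blocks[self._block_for(start):]:
            for col_name, value in block:
                if col_name > finish:
                    return out
                if col_name >= start:
                    out.append((col_name, value))
                    if len(out) == count:
                        return out
        return out


# Usage: one row holding 20 columns named doc0000 .. doc0019.
idx = BlockedColumnIndex([(f"doc{i:04d}", i) for i in range(20)])
print(idx.get("doc0007"))
print(idx.slice("doc0003", "doc0006"))
```

In this sketch the index only ever covers the columns of a single row, so the
lookup cost depends on how many columns that row holds, not on how many
distinct column names exist across all rows.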

On Mon, Apr 26, 2010 at 2:43 AM, Stu Hood <st...@rackspace.com> wrote:

> The indexes within rows are _not_ implemented with Lucene: there is a
> custom index structure that allows for random access within a row. But, you
> should probably read http://wiki.apache.org/cassandra/CassandraLimitations
> to understand the current limitations of the file format, some of which are
> scheduled to be fixed soon.
>
> -----Original Message-----
> From: "TuX RaceR" <tu...@gmail.com>
> Sent: Sunday, April 25, 2010 11:54am
> To: user@cassandra.apache.org
> Subject: newbie question on how columns names are indexed/lucene
> limitations?
>
> Hello Cassandra Users,
>
> When using the RandomPartitioner and a simple ColumnFamily/Columns (i.e.
> no SuperColumns), my understanding is that one single Row can store
> millions of columns.
>
> If I look at http://wiki.apache.org/cassandra/API, I understand that
> I can get a subset of the millions of columns defined above using:
> SlicePredicate->ColumnNames or SlicePredicate->SliceRange
>
> My question is about the implementation of this column 'selection'.
> I vaguely remember reading somewhere (but I cannot find the link again)
> that this was implemented using a Lucene index over the column names for
> each row.
> Is that true? Is there a small Lucene index per row?
>
> Also, we know that Lucene has some limitations
> (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations): you
> cannot index more than 2.1 billion documents, as a document ID is mapped
> to a 32-bit int.
>
> As I plan to store in column names the IDs of my Cassandra documents (the
> global number of documents can go well beyond 2.1 billion), will I be
> hit by the Lucene limitations? I.e., can I store Cassandra document IDs
> (i.e. keys) in column names, if in each individual row there are no more
> than a few million of those IDs? I guess the answer is "yes I can",
> because Lucandra uses a similar schema, but it is not clear to me why.
> Is that because the Lucene index is built per row, so what really
> matters is the number of columns in one single row and not the number of
> distinct column names (globally over all the rows)?
>
>
> Thanks in advance
> TuX
>
>
>