Posted to dev@cassandra.apache.org by Claude Warren <cl...@instaclustr.com> on 2021/11/17 09:16:56 UTC

Implementing a secondary index

Greetings,

I am looking to implement a Multidimensional Bloom filter index [1] [2] on
a Cassandra table.  OK, I know that is a lot to take in.  What I need is
any documentation that explains the architecture of the index options, or
someone I can ask questions of -- a mentor if you will.

I have a proof of concept for the index that works from the client side
[3].  What I want to do is move some of that processing to the server
side.

Basically, I think I need to do the following:

   1. On each partition create an SST to store the index data.  This table
   comprises 2 integer data points and the primary key for the data table.
   2. When the indexed cell gets updated in the original table (there will
   only be one column), update one or more rows in the SST table.
   3. When querying, perform multiple queries against the index data, and
   return the primary key values (or the data associated with the primary keys
   -- I am unclear on this bit).
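
As a rough illustration of the three steps above, here is a minimal
in-memory sketch in Python.  It is not Cassandra code and not taken from
the proof of concept [3].  It assumes the two integers in each index row
are a byte position within the Bloom filter and the byte value stored at
that position; the class BloomIndex, the helper chunks, and the
CHUNK_BITS constant are hypothetical names used only for illustration.

```python
from collections import defaultdict

CHUNK_BITS = 8  # assumption: the filter is indexed one byte at a time


def chunks(bloom: int, n_chunks: int):
    """Yield (position, value) for each byte of the filter, lowest byte first."""
    mask = (1 << CHUNK_BITS) - 1
    for position in range(n_chunks):
        yield position, (bloom >> (position * CHUNK_BITS)) & mask


class BloomIndex:
    """Toy stand-in for the per-partition index table (hypothetical, in memory)."""

    def __init__(self, n_chunks: int):
        self.n_chunks = n_chunks
        self.rows = defaultdict(set)  # (position, value) -> primary keys
        self.filters = {}             # primary key -> current filter

    def upsert(self, key, bloom: int):
        """Step 2: the indexed cell changed, so rewrite that key's index rows."""
        old = self.filters.get(key)
        if old is not None:
            for row in chunks(old, self.n_chunks):
                self.rows[row].discard(key)
        for row in chunks(bloom, self.n_chunks):
            self.rows[row].add(key)
        self.filters[key] = bloom

    def search(self, query: int):
        """Step 3: one lookup per non-empty query byte, then intersect the keys."""
        result = None
        for position, wanted in chunks(query, self.n_chunks):
            if wanted == 0:
                continue  # an empty query byte matches every stored byte
            matched = set()
            # A stored byte matches when it contains every bit of the query byte.
            for value in range(1 << CHUNK_BITS):
                if value & wanted == wanted:
                    matched |= self.rows.get((position, value), set())
            result = matched if result is None else result & matched
        # An all-zero query constrains nothing, so every key is a candidate.
        return set(self.filters) if result is None else result
```

For example, with a two-byte filter, after upsert("k1", 0x0305) and
upsert("k2", 0x0104), search(0x0201) returns only "k1" while
search(0x0104) returns both keys.  The sketch returns primary keys only;
whether the server should return keys or the associated rows is the open
question in step 3.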

Any help or guidance would be appreciated,
Claude

[1] https://archive.org/details/arxiv-1501.01941/mode/2up
[2] https://archive.fosdem.org/2020/schedule/event/bloom_filters/
[3] https://github.com/Claude-at-Instaclustr/blooming_cassandra




-- 


*Claude Warren*

Principal Software Engineer

Instaclustr

Re: Implementing a secondary index

Posted by Caleb Rackliffe <ca...@gmail.com>.

Hi Claude,

In code space, the best place to start would be the secondary index API and
the manager that maintains the indexes on a per-table basis:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/Index.java
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/SecondaryIndexManager.java

If you have any questions about either, feel free to reach out, either here
or in ASF Slack.

P.S. If you're interested in where secondary indexing in Cassandra is
headed, follow CEP-7 (Storage Attached Index):
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index

On Wed, Nov 17, 2021 at 4:34 AM DuyHai Doan <do...@gmail.com> wrote:

> Hello Claude
>
> I wrote a blog post about secondary index architecture a long time ago, but
> most of the content should still be relevant and worth checking:
>
> https://www.doanduyhai.com/blog/?p=13191
>
> Regards
>
> Duy Hai DOAN
>

Re: Implementing a secondary index

Posted by DuyHai Doan <do...@gmail.com>.

Hello Claude

I wrote a blog post about secondary index architecture a long time ago, but
most of the content should still be relevant and worth checking:

https://www.doanduyhai.com/blog/?p=13191

Regards

Duy Hai DOAN
