Posted to dev@cassandra.apache.org by Adam Holmberg <ad...@gmail.com> on 2012/05/31 23:46:28 UTC

PerRowSecondaryIndex Multiple Load

I've been studying and experimenting some with the SecondaryIndex API,
specifically extending the PerRowSecondaryIndex class.

I understand from this recent
thread <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201205.mbox/%3CCAMYB=b6c9HTDgOFHQS-UwS4UF2a6NiMs3+C++iG3M8z4xgzn=g@mail.gmail.com%3E> that
this feature is not yet widely used, but I was hoping someone could
shed some light on its intended concept of operation:

My intuition was to specify my custom index for every column in the column
family that I want to trigger an update for this index. This has the
desired effect for a 'built' index as new row mutations arrive. What I'm
confused about is what happens as the index is built for the first time.
What I'm seeing is that an asynchronous build is kicked off for every
column to which this index is attached (which is obviously undesirable).

It's plain to see why this is happening following the SecondaryIndexManager
reload/addIndexedColumn routines. Now what I'm wondering is if there is
room for improvement here:

Should the Manager wait to initiate an index build until the last column
has been added to a given rowLevelIndex? Or is the onus on the
PerRowSecondaryIndex implementation to 'fool' the manager into bypassing
the build until the last column is added?

My gut says the former would be preferred since the latter could be a
fragile use of the Interface, but I'm just getting into this area and maybe
I'm thinking about things wrong.
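
To make the first option concrete, here is a minimal sketch of a manager that
registers every column first and only then starts one build per distinct
row-level index. It is a toy model, not Cassandra code: ToyRowIndex,
ToyIndexManager, reloadBuildingPerColumn and reloadBuildingOnce are
hypothetical names made up for illustration, and build() merely counts calls
instead of doing any real work.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a row-level index; not the Cassandra class.
class ToyRowIndex {
    final String name;
    final Set<String> columns = new LinkedHashSet<>();
    int buildsStarted = 0;

    ToyRowIndex(String name) {
        this.name = name;
    }

    // Stand-in for an expensive asynchronous build; here it only counts.
    void build() {
        buildsStarted++;
    }
}

// Hypothetical stand-in for the manager that attaches columns to indexes.
class ToyIndexManager {
    private final Map<String, ToyRowIndex> rowLevelIndexes = new LinkedHashMap<>();

    // Attach one column to the named index, creating the index on first use.
    private ToyRowIndex register(String column, String indexName) {
        ToyRowIndex index = rowLevelIndexes.computeIfAbsent(indexName, ToyRowIndex::new);
        index.columns.add(column);
        return index;
    }

    // Models the behaviour described above: a build starts for every column.
    void reloadBuildingPerColumn(Map<String, String> columnToIndex) {
        for (Map.Entry<String, String> e : columnToIndex.entrySet()) {
            register(e.getKey(), e.getValue()).build();
        }
    }

    // Models the proposed fix: register all columns, then build each index once.
    void reloadBuildingOnce(Map<String, String> columnToIndex) {
        Set<ToyRowIndex> pending = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : columnToIndex.entrySet()) {
            pending.add(register(e.getKey(), e.getValue()));
        }
        for (ToyRowIndex index : pending) {
            index.build();
        }
    }

    ToyRowIndex get(String indexName) {
        return rowLevelIndexes.get(indexName);
    }
}

class DeferredBuildDemo {
    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("a", "idx");
        columns.put("b", "idx");
        columns.put("c", "idx");

        ToyIndexManager perColumn = new ToyIndexManager();
        perColumn.reloadBuildingPerColumn(columns);
        System.out.println("per-column builds: " + perColumn.get("idx").buildsStarted);

        ToyIndexManager once = new ToyIndexManager();
        once.reloadBuildingOnce(columns);
        System.out.println("deferred builds: " + once.get("idx").buildsStarted);
    }
}
```

With three columns attached to the same index, the per-column variant starts
three builds and the deferred variant starts one, while both end up with all
three columns registered.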

Any input would be appreciated.

Regards,
Adam Holmberg

Re: PerRowSecondaryIndex Multiple Load

Posted by Jonathan Ellis <jb...@gmail.com>.
Created https://issues.apache.org/jira/browse/CASSANDRA-4458

On Thu, Jul 19, 2012 at 7:56 PM, Jake Luciani <ja...@gmail.com> wrote:
> Yeah I worked around this by skipping all but the last call. I need to fix this properly. Thx for the reminder :)



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: PerRowSecondaryIndex Multiple Load

Posted by Jake Luciani <ja...@gmail.com>.
Yeah I worked around this by skipping all but the last call. I need to fix this properly. Thx for the reminder :)
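
For readers wondering what "skipping all but the last call" could look like,
here is one possible shape, sketched under assumptions. It is not the actual
workaround and does not use Cassandra's classes: SkipUntilLastCallIndex and
onBuildRequested are hypothetical names, and the sketch assumes the index
already knows how many columns it is attached to.

```java
// Hypothetical sketch; this is not Cassandra's PerRowSecondaryIndex.
class SkipUntilLastCallIndex {
    private final int attachedColumns; // assumption: known to the index up front
    private int callsThisReload = 0;
    private int buildsStarted = 0;

    SkipUntilLastCallIndex(int attachedColumns) {
        if (attachedColumns < 1) {
            throw new IllegalArgumentException("need at least one column");
        }
        this.attachedColumns = attachedColumns;
    }

    /** Returns true only for the call that actually starts a build. */
    synchronized boolean onBuildRequested() {
        callsThisReload++;
        if (callsThisReload < attachedColumns) {
            return false; // not the last column yet: skip this request
        }
        callsThisReload = 0; // reset so a later reload is handled the same way
        buildsStarted++;     // stand-in for the real, expensive build
        return true;
    }

    synchronized int getBuildsStarted() {
        return buildsStarted;
    }
}

class SkipUntilLastCallDemo {
    public static void main(String[] args) {
        SkipUntilLastCallIndex index = new SkipUntilLastCallIndex(3);
        for (int i = 0; i < 3; i++) {
            System.out.println("call " + (i + 1) + " builds: " + index.onBuildRequested());
        }
        System.out.println("builds started: " + index.getBuildsStarted());
    }
}
```

The counter only works if the index is told the right column count and every
column really does trigger a call, which is the kind of fragility raised in
the original question.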

 

On Jul 19, 2012, at 7:47 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> That smells like a bug to me.  I don't see how you can avoid getting
> multiple buildIndexAsync calls kicked off.
> 
> What do you think, Jake?

Re: PerRowSecondaryIndex Multiple Load

Posted by Jonathan Ellis <jb...@gmail.com>.
That smells like a bug to me.  I don't see how you can avoid getting
multiple buildIndexAsync calls kicked off.

What do you think, Jake?




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com