Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2012/07/20 01:47:53 UTC

Re: PerRowSecondaryIndex Multiple Load

That smells like a bug to me.  I don't see how you can avoid getting
multiple buildIndexAsync calls kicked off.

What do you think, Jake?

On Thu, May 31, 2012 at 4:46 PM, Adam Holmberg
<ad...@gmail.com> wrote:
> I've been studying and experimenting some with the SecondaryIndex API,
> specifically extending the PerRowSecondaryIndex class.
>
> I understand from this recent thread
> <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201205.mbox/%3CCAMYB=b6c9HTDgOFHQS-UwS4UF2a6NiMs3+C++iG3M8z4xgzn=g@mail.gmail.com%3E> that
> this feature is not yet widely used, but I was hoping someone could
> shed some light on its intended concept of operation:
>
> My intuition was to specify my custom index for every column in the column
> family that I want to trigger an update for this index. This has the
> desired effect for a 'built' index as new row mutations arrive. What I'm
> confused about is what happens as the index is built for the first time.
> What I'm seeing is that an asynchronous build is kicked off for every
> column to which this index is attached (which is obviously undesirable).
>
> It's plain to see why this is happening following the SecondaryIndexManager
> reload/addIndexedColumn routines. Now what I'm wondering is if there is
> room for improvement here:
>
> Should the Manager wait to initiate an index build until the last column
> has been added to a given rowLevelIndex? Or is the onus on the
> PerRowSecondaryIndex implementation to 'fool' the manager into bypassing
> the build until the last column is added?
>
> My gut says the former would be preferred since the latter could be a
> fragile use of the Interface, but I'm just getting into this area and maybe
> I'm thinking about things wrong.
>
> Any input would be appreciated.
>
> Regards,
> Adam Holmberg



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: PerRowSecondaryIndex Multiple Load

Posted by Jonathan Ellis <jb...@gmail.com>.
Created https://issues.apache.org/jira/browse/CASSANDRA-4458
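
For illustration, the manager-side option raised in the original question
(attach every column first, then build once per index) might look roughly
like the Java sketch below. It is not the fix tracked in the ticket and it
does not use Cassandra's real classes: ManagerDedupSketch, RowIndex, Manager,
reload and startBuild are hypothetical stand-ins, and the build is simulated
with a counter instead of running asynchronously.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; these are NOT Cassandra's real classes or signatures.
final class ManagerDedupSketch {

    /** Minimal stand-in for a row-level index shared by several columns. */
    static final class RowIndex {
        final Set<String> columns = new LinkedHashSet<String>();
        int buildsStarted = 0;

        void addColumn(String column) { columns.add(column); }

        // Stand-in for an asynchronous build; here it only counts invocations.
        void startBuild() { buildsStarted++; }
    }

    /** Manager that starts at most one build per index instance. */
    static final class Manager {
        // Identity-based set: columns sharing one index object map to one entry.
        private final Set<RowIndex> built =
                Collections.newSetFromMap(new IdentityHashMap<RowIndex, Boolean>());

        void reload(Map<String, RowIndex> columnToIndex) {
            Set<RowIndex> touched =
                    Collections.newSetFromMap(new IdentityHashMap<RowIndex, Boolean>());
            // Phase 1: attach every column before any build is considered.
            for (Map.Entry<String, RowIndex> e : columnToIndex.entrySet()) {
                e.getValue().addColumn(e.getKey());
                touched.add(e.getValue());
            }
            // Phase 2: one build per distinct index that has not been built yet.
            for (RowIndex index : touched) {
                if (built.add(index)) {
                    index.startBuild();
                }
            }
        }
    }

    public static void main(String[] args) {
        RowIndex index = new RowIndex();
        Map<String, RowIndex> columns = new LinkedHashMap<String, RowIndex>();
        columns.put("a", index);
        columns.put("b", index);
        columns.put("c", index);
        new Manager().reload(columns);
        System.out.println("columns=" + index.columns.size()
                + " builds=" + index.buildsStarted);
    }
}
```

With three columns attached to one index object, the sketch starts a single
build, and a second reload with the same mapping starts none.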

On Thu, Jul 19, 2012 at 7:56 PM, Jake Luciani <ja...@gmail.com> wrote:
> Yeah I worked around this by skipping all but the last call. I need to fix this properly. Thx for the reminder :)

Re: PerRowSecondaryIndex Multiple Load

Posted by Jake Luciani <ja...@gmail.com>.
Yeah I worked around this by skipping all but the last call. I need to fix this properly. Thx for the reminder :)
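
A workaround of this kind, done on the index side, could be sketched in Java
as follows. This is a guess at its general shape, not the actual code: the
names SkipUntilLastSketch, RowIndex, requestBuild and expectedColumns are
hypothetical, the build is simulated with a counter, and the sketch assumes
the index knows up front how many columns it is attached to.

```java
// Hypothetical stand-in; this is NOT Cassandra's PerRowSecondaryIndex API.
final class SkipUntilLastSketch {

    /** Row-level index that ignores every build request except the last one. */
    static final class RowIndex {
        private final int expectedColumns; // assumption: known in advance
        private int requests = 0;
        int buildsStarted = 0;

        RowIndex(int expectedColumns) {
            this.expectedColumns = expectedColumns;
        }

        // Called once per attached column, mirroring the repeated build calls.
        synchronized void requestBuild() {
            requests++;
            // Only the request for the final column triggers the build;
            // earlier requests are skipped.
            if (requests == expectedColumns) {
                buildsStarted++; // stand-in for starting the real build
            }
        }
    }

    public static void main(String[] args) {
        RowIndex index = new RowIndex(3);
        for (int i = 0; i < 3; i++) {
            index.requestBuild();
        }
        System.out.println("builds=" + index.buildsStarted);
    }
}
```

If expectedColumns does not match the number of calls actually made, the
build either never starts or starts before all columns are attached, which
is the fragility the original question was worried about.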