Posted to marketing@cassandra.apache.org by Patrick McFadin <pm...@gmail.com> on 2023/05/11 03:51:27 UTC

Re: [DISCUSS] The future of CREATE INDEX

There will be a LOT of content around using SAI in 5.0.

CCing marketing ML

On Wed, May 10, 2023 at 8:38 PM Jeff Jirsa <jj...@gmail.com> wrote:

> Changes like this always scare me, but the benefits probably outweigh the
> risks. This is probably obvious to whoever implements it, but if this
> happens, please make sure it is super visible in NEWS and simultaneously
> updates the to-string / to-cql representation of the schema in cqlsh /
> drivers / snapshots.
>
> On Wed, May 10, 2023 at 8:27 PM Patrick McFadin <pm...@gmail.com>
> wrote:
>
>> Having pulled a lot of developers out of the 2i fire, I would love it if
>> defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
>> seems like the right move for most developers that don't read docs and
>> assume behavior.
>>
>> As much as I hate that 2i would be the configured default, I get it. New
>> feature and this is the right thing for users. Would there be any way to
>> switch 2i to SAI for the same index declaration? That would make for a nice
>> upgrade for users moving to 5 without having to re-create indexes.
>>
>> Patrick
>>
>> On Wed, May 10, 2023 at 9:28 AM David Capwell <dc...@apple.com> wrote:
>>
>>> > Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
>>> > prefer allowing USING...WITH... for CREATE INDEX
>>>
>>> I have 0 issues with a new syntax to make this more clear
>>>
>>> > just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
>>> > more or less what my original proposal was above (modulo the configurable
>>> > default).
>>>
>>> I have 0 issues deprecating and producing a ClientWarning recommending
>>> the new syntax, but I would be against removing this syntax later on… it
>>> should be low effort to keep, so breaking a user would not be desirable for
>>> me.
>>>
>>> > change only the fact that CREATE INDEX retains a configurable default
>>>
>>> This option allows users to control this behavior, and allows us to
>>> change the default over time.  For 5.0 I am strongly against SAI being the
>>> default (new features disabled by default), but I wouldn’t have issues in
>>> later versions changing the default once it’s been out for a while.
>>>
>>> > I’m not convinced by the changing defaults argument here. The
>>> > characteristics of the two index types are very different, and users with
>>> > scripts that make indexes today shouldn’t have their behaviour change.
>>>
>>> In my mind this is no different from defaulting to BTI in a follow up
>>> release, but if the concern is that the legacy index leaked details such
>>> as index tables, so changing the default would have side effects in the
>>> public domain that users might not expect, then I get it… are there other
>>> concerns?
>>>
>>> On May 10, 2023, at 9:03 AM, Caleb Rackliffe <ca...@gmail.com>
>>> wrote:
>>>
>>> tl;dr If you take my original proposal and change only the fact that CREATE
>>> INDEX retains a configurable default, I think we get to the same place?
>>>
>>> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>>>
>>> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe <
>>> calebrackliffe@gmail.com> wrote:
>>>
>>>> I see a broad desire here to have a configurable (YAML) default
>>>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>>>> the concept of a default index implementation is pretty standard for most
>>>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>>>> still need to either revert to CREATE CUSTOM INDEX or add the
>>>> USING...WITH... extensions to CREATE INDEX to override the default or
>>>> specify parameters, which will be in play once SAI supports basic text
>>>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>>>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>>>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>>>> that's more or less what my original proposal was above (modulo the
>>>> configurable default).
>>>>
>>>> Thoughts?
>>>>
>>>> On Wed, May 10, 2023 at 2:59 AM Benedict <be...@apache.org> wrote:
>>>>
>>>>> I’m not convinced by the changing defaults argument here. The
>>>>> characteristics of the two index types are very different, and users with
>>>>> scripts that make indexes today shouldn’t have their behaviour change.
>>>>>
>>>>> We could introduce new syntax that properly appreciates there’s no
>>>>> default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
>>>>> these indexes involve a partition key or scatter gather
>>>>>
>>>>> On 10 May 2023, at 06:26, guo Maxwell <cc...@gmail.com> wrote:
>>>>>
>>>>> 
>>>>> +1, as we must improve the image of our own default indexing ability.
>>>>>
>>>>> As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
>>>>> then disable the ability to create SAI through *CREATE CUSTOM INDEX*
>>>>> in some version after 5.0?
>>>>>
>>>>> As far as I know, there may be users using this as a plugin-index
>>>>> interface, like https://github.com/Stratio/cassandra-lucene-index
>>>>> (though this project may be inactive, if someone wants to do something
>>>>> similar in the future, we don't have to stop them).
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 10, 2023 at 10:01, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>
>>>>>> +1 for this, especially in the long term.  CREATE INDEX should do the
>>>>>> right thing for most people without requiring extra ceremony.
>>>>>>
>>>>>> On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan <
>>>>>> jeremiah.jordan@gmail.com> wrote:
>>>>>>
>>>>>>> If the consensus is that SAI is the right default index, then we
>>>>>>> should just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM
>>>>>>> INDEX.
>>>>>>>
>>>>>>>
>>>>>>> On May 9, 2023, at 4:44 PM, Caleb Rackliffe <
>>>>>>> calebrackliffe@gmail.com> wrote:
>>>>>>>
>>>>>>> Earlier today, Mick started a thread on the future of our index
>>>>>>> creation DDL on Slack:
>>>>>>>
>>>>>>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>>>>>>
>>>>>>> At the moment, there are two ways to create a secondary index.
>>>>>>>
>>>>>>> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>>>>>>>
>>>>>>> This creates an optionally named legacy 2i on the provided table and
>>>>>>> column.
>>>>>>>
>>>>>>>     ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>>>>>>
>>>>>>> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table>
>>>>>>> (<column>) USING <class|alias> [WITH OPTIONS = <options>]*
>>>>>>>
>>>>>>> This creates a secondary index on the provided table and column
>>>>>>> using the specified 2i implementation class and (optional) parameters.
>>>>>>>
>>>>>>>     ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
>>>>>>> 'StorageAttachedIndex'
>>>>>>>
>>>>>>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex`
>>>>>>> is shorthand for the fully-qualified class name, which is also valid.)
>>>>>>>
>>>>>>> So what is there to discuss?
>>>>>>>
>>>>>>> The concern Mick raised is...
>>>>>>>
>>>>>>> "...just folk continuing to use CREATE INDEX  because they think CREATE
>>>>>>> CUSTOM INDEX is advanced (or just don't know of it), and we leave
>>>>>>> users doing 2i (when they think they are, and/or we definitely want them to
>>>>>>> be, using SAI)"
>>>>>>>
>>>>>>> To paraphrase, we want people to use SAI once it's available where
>>>>>>> possible, and the default behavior of CREATE INDEX could be at odds
>>>>>>> w/ that.
>>>>>>>
>>>>>>> The proposal we seem to have landed on is something like the
>>>>>>> following:
>>>>>>>
>>>>>>> For 5.0:
>>>>>>>
>>>>>>> 1.) Disable by default the creation of new legacy 2i via CREATE
>>>>>>> INDEX.
>>>>>>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>>>>>>
>>>>>>> (Note: How this would interact w/ the existing
>>>>>>> secondary_indexes_enabled YAML options isn't clear yet.)
>>>>>>>
>>>>>>> Post-5.0:
>>>>>>>
>>>>>>> 1.) Deprecate and eventually remove SASI when SAI hits full feature
>>>>>>> parity w/ it.
>>>>>>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something
>>>>>>> of a hybrid between the two. For example, CREATE
>>>>>>> INDEX...USING...WITH. This would both be flexible enough to
>>>>>>> accommodate index implementation selection and prescriptive enough to force
>>>>>>> the user to make a decision (and wouldn't change the legacy behavior of the
>>>>>>> existing CREATE INDEX). In this world, creating a legacy 2i might
>>>>>>> look something like CREATE INDEX...USING `legacy`.
>>>>>>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>>>>>>
>>>>>>> Eventually we would have a single enabled DDL statement for index
>>>>>>> creation that would be minimal but also explicit/able to handle some
>>>>>>> evolution.
>>>>>>>
>>>>>>> What does everyone think?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> co-founder, http://www.datastax.com
>>>>>> @spyced
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> you are the apple of my eye !
>>>>>
>>>>>
>>>