Posted to dev@asterixdb.apache.org by Dmitry Lychagin <dm...@couchbase.com> on 2018/01/30 02:54:39 UTC

Counting character positions inside a string

All,

We would like to change how string functions count character positions inside a string.
Currently, string functions such as position() and substring() assume that the first character is at position 1.
The proposal is to change the first position to 0, to better align with array element positions (which also start at 0) and with other languages such as JavaScript.
This change will also apply to binary functions (see below) and will be effective in both SQLPP and AQL.

The following functions will be affected:
position(),
regexp_position(),
substring()/substr(),
sub_binary(),
find_binary()

This might be a disruptive change for some users, so we will also introduce a cluster-wide configuration parameter ("compiler.stringoffset") for backwards compatibility:
compiler.stringoffset = 0   // first character position is assumed to be 0 (new default)
compiler.stringoffset = 1   // first character position is assumed to be 1 (backwards-compatible setting)

The query migration path is straightforward. For example:
substring("abcdef", 1) will need to be changed to substring("abcdef", 0), and so on; the same applies to sub_binary().
position(), regexp_position(), and find_binary() will return one less than they used to, but will still return -1 if the value is not found.
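
To make the shift concrete, here is a small Python model of the two settings. It is only a sketch and not AsterixDB code: the helper names position() and substring() mirror the SQLPP functions, the string_offset argument stands in for compiler.stringoffset, and rejecting a start position below the offset is an assumption of this sketch rather than documented engine behavior.

```python
# Illustrative model only; this is not how AsterixDB implements these functions.
# `string_offset` stands in for the compiler.stringoffset setting (0 or 1).

def position(s: str, sub: str, string_offset: int = 0) -> int:
    """Return the position of `sub` in `s`, or -1 if it is not found."""
    idx = s.find(sub)  # Python's str.find is 0-based and returns -1 if absent
    # The "not found" result stays -1 regardless of the offset setting.
    return -1 if idx < 0 else idx + string_offset


def substring(s: str, start: int, string_offset: int = 0) -> str:
    """Return the tail of `s` beginning at position `start`."""
    begin = start - string_offset
    if begin < 0:
        # Assumption of this sketch: positions below the offset are rejected.
        raise ValueError("start position is before the first character")
    return s[begin:]


# Old behavior (compiler.stringoffset = 1)
print(substring("abcdef", 1, string_offset=1))    # abcdef
print(position("abcdef", "cd", string_offset=1))  # 3

# New default (compiler.stringoffset = 0)
print(substring("abcdef", 0, string_offset=0))    # abcdef
print(position("abcdef", "cd", string_offset=0))  # 2
print(position("abcdef", "xy", string_offset=0))  # -1
```

In this model, migrating a query amounts to subtracting 1 from every position argument and expecting every found position to be 1 smaller, while the -1 "not found" result is unchanged.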

Please share your comments and concerns.
Thanks,
-- Dmitry


Re: Counting character positions inside a string

Posted by Dmitry Lychagin <dm...@couchbase.com>.
The change has been merged. 
Please update your queries if necessary, or set the compiler.stringoffset configuration parameter described in the original proposal to restore the old behavior.

Thanks,
-- Dmitry
 
On 1/30/18, 1:26 PM, "Ian Maxon" <im...@uci.edu> wrote:

    +1
    


Re: Counting character positions inside a string

Posted by Ian Maxon <im...@uci.edu>.
+1

On Tue, Jan 30, 2018 at 12:03 PM, Taewoo Kim <wa...@gmail.com> wrote:
> +1
>
> Best,
> Taewoo

Re: Counting character positions inside a string

Posted by Taewoo Kim <wa...@gmail.com>.
+1

Best,
Taewoo

On Tue, Jan 30, 2018 at 11:57 AM, Murtadha Hubail <hu...@gmail.com>
wrote:

> +1
>
> Cheers,
> Murtadha

Re: Counting character positions inside a string

Posted by Murtadha Hubail <hu...@gmail.com>.
+1

Cheers,
Murtadha

On 01/30/2018, 10:55 PM, "Mike Carey" <dt...@gmail.com> wrote:

    +1
    
    Likewise
    
    



Re: Counting character positions inside a string

Posted by Mike Carey <dt...@gmail.com>.
+1

Likewise


On 1/30/18 11:22 AM, Till Westmann wrote:
> Sounds good to me.
>
> +1
>
> Cheers,
> Till


Re: Counting character positions inside a string

Posted by Till Westmann <ti...@apache.org>.
Sounds good to me.

+1

Cheers,
Till
