Posted to dev@asterixdb.apache.org by Jianfeng Jia <ji...@gmail.com> on 2015/09/29 03:03:40 UTC

Undefined behavior for substring-before() and substring-after() in match-not-found case

Hi Devs,

Another question about the string functions.

The example code at http://asterixdb.ics.uci.edu/documentation/aql/functions.html#StringFunctions shows that these two functions are supposed to be called after contains(). I wonder what the expected behavior is if they can't find the matching pattern?

The current result is confusing. 

e.g. 
let $x := "substring"
return [ substring-before($x, "subx"), substring-after($x, "subx")]

it will return 
[ [ "subst", "" ] ]
Should we always return an empty string in such a case, or throw an exception like “you should filter the result with contains() first”?
IMHO, I’d like to return a null string. Any opinions?
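A minimal Python sketch of the three options raised above (empty string, null, or an error). The policy switch and the names NotFound, substring_before and substring_after are hypothetical, chosen for illustration only; they are not AsterixDB APIs:

```python
from enum import Enum
from typing import Optional


class NotFound(Enum):
    """Hypothetical policies for the match-not-found case."""
    EMPTY_STRING = "empty"  # return "" (what XQuery specifies)
    NULL = "null"           # return None, i.e. a null result
    ERROR = "error"         # raise, forcing callers to check contains() first


def _not_found(pattern: str, policy: NotFound) -> Optional[str]:
    """Apply the chosen policy when pattern does not occur in the input."""
    if policy is NotFound.EMPTY_STRING:
        return ""
    if policy is NotFound.NULL:
        return None
    raise ValueError(f"{pattern!r} not found; filter with contains() first")


def substring_before(s: str, pattern: str, policy: NotFound) -> Optional[str]:
    """Part of s before the first occurrence of pattern."""
    idx = s.find(pattern)
    return s[:idx] if idx >= 0 else _not_found(pattern, policy)


def substring_after(s: str, pattern: str, policy: NotFound) -> Optional[str]:
    """Part of s after the first occurrence of pattern."""
    idx = s.find(pattern)
    return s[idx + len(pattern):] if idx >= 0 else _not_found(pattern, policy)


if __name__ == "__main__":
    x = "substring"
    for policy in (NotFound.EMPTY_STRING, NotFound.NULL):
        print(policy.name, [substring_before(x, "subx", policy),
                            substring_after(x, "subx", policy)])
    # EMPTY_STRING ['', '']
    # NULL [None, None]
```

Only the not-found branch differs between the policies; the matching case is identical, and the reported "subst" result matches none of them.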


Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Undefined behavior for substring-before() and substring-after() in match-not-found case

Posted by Mike Carey <dt...@gmail.com>.
Agreed!

On 9/28/15 9:59 PM, Chris Hillery wrote:


Re: Undefined behavior for substring-before() and substring-after() in match-not-found case

Posted by Chris Hillery <ch...@hillery.land>.
Beyond the signature, the documentation of the XQuery function says:

"If the value of $arg1 does not contain a string that is equal to the value
of $arg2, then the function returns the zero-length string."

So I'd say your inference is correct.

The XQuery doc also explains what happens if $arg1 or $arg2 is empty, and
we should probably emulate that as well.
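A minimal Python sketch of the XQuery 3.0 rules for fn:substring-before and fn:substring-after, including the empty-argument cases mentioned above. Assumptions: Python None stands in for the empty sequence, and strings are compared by code point, so collations are ignored. The function names are illustrative:

```python
from typing import Optional


def xq_substring_before(arg1: Optional[str], arg2: Optional[str]) -> str:
    """Sketch of fn:substring-before using code-point comparison only."""
    # An empty sequence (None here) is treated as the zero-length string.
    arg1, arg2 = arg1 or "", arg2 or ""
    # A zero-length $arg2 yields the zero-length string.
    if arg2 == "":
        return ""
    idx = arg1.find(arg2)
    # No match: the zero-length string, never null and never an error.
    return arg1[:idx] if idx >= 0 else ""


def xq_substring_after(arg1: Optional[str], arg2: Optional[str]) -> str:
    """Sketch of fn:substring-after using code-point comparison only."""
    arg1, arg2 = arg1 or "", arg2 or ""
    # A zero-length $arg2 yields $arg1 unchanged.
    if arg2 == "":
        return arg1
    idx = arg1.find(arg2)
    # No match: the zero-length string.
    return arg1[idx + len(arg2):] if idx >= 0 else ""


if __name__ == "__main__":
    print(xq_substring_before("tattoo", "attoo"))           # t
    print(xq_substring_after("tattoo", "tat"))              # too
    print(repr(xq_substring_before("substring", "subx")))   # ''
```

Under these rules, both calls in the original example would return the zero-length string.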

Ceej
aka Chris Hillery
On Sep 28, 2015 9:29 PM, "Jianfeng Jia" <ji...@gmail.com> wrote:


Re: Undefined behavior for substring-before() and substring-after() in match-not-found case

Posted by Jianfeng Jia <ji...@gmail.com>.
Thanks for the great summary provided by Taewoo!

XQuery's signature shows that it always returns a string:
fn:substring-before($arg1 as xs:string?, $arg2 as xs:string?) as xs:string

MarkLogic's version, by contrast, returns an optional string (String?):
fn.substringBefore(
   $input as String?,
   $before as String?,
   [$collation as String]
) as String?

Since all the other string functions either return a string or throw an exception, I think returning an empty string would be the consistent behavior.
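One consequence of an always-return-a-string contract is that an empty result cannot, on its own, distinguish "no match" from "match at the very start". That is why the contains() check used in the documentation example still carries information. A small Python sketch of this; the helper names are hypothetical, and `pattern in v` plays the role of contains():

```python
from typing import List


def substring_before(s: str, pattern: str) -> str:
    """XQuery-style: returns "" when pattern does not occur in s."""
    idx = s.find(pattern)
    return s[:idx] if idx >= 0 else ""


def prefixes_before(values: List[str], pattern: str) -> List[str]:
    """Keep only values that contain pattern, then take the part before it."""
    return [substring_before(v, pattern) for v in values if pattern in v]


if __name__ == "__main__":
    # Both calls return "", for different reasons.
    print(repr(substring_before("substring", "sub")))   # '' (match at index 0)
    print(repr(substring_before("substring", "subx")))  # '' (no match)
    # After the contains()-style guard, "" can only mean "match at index 0".
    print(prefixes_before(["substring", "string", "abcsub"], "sub"))  # ['', 'abc']
```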


> On Sep 28, 2015, at 9:12 PM, Mike Carey <dt...@gmail.com> wrote:



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Undefined behavior for substring-before() and substring-after() in match-not-found case

Posted by Mike Carey <dt...@gmail.com>.
Yes, and the MarkLogic entry reminded me: the answer should probably be
modeled (for us) after XQuery, where the answers are already fully spelled
out (having first been debated by a group of smart people and then
implemented by a bunch of XQuery engine providers):
http://www.w3.org/TR/xpath-functions-30/#func-substring-before
http://www.w3.org/TR/xpath-functions-30/#func-substring-after
Cheers,
Mike

On 9/28/15 6:10 PM, Taewoo Kim wrote:


Re: Undefined behavior for substring-before() and substring-after() in match-not-found case

Posted by Taewoo Kim <wa...@gmail.com>.
Perhaps we can start from here:
https://docs.google.com/spreadsheets/d/1j6_YSCc_8gEReAWFP84geI30wlnsz7uMFq4TCm7GRz8/edit?usp=sharing


Best,
Taewoo

On Mon, Sep 28, 2015 at 6:05 PM, Mike Carey <dt...@gmail.com> wrote:


Re: Undefined behavior for substring-before() and substring-after() in match-not-found case

Posted by Mike Carey <dt...@gmail.com>.
At times like this it's useful to take a quick look at what other
systems do, if they have such functions; e.g., are there precedents we
should base our answer on? (In Java, Postgres, MySQL, ...)
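As one concrete data point for the precedent question, Python's built-in str methods use three different conventions for the not-found case: return the whole input, return a sentinel, or raise. The sketch below exercises only standard str methods (the wrapper name is illustrative) and says nothing about Java, Postgres, or MySQL:

```python
def not_found_conventions(s: str, pattern: str) -> dict:
    """Collect how several built-in str methods react to a pattern.

    pattern must be non-empty: str.partition rejects an empty separator.
    """
    try:
        s.index(pattern)
        index_raises = False
    except ValueError:  # str.index raises when the pattern is absent
        index_raises = True
    return {
        # partition: (s, "", "") when absent, so the "before" part is all of s
        "partition": s.partition(pattern),
        # rpartition: ("", "", s) when absent, so the "after" part is all of s
        "rpartition": s.rpartition(pattern),
        # find: returns the sentinel -1 when absent
        "find": s.find(pattern),
        "index_raises": index_raises,
    }


if __name__ == "__main__":
    print(not_found_conventions("substring", "subx"))
    # {'partition': ('substring', '', ''), 'rpartition': ('', '', 'substring'), 'find': -1, 'index_raises': True}
```

Note that partition's not-found result differs from the XQuery rule: XQuery's substring-before returns the zero-length string, while partition hands back the whole input as the "before" part.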

On 9/28/15 6:03 PM, Jianfeng Jia wrote: