Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2015/10/22 17:06:24 UTC

[lucy-dev] Str_Find return type

On 18/10/2015 14:01, Nick Wellnhofer wrote:
>>> - Make `Find` return a size_t.
>>>    Requires special value for "not found".
>>
>> Hmm, that's a toughie.  If the sentinel is SIZE_MAX, that might not fit in all
>> host numeric types.
>
> Good point. The problem is not so much the byte size of the return type but
> the fact that a host language might not support unsigned integers. Maybe we
> should limit string sizes to SSIZE_MAX and make `Find` return an ssize_t. But
> this requires emulating the ssize_t type on non-POSIX platforms.

Here's another idea. Most of the time, users of Str_Find don't care about the 
position of the substring and only want to know whether the substring is 
contained or not. For this use case, a method like Str_Contains returning a 
bool is a more appropriate interface.

If someone is interested in the exact position of the substring, it might make 
more sense to return a string iterator pointing to the first occurrence of the 
substring. So what about:

     public bool
     Contains(String *self, String *substring);

     public incremented nullable StringIterator*
     Find(String *self, String *substring);
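
As a rough illustration of the shape of this proposal, here is a minimal
sketch in plain C. It is not Clownfish code: DemoStringIterator, Demo_Find
and Demo_Contains are hypothetical names, and NUL-terminated C strings stand
in for String objects. The point is only that "not found" becomes a NULL
iterator rather than a sentinel integer such as SIZE_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string iterator: it remembers the string
 * it walks over and a byte offset into it. */
typedef struct {
    const char *string;
    size_t      byte_offset;
} DemoStringIterator;

/* Returns true if `substring` occurs anywhere in `self`. */
bool
Demo_Contains(const char *self, const char *substring) {
    assert(self != NULL && substring != NULL);
    return strstr(self, substring) != NULL;
}

/* Returns a newly allocated iterator positioned at the first occurrence
 * of `substring`, or NULL if there is none (the "nullable" part). The
 * caller owns the result and must free() it (loosely mirroring
 * "incremented"). */
DemoStringIterator*
Demo_Find(const char *self, const char *substring) {
    assert(self != NULL && substring != NULL);
    const char *hit = strstr(self, substring);
    if (hit == NULL) {
        return NULL;
    }
    DemoStringIterator *iter = malloc(sizeof(DemoStringIterator));
    if (iter == NULL) {
        abort();  /* Keep allocation failure distinct from "not found". */
    }
    iter->string      = self;
    iter->byte_offset = (size_t)(hit - self);
    return iter;
}
```

With this shape, a caller that only needs a yes/no answer uses Demo_Contains
and never sees an index, while a caller that needs the position checks the
result of Demo_Find against NULL. No unsigned or signed size type crosses the
host language boundary.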

Nick


Re: [lucy-dev] Str_Find return type

Posted by Nick Wellnhofer <we...@aevum.de>.
On 22/10/2015 23:21, Marvin Humphrey wrote:
> You're absolutely right, avoiding an index which counts code points is
> consistent with our iterator-centric model for string processing.  Good
> insight, and nice API proposal!

I'm also tempted to remove Str_Code_Point_At and make string iterators the 
only way to access character data in a string.
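
To make the iterator-only idea concrete, here is a small sketch of walking
code points through an iterator instead of asking for the code point at an
index. It assumes the string data is a valid UTF-8 buffer; DemoUtf8Iter,
DemoUtf8Iter_Next and DEMO_ITER_DONE are hypothetical names, not the
Clownfish StringIterator API, and malformed input is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over a UTF-8 buffer (assumed to be valid UTF-8). */
typedef struct {
    const unsigned char *bytes;
    size_t               size;
    size_t               offset;
} DemoUtf8Iter;

/* Returned once the iterator has reached the end of the buffer. */
#define DEMO_ITER_DONE (-1)

/* Decodes the code point at the current offset and advances past it.
 * Each call only touches the bytes of one code point, so a full scan is
 * linear in the buffer size. */
int32_t
DemoUtf8Iter_Next(DemoUtf8Iter *iter) {
    assert(iter != NULL);
    if (iter->offset >= iter->size) {
        return DEMO_ITER_DONE;
    }
    unsigned char lead = iter->bytes[iter->offset++];
    int32_t code_point;
    int     trailing;
    if (lead < 0x80) {                  /* 1-byte sequence (ASCII) */
        code_point = lead;
        trailing   = 0;
    }
    else if ((lead & 0xE0) == 0xC0) {   /* 2-byte sequence */
        code_point = lead & 0x1F;
        trailing   = 1;
    }
    else if ((lead & 0xF0) == 0xE0) {   /* 3-byte sequence */
        code_point = lead & 0x0F;
        trailing   = 2;
    }
    else {                              /* 4-byte sequence */
        code_point = lead & 0x07;
        trailing   = 3;
    }
    /* Fold in the low six bits of each continuation byte. */
    while (trailing-- > 0 && iter->offset < iter->size) {
        code_point = (code_point << 6) | (iter->bytes[iter->offset++] & 0x3F);
    }
    return code_point;
}
```

An index-based accessor over a variable-width encoding like this would have
to rescan from the start of the buffer on each call, which is one reason an
iterator can be the more natural primitive.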

Nick


Re: [lucy-dev] Str_Find return type

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Oct 22, 2015 at 8:06 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 18/10/2015 14:01, Nick Wellnhofer wrote:

>>>> - Make `Find` return a size_t.
>>>>   Requires special value for "not found".
>>>
>>> Hmm, that's a toughie.  If the sentinel is SIZE_MAX, that might not fit in
>>> all host numeric types.
>>
>>
>> Good point. The problem is not so much the byte size of the return type but
>> the fact that a host language might not support unsigned integers. Maybe we
>> should limit string sizes to SSIZE_MAX and make `Find` return an ssize_t.
>> But this requires emulating the ssize_t type on non-POSIX platforms.
>
> Here's another idea. Most of the time, users of Str_Find don't care about
> the position of the substring and only want to know whether the substring is
> contained or not. For this use case, a method like Str_Contains returning a
> bool is a more appropriate interface.
>
> If someone is interested in the exact position of the substring, it might
> make more sense to return a string iterator pointing to the first occurrence
> of the substring. So what about:
>
>     public bool
>     Contains(String *self, String *substring);
>
>     public incremented nullable StringIterator*
>     Find(String *self, String *substring);

+1

You're absolutely right, avoiding an index which counts code points is
consistent with our iterator-centric model for string processing.  Good
insight, and nice API proposal!

Marvin Humphrey