Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2015/10/22 17:06:24 UTC
[lucy-dev] Str_Find return type
On 18/10/2015 14:01, Nick Wellnhofer wrote:
>>> - Make `Find` return a size_t.
>>> Requires special value for "not found".
>>
>> Hmm, that's a toughie. If the sentinel is SIZE_MAX, that might not fit in all
>> host numeric types.
>
> Good point. The problem is not so much the byte size of the return type but
> the fact that a host language might not support unsigned integers. Maybe we
> should limit string sizes to SSIZE_MAX and make `Find` return an ssize_t. But
> this requires emulating the ssize_t type on non-POSIX platforms.
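
The sentinel concern in the quoted text can be illustrated with a small C sketch. It assumes a host language whose widest integer is a signed 64-bit value; `FIND_NOT_FOUND` and `fits_host_int64` are hypothetical names used only for illustration and are not part of Clownfish.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "not found" sentinel for an index-returning Find. */
#define FIND_NOT_FOUND SIZE_MAX

/* True if `value` can be represented by a signed 64-bit host integer.
 * The comparison is done in uint64_t so it is also correct where
 * size_t is narrower than 64 bits. */
bool
fits_host_int64(size_t value) {
    return (uint64_t)value <= (uint64_t)INT64_MAX;
}
```

On a platform with a 64-bit size_t, `fits_host_int64(FIND_NOT_FOUND)` is false, so such a host could not represent the sentinel exactly.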
Here's another idea. Most of the time, users of Str_Find don't care about the
position of the substring and only want to know whether the substring is
contained or not. For this use case, a method like Str_Contains returning a
bool is a more appropriate interface.
If someone is interested in the exact position of the substring, it might make
more sense to return a string iterator pointing to the first occurrence of the
substring. So what about:

    public bool
    Contains(String *self, String *substring);

    public incremented nullable StringIterator*
    Find(String *self, String *substring);

Nick
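
A minimal C sketch of how the two proposed methods could relate, under stated assumptions. `Str`, `StrIter`, `sketch_contains`, and `sketch_find` are hypothetical stand-ins, not the Clownfish types or implementation. Strings are assumed to be well-formed UTF-8, so a byte-wise match always starts on a code point boundary. "incremented nullable" is modeled as a heap-allocated iterator owned by the caller, with NULL meaning "not found".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for String and StringIterator. */
typedef struct {
    const char *ptr;   /* UTF-8 bytes */
    size_t      size;  /* length in bytes */
} Str;

typedef struct {
    const Str *string;
    size_t     byte_offset;  /* position of the match within `string` */
} StrIter;

/* Shared search: reports the byte offset of the first match. */
static bool
find_offset(const Str *self, const Str *sub, size_t *offset_out) {
    if (sub->size > self->size) { return false; }
    for (size_t i = 0; i + sub->size <= self->size; i++) {
        if (memcmp(self->ptr + i, sub->ptr, sub->size) == 0) {
            *offset_out = i;
            return true;
        }
    }
    return false;
}

/* Answers the common question without exposing any position. */
bool
sketch_contains(const Str *self, const Str *sub) {
    size_t offset;
    return find_offset(self, sub, &offset);
}

/* Returns a new iterator at the first match, or NULL if there is none.
 * The caller owns the result and must free() it. */
StrIter*
sketch_find(const Str *self, const Str *sub) {
    size_t offset;
    if (!find_offset(self, sub, &offset)) { return NULL; }
    StrIter *iter = malloc(sizeof(StrIter));
    if (iter == NULL) { abort(); }  /* keep NULL unambiguous: "not found" */
    iter->string      = self;
    iter->byte_offset = offset;
    return iter;
}
```

Neither function needs an integer sentinel, so the question of which host numeric type can hold SIZE_MAX does not arise.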
Re: [lucy-dev] Str_Find return type
Posted by Nick Wellnhofer <we...@aevum.de>.
On 22/10/2015 23:21, Marvin Humphrey wrote:
> You're absolutely right, avoiding an index which counts code points is
> consistent with our iterator-centric model for string processing. Good
> insight, and nice API proposal!
I'm also tempted to remove Str_Code_Point_At and make string iterators the
only way to access character data in a string.
Nick
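
As a rough illustration of why index-based access fits poorly with a variable-width encoding, here is a sketch in C. It assumes well-formed UTF-8 input and does no validation; `Utf8Iter`, `utf8_iter_next`, `code_point_at`, and `ITER_DONE` are hypothetical names, not the Clownfish API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over a UTF-8 buffer. */
typedef struct {
    const unsigned char *ptr;
    size_t               size;         /* length in bytes */
    size_t               byte_offset;  /* current position */
} Utf8Iter;

#define ITER_DONE (-1)

/* Decodes the code point at the iterator and advances past it.
 * Returns ITER_DONE at the end. Assumes well-formed UTF-8. */
int32_t
utf8_iter_next(Utf8Iter *it) {
    if (it->byte_offset >= it->size) { return ITER_DONE; }
    const unsigned char *p = it->ptr + it->byte_offset;
    int32_t cp;
    size_t  len;
    if      (p[0] < 0x80) { cp = p[0];        len = 1; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; len = 2; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; len = 3; }
    else                  { cp = p[0] & 0x07; len = 4; }
    for (size_t i = 1; i < len; i++) {
        cp = (cp << 6) | (p[i] & 0x3F);  /* append 6 continuation bits */
    }
    it->byte_offset += len;
    return cp;
}

/* Index-based access must walk from the start: O(index) per call. */
int32_t
code_point_at(const unsigned char *ptr, size_t size, size_t index) {
    Utf8Iter it = { ptr, size, 0 };
    int32_t  cp = ITER_DONE;
    for (size_t i = 0; i <= index; i++) {
        cp = utf8_iter_next(&it);
        if (cp == ITER_DONE) { break; }
    }
    return cp;
}
```

A loop that calls `code_point_at` for every index rescans the prefix each time and is quadratic overall, whereas repeated `utf8_iter_next` calls visit each byte once.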
Re: [lucy-dev] Str_Find return type
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Oct 22, 2015 at 8:06 AM, Nick Wellnhofer <we...@aevum.de> wrote:
> On 18/10/2015 14:01, Nick Wellnhofer wrote:
>>>> - Make `Find` return a size_t.
>>>> Requires special value for "not found".
>>>
>>> Hmm, that's a toughie. If the sentinel is SIZE_MAX, that might not fit in
>>> all host numeric types.
>>
>>
>> Good point. The problem is not so much the byte size of the return type but
>> the fact that a host language might not support unsigned integers. Maybe we
>> should limit string sizes to SSIZE_MAX and make `Find` return an ssize_t.
>> But this requires emulating the ssize_t type on non-POSIX platforms.
>
> Here's another idea. Most of the time, users of Str_Find don't care about
> the position of the substring and only want to know whether the substring is
> contained or not. For this use case, a method like Str_Contains returning a
> bool is a more appropriate interface.
>
> If someone is interested in the exact position of the substring, it might
> make more sense to return a string iterator pointing to the first occurrence
> of the substring. So what about:
>
> public bool
> Contains(String *self, String *substring);
>
> public incremented nullable StringIterator*
> Find(String *self, String *substring);
+1
You're absolutely right, avoiding an index which counts code points is
consistent with our iterator-centric model for string processing. Good
insight, and nice API proposal!
Marvin Humphrey