Posted to dev@apr.apache.org by William A Rowe Jr <wr...@rowe-clan.net> on 2015/11/20 23:11:14 UTC

Provide our own impl of str[n]casecmp()?

Any objections to picking this up for APR 1.next/2.0?

It seems that httpd isn't the only one who wants to be strict about
case-insensitive token string recognition, and non-POSIX char case
gets weird quickly.



---------- Forwarded message ----------
From: <ji...@apache.org>
Date: Fri, Nov 20, 2015 at 12:49 PM
Subject: svn commit: r1715401 - in /httpd/httpd/trunk: include/ap_mmn.h
include/httpd.h server/util.c
To: cvs@httpd.apache.org


Author: jim
Date: Fri Nov 20 18:49:38 2015
New Revision: 1715401

URL: http://svn.apache.org/viewvc?rev=1715401&view=rev
Log:
Provide our own impl of str[n]casecmp()

This simply provides it. Next step is to change all uses of
str[n]casecmp to ap_str[n]casecmp and *then* remove those silly
logic paths where we check the 1st char of a string before
we do the strcasecmp (since this is no longer expensive).

Modified:
    httpd/httpd/trunk/include/ap_mmn.h
    httpd/httpd/trunk/include/httpd.h
    httpd/httpd/trunk/server/util.c

Modified: httpd/httpd/trunk/include/ap_mmn.h
URL:
http://svn.apache.org/viewvc/httpd/httpd/trunk/include/ap_mmn.h?rev=1715401&r1=1715400&r2=1715401&view=diff
==============================================================================
--- httpd/httpd/trunk/include/ap_mmn.h (original)
+++ httpd/httpd/trunk/include/ap_mmn.h Fri Nov 20 18:49:38 2015
@@ -495,6 +495,7 @@
  *                         ap_filter_should_yield(). Add empty and filters to
  *                         conn_rec.
  * 20150222.6 (2.5.0-dev)  Add async_filter to conn_rec.
+ * 20150222.7 (2.5.0-dev)  Add ap_str[n]casecmp();
  */

 #define MODULE_MAGIC_COOKIE 0x41503235UL /* "AP25" */

Modified: httpd/httpd/trunk/include/httpd.h
URL:
http://svn.apache.org/viewvc/httpd/httpd/trunk/include/httpd.h?rev=1715401&r1=1715400&r2=1715401&view=diff
==============================================================================
--- httpd/httpd/trunk/include/httpd.h (original)
+++ httpd/httpd/trunk/include/httpd.h Fri Nov 20 18:49:38 2015
@@ -2438,6 +2438,27 @@ AP_DECLARE(int) ap_array_str_index(const
 AP_DECLARE(int) ap_array_str_contains(const apr_array_header_t *array,
                                       const char *s);

+/**
+ * Known-fast version of strcasecmp()
+ * @param s1 The 1st string to compare
+ * @param s2 The 2nd string to compare
+ * @return integer greater than, equal to, or less than 0, depending on
+ *         if s1 is lexicographically greater than, equal to, or less
+ *         than s2 ignoring case.
+ */
+AP_DECLARE(int) ap_strcasecmp(const char *s1, const char *s2);
+
+/**
+ * Known-fast version of strncasecmp()
+ * @param s1 The 1st string to compare
+ * @param s2 The 2nd string to compare
+ * @param n  Maximum number of characters in the strings to compare
+ * @return integer greater than, equal to, or less than 0, depending on
+ *         if s1 is lexicographically greater than, equal to, or less
+ *         than s2 ignoring case.
+ */
+AP_DECLARE(int) ap_strncasecmp(const char *s1, const char *s2, apr_size_t n);
+
 #ifdef __cplusplus
 }
 #endif

Modified: httpd/httpd/trunk/server/util.c
URL:
http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util.c?rev=1715401&r1=1715400&r2=1715401&view=diff
==============================================================================
--- httpd/httpd/trunk/server/util.c (original)
+++ httpd/httpd/trunk/server/util.c Fri Nov 20 18:49:38 2015
@@ -97,7 +97,6 @@
 #undef APLOG_MODULE_INDEX
 #define APLOG_MODULE_INDEX AP_CORE_MODULE_INDEX

-
 /*
  * Examine a field value (such as a media-/content-type) string and return
  * it sans any parameters; e.g., strip off any ';charset=foo' and the like.
@@ -3173,3 +3172,71 @@ AP_DECLARE(int) ap_array_str_contains(co
     return (ap_array_str_index(array, s, 0) >= 0);
 }

+/*
+ * Provide our own known-fast implementation of str[n]casecmp()
+ */
+static const unsigned char ucharmap[] = {
+    0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,
+    0x8,  0x9,  0xa,  0xb,  0xc,  0xd,  0xe,  0xf,
+    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+    0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
+    0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
+    0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
+    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
+    0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
+    0x40,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
+     'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
+     'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
+     'x',  'y',  'z', 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
+    0x60,  'a',  'b',  'c',  'd',  'e',  'f',  'g',
+     'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
+     'p',  'q',  'r',  's',  't',  'u',  'v',  'w',
+     'x',  'y',  'z', 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
+    0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
+    0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+    0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+    0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+    0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+    0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+    0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+    0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+    0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
+    0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+    0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+    0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+    0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
+};
+
+AP_DECLARE(int) ap_strcasecmp(const char *s1, const char *s2)
+{
+    const unsigned char *ps1 = (const unsigned char *) s1;
+    const unsigned char *ps2 = (const unsigned char *) s2;
+
+    while (ucharmap[*ps1] == ucharmap[*ps2++]) {
+        if (*ps1++ == '\0') {
+            return (0);
+        }
+    }
+    return (ucharmap[*ps1] - ucharmap[*--ps2]);
+}
+
+AP_DECLARE(int) ap_strncasecmp(const char *s1, const char *s2, apr_size_t n)
+{
+    const unsigned char *ps1 = (const unsigned char *) s1;
+    const unsigned char *ps2 = (const unsigned char *) s2;
+    if (n) {
+        do {
+            if (ucharmap[*ps1] != ucharmap[*ps2++]) {
+                return (ucharmap[*ps1] - ucharmap[*--ps2]);
+            }
+            if (*ps1++ == '\0') {
+                /* we know both end here */
+                return (0);
+            }
+        } while (--n);
+    }
+    return (0);
+}






On Fri, Nov 20, 2015 at 12:57 PM, <ji...@apache.org> wrote:

> Author: jim
> Date: Fri Nov 20 18:57:36 2015
> New Revision: 1715404
>
> URL: http://svn.apache.org/viewvc?rev=1715404&view=rev
> Log:
> make bill happy (if possible!)
> Note that these are ascii specific.
>
> Modified:
>     httpd/httpd/trunk/include/httpd.h
>     httpd/httpd/trunk/server/util.c
>
> Modified: httpd/httpd/trunk/include/httpd.h
> URL:
> http://svn.apache.org/viewvc/httpd/httpd/trunk/include/httpd.h?rev=1715404&r1=1715403&r2=1715404&view=diff
>
> ==============================================================================
> --- httpd/httpd/trunk/include/httpd.h (original)
> +++ httpd/httpd/trunk/include/httpd.h Fri Nov 20 18:57:36 2015
> @@ -2439,7 +2439,7 @@ AP_DECLARE(int) ap_array_str_contains(co
>                                        const char *s);
>
>  /**
> - * Known-fast version of strcasecmp()
> + * Known-fast version of strcasecmp(): ASCII only
>   * @param s1 The 1st string to compare
>   * @param s2 The 2nd string to compare
>   * @return integer greater than, equal to, or less than 0, depending on
> @@ -2449,7 +2449,7 @@ AP_DECLARE(int) ap_array_str_contains(co
>  AP_DECLARE(int) ap_strcasecmp(const char *s1, const char *s2);
>
>  /**
> - * Known-fast version of strncasecmp()
> + * Known-fast version of strncasecmp(): ASCII only
>   * @param s1 The 1st string to compare
>   * @param s2 The 2nd string to compare
>   * @param n  Maximum number of characters in the strings to compare
>
> Modified: httpd/httpd/trunk/server/util.c
> URL:
> http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util.c?rev=1715404&r1=1715403&r2=1715404&view=diff
>
> ==============================================================================
> --- httpd/httpd/trunk/server/util.c (original)
> +++ httpd/httpd/trunk/server/util.c Fri Nov 20 18:57:36 2015
> @@ -3174,6 +3174,7 @@ AP_DECLARE(int) ap_array_str_contains(co
>
>  /*
>   * Provide our own known-fast implementation of str[n]casecmp()
> + * NOTE: ASCII only!
>   */
>  static const unsigned char ucharmap[] = {
>      0x0,  0x1,  0x2,  0x3,  0x4,  0x5,  0x6,  0x7,
>
>
>

Re: Provide our own impl of str[n]casecmp()?

Posted by Jim Jagielski <ji...@jaguNET.com>.
> On Nov 21, 2015, at 10:39 AM, Yann Ylavic <yl...@gmail.com> wrote:
> 
> On Sat, Nov 21, 2015 at 12:59 PM, Branko Čibej <br...@apache.org> wrote:
>> On 21.11.2015 09:31, Graham Leggett wrote:
>>> On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>>> 
>>>> Any objections to picking this up for APR 1.next/2.0?
>>>> 
>>>> It seems that httpd isn't the only one who wants to be strict about
>>>> case-insensitive token string recognition, and non-POSIX char case
>>>> gets weird quickly.
>>> +1 to this.
>>> 
>>> Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.
>> 
>> Does it matter that this implementation assumes that the runtime
>> encoding is a superset of ASCII? (FWIW, it doesn't even handle the
>> Unicode Latin-1 range).
> 
> It doesn't matter IMHO, strcasecmp() is defined in the POSIX ("C")
> locale only, and this implementation is equivalent to any strcasecmp()
> in that locale (though strcasecmp() run in another locale could
> produce different results for chars >127).
> 
> The goal would be an efficient implementation on all platforms, for
> ASCII text only (e.g. tokens), where '\xC9' ('É') would be different
> than '\xE9' ('é') but meh :p
> 
> Maybe we could choose another name to avoid any confusion,
> apr_tokencmp() or apr_casecmpstr[n]() (à la cpystrn)?
> 

Yeah... I like the name apr_casecmpstr[n]()

Re: Provide our own impl of str[n]casecmp()?

Posted by Yann Ylavic <yl...@gmail.com>.
Hello Nadia,

you are probably subscribed to the dev@apr.apache.org mailing list (no BCC here).
To unsubscribe, just send an email to dev-unsubscribe@apr.apache.org.

Regards,
Yann.

On Mon, Nov 23, 2015 at 2:45 AM, Nadia
<na...@fortressintelligence.com.sg> wrote:
> Hello Yann,
>
> I've been in this email thread for the longest time and am not involved in any of this. Please remove my email from the bcc thread please. Thanks.
>
> Yours sincerely,
> Nadia Majeed (R1442647)
> Project Specialist
> Fortress Intelligence Pte Ltd (EA No: 10C4262)
> 10 Anson Road
> #34-11 International Plaza
> Singapore 079903
> Tel: (65) 6334 8311
> Fax: (65) 6334 8511
> nadia@fortressintelligence.com.sg
> www.fortressintelligence.com.sg
>
> -----Original Message-----
> From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
> Sent: Saturday, 21 November, 2015 11:39 PM
> To: apr-dev <de...@apr.apache.org>
> Subject: Re: Provide our own impl of str[n]casecmp()?
>
> On Sat, Nov 21, 2015 at 12:59 PM, Branko Čibej <br...@apache.org> wrote:
>> On 21.11.2015 09:31, Graham Leggett wrote:
>>> On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>>>
>>>> Any objections to picking this up for APR 1.next/2.0?
>>>>
>>>> It seems that httpd isn't the only one who wants to be strict about
>>>> case-insensitive token string recognition, and non-POSIX char case
>>>> gets weird quickly.
>>> +1 to this.
>>>
>>> Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.
>>
>> Does it matter that this implementation assumes that the runtime
>> encoding is a superset of ASCII? (FWIW, it doesn't even handle the
>> Unicode Latin-1 range).
>
> It doesn't matter IMHO, strcasecmp() is defined in the POSIX ("C") locale only, and this implementation is equivalent to any strcasecmp() in that locale (though strcasecmp() run in another locale could produce different results for chars >127).
>
> The goal would be an efficient implementation on all platforms, for ASCII text only (e.g. tokens), where '\xC9' ('É') would be different than '\xE9' ('é') but meh :p
>
> Maybe we could choose another name to avoid any confusion,
> apr_tokencmp() or apr_casecmpstr[n]() (à la cpystrn)?
>
>
> Regards,
> Yann.
>
>
>

RE: Provide our own impl of str[n]casecmp()?

Posted by Nadia <na...@fortressintelligence.com.sg>.
Hello Yann,

I've been in this email thread for the longest time and am not involved in any of this. Please remove my email from the bcc thread please. Thanks. 

Yours sincerely,
Nadia Majeed (R1442647)
Project Specialist
Fortress Intelligence Pte Ltd (EA No: 10C4262)
10 Anson Road
#34-11 International Plaza
Singapore 079903
Tel: (65) 6334 8311
Fax: (65) 6334 8511
nadia@fortressintelligence.com.sg
www.fortressintelligence.com.sg

-----Original Message-----
From: Yann Ylavic [mailto:ylavic.dev@gmail.com] 
Sent: Saturday, 21 November, 2015 11:39 PM
To: apr-dev <de...@apr.apache.org>
Subject: Re: Provide our own impl of str[n]casecmp()?

On Sat, Nov 21, 2015 at 12:59 PM, Branko Čibej <br...@apache.org> wrote:
> On 21.11.2015 09:31, Graham Leggett wrote:
>> On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>>
>>> Any objections to picking this up for APR 1.next/2.0?
>>>
>>> It seems that httpd isn't the only one who wants to be strict about 
>>> case-insensitive token string recognition, and non-POSIX char case 
>>> gets weird quickly.
>> +1 to this.
>>
>> Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.
>
> Does it matter that this implementation assumes that the runtime 
> encoding is a superset of ASCII? (FWIW, it doesn't even handle the 
> Unicode Latin-1 range).

It doesn't matter IMHO, strcasecmp() is defined in the POSIX ("C") locale only, and this implementation is equivalent to any strcasecmp() in that locale (though strcasecmp() run in another locale could produce different results for chars >127).

The goal would be an efficient implementation on all platforms, for ASCII text only (e.g. tokens), where '\xC9' ('É') would be different than '\xE9' ('é') but meh :p

Maybe we could choose another name to avoid any confusion,
apr_tokencmp() or apr_casecmpstr[n]() (à la cpystrn)?


Regards,
Yann.




Re: Provide our own impl of str[n]casecmp()?

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
It solves a specific issue: a server app that conforms to an ASCII-derived
spec, when not running in the anticipated code page/language context, will
normalize comparisons in unexpected ways.  E.g. I == i, but if the spec is
ASCII, then I != ī etc.

It was presented as an optimization but my response can be generalized as
'fix your clib, then!'  But it called out an actual issue that many authors
need to be attentive to.
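
To make the token-recognition case concrete, here is a minimal sketch of a
length-bounded, ASCII-only comparison that does not depend on the process
locale. The names lower_ascii(), sketch_casecmpstrn() and is_basic_auth() are
hypothetical and are not the httpd or APR functions; the sketch also assumes
an ASCII execution character set, where 'A'..'Z' are contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold only 'A'..'Z'; every other byte maps to itself.
 * Assumes an ASCII execution character set. */
static int lower_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Sketch of a bounded, locale-independent case-insensitive compare.
 * Compares at most n bytes and stops early at a shared terminator. */
static int sketch_casecmpstrn(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);
    while (n-- > 0) {
        int c1 = lower_ascii(*p1++);
        int c2 = lower_ascii(*p2++);
        if (c1 != c2) {
            return c1 - c2;
        }
        if (c1 == '\0') {
            break;  /* both strings ended at the same position */
        }
    }
    return 0;
}

/* Hypothetical caller: recognise an auth scheme prefix regardless of case. */
static int is_basic_auth(const char *header_value)
{
    return sketch_casecmpstrn(header_value, "Basic ", 6) == 0;
}
```

Because the fold is a fixed byte mapping, the result is the same whatever
setlocale() has been called with, which is the property the spec-driven
callers need.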
On Nov 23, 2015 14:30, "Christopher Schultz" <ch...@christopherschultz.net>
wrote:

> All,
>
> Can I ask a stupid question? What does the proposed apr_str[n]casecmp
> function do that POSIX.1 strcasecmp doesn't do?
>
> I guess that's two questions, either of which may be stupid.
>
> Is this is a performance issue? Supporting non-ASCII is a waste of time?
>
> Thanks,
> -chris
>
> On 11/23/15 11:58 AM, William A Rowe Jr wrote:
> > Sorting ASCII tokens still seems valuable for various sorts of
> > optimizations,
> > and it really doesn't carry a significant cpu cost to do so...
> >
> > I'd rather we kept the <0 ! >0 behavior.
> >
> > On Mon, Nov 23, 2015 at 9:44 AM, Jim Jagielski <jim@jagunet.com
> > <ma...@jagunet.com>> wrote:
> >
> >     Should we then adjust docs and usage to remove the "greater/less
> than"
> >     criteria and just say equal strings return 0 and non 0 means that
> >     the strings don't compare/are different?
> >
> >     > On Nov 23, 2015, at 10:19 AM, William A Rowe Jr
> >     <wrowe@rowe-clan.net <ma...@rowe-clan.net>> wrote:
> >     >
> >     > On Mon, Nov 23, 2015 at 2:11 AM, Branko Čibej <brane@apache.org
> >     <ma...@apache.org>> wrote:
> >     >
> >     > +1 to apr_casecmpstr[n]() with a big fat warning in the docstring
> that
> >     > it works for ASCII only.
> >     >
> >     > Well, it 'works' (does not segfault, does not case fold them) for
> >     high bit
> >     > characters, but sorts them in a potentially meaningless way.  The
> >     Current
> >     > implementation has already drifted; the currently accepted flavor
> >     looks like;
> >     >
> >     > /**
> >     >  * Known-fast version of strcasecmp(): ASCII case-folding, POSIX compliant
> >     >  * @param s1 The 1st string to compare
> >     >  * @param s2 The 2nd string to compare
> >     >  * @return integer greater than, equal to, or less than 0, depending on
> >     >  *         if s1 is lexicographically greater than, equal to, or less
> >     >  *         than s2 ignoring case.
> >     >  */
> >     > AP_DECLARE(int) ap_casecmpstr(const char *s1, const char *s2);
> >     >
> >     > /**
> >     >  * Known-fast version of strncasecmp(): ASCII case-folding, POSIX compliant
> >     >  * @param s1 The 1st string to compare
> >     >  * @param s2 The 2nd string to compare
> >     >  * @param n  Maximum number of characters in the strings to compare
> >     >  * @return integer greater than, equal to, or less than 0, depending on
> >     >  *         if s1 is lexicographically greater than, equal to, or less
> >     >  *         than s2 ignoring case.
> >     >  */
> >     > AP_DECLARE(int) ap_casecmpstrn(const char *s1, const char *s2, apr_size_t n);
> >     >
> >     >
> >     >
> >     > and is implemented here;
> >     >
> >     >
> >
> http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util.c?view=markup&pathrev=1715736#l3175
> >     >
> >     >
> >     >
> >
> >
>

Re: Provide our own impl of str[n]casecmp()?

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
Sorting ASCII tokens still seems valuable for various sorts of
optimizations,
and it really doesn't carry a significant cpu cost to do so...

I'd rather we kept the <0 ! >0 behavior.
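
As a rough illustration of why the sign of the result matters, the sketch
below sorts a token list with qsort() and then looks tokens up with bsearch().
Both need a consistent less-than/greater-than ordering, not just an
equal/not-equal answer. All names here (fold(), token_cmp(), sort_tokens(),
has_token()) are hypothetical and assume an ASCII execution character set.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ASCII-only fold: 'A'..'Z' to lowercase, all else unchanged. */
static int fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Ordering compare: returns <0, 0 or >0, ignoring ASCII case. */
static int token_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;; ++pa, ++pb) {
        int ca = fold(*pa);
        int cb = fold(*pb);
        if (ca != cb) {
            return ca - cb;
        }
        if (ca == '\0') {
            return 0;
        }
    }
}

/* Adapter with the signature qsort() and bsearch() expect;
 * each argument points at a "const char *" element. */
static int token_cmp_qsort(const void *a, const void *b)
{
    return token_cmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of tokens in place, ignoring ASCII case. */
static void sort_tokens(const char **tokens, size_t count)
{
    assert(tokens != NULL || count == 0);
    qsort(tokens, count, sizeof *tokens, token_cmp_qsort);
}

/* Binary search in an array previously ordered by sort_tokens(). */
static int has_token(const char *const *sorted, size_t count, const char *name)
{
    return bsearch(&name, sorted, count, sizeof *sorted,
                   token_cmp_qsort) != NULL;
}
```

If the function only promised zero versus non-zero, neither the sort nor the
binary search above would be valid.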

On Mon, Nov 23, 2015 at 9:44 AM, Jim Jagielski <ji...@jagunet.com> wrote:

> Should we then adjust docs and usage to remove the "greater/less than"
> criteria and just say equal strings return 0 and non 0 means that
> the strings don't compare/are different?
>
> > On Nov 23, 2015, at 10:19 AM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> >
> > On Mon, Nov 23, 2015 at 2:11 AM, Branko Čibej <br...@apache.org> wrote:
> >
> > +1 to apr_casecmpstr[n]() with a big fat warning in the docstring that
> > it works for ASCII only.
> >
> > Well, it 'works' (does not segfault, does not case fold them) for high
> bit
> > characters, but sorts them in a potentially meaningless way.  The Current
> > implementation has already drifted; the currently accepted flavor looks
> like;
> >
> > /**
> >  * Known-fast version of strcasecmp(): ASCII case-folding, POSIX compliant
> >  * @param s1 The 1st string to compare
> >  * @param s2 The 2nd string to compare
> >  * @return integer greater than, equal to, or less than 0, depending on
> >  *         if s1 is lexicographically greater than, equal to, or less
> >  *         than s2 ignoring case.
> >  */
> > AP_DECLARE(int) ap_casecmpstr(const char *s1, const char *s2);
> >
> > /**
> >  * Known-fast version of strncasecmp(): ASCII case-folding, POSIX compliant
> >  * @param s1 The 1st string to compare
> >  * @param s2 The 2nd string to compare
> >  * @param n  Maximum number of characters in the strings to compare
> >  * @return integer greater than, equal to, or less than 0, depending on
> >  *         if s1 is lexicographically greater than, equal to, or less
> >  *         than s2 ignoring case.
> >  */
> > AP_DECLARE(int) ap_casecmpstrn(const char *s1, const char *s2, apr_size_t n);
> >
> >
> >
> > and is implemented here;
> >
> >
> http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util.c?view=markup&pathrev=1715736#l3175
> >
> >
> >
>
>

Re: Provide our own impl of str[n]casecmp()?

Posted by Jim Jagielski <ji...@jaguNET.com>.
Should we then adjust docs and usage to remove the "greater/less than"
criteria and just say equal strings return 0 and non 0 means that
the strings don't compare/are different?

> On Nov 23, 2015, at 10:19 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> 
> On Mon, Nov 23, 2015 at 2:11 AM, Branko Čibej <br...@apache.org> wrote:
> 
> +1 to apr_casecmpstr[n]() with a big fat warning in the docstring that
> it works for ASCII only.
> 
> Well, it 'works' (does not segfault, does not case fold them) for high bit 
> characters, but sorts them in a potentially meaningless way.  The Current
> implementation has already drifted; the currently accepted flavor looks like;
> 
> /**
>  * Known-fast version of strcasecmp(): ASCII case-folding, POSIX compliant
>  * @param s1 The 1st string to compare
>  * @param s2 The 2nd string to compare
>  * @return integer greater than, equal to, or less than 0, depending on
>  *         if s1 is lexicographically greater than, equal to, or less
>  *         than s2 ignoring case.
>  */
> AP_DECLARE(int) ap_casecmpstr(const char *s1, const char *s2);
>
> /**
>  * Known-fast version of strncasecmp(): ASCII case-folding, POSIX compliant
>  * @param s1 The 1st string to compare
>  * @param s2 The 2nd string to compare
>  * @param n  Maximum number of characters in the strings to compare
>  * @return integer greater than, equal to, or less than 0, depending on
>  *         if s1 is lexicographically greater than, equal to, or less
>  *         than s2 ignoring case.
>  */
> AP_DECLARE(int) ap_casecmpstrn(const char *s1, const char *s2, apr_size_t n);
> 
> 
> 
> and is implemented here;
> 
> http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util.c?view=markup&pathrev=1715736#l3175
> 
>  
> 


Re: Provide our own impl of str[n]casecmp()?

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Mon, Nov 23, 2015 at 2:11 AM, Branko Čibej <br...@apache.org> wrote:

>
> +1 to apr_casecmpstr[n]() with a big fat warning in the docstring that
> it works for ASCII only.
>

Well, it 'works' (does not segfault, does not case fold them) for high bit
characters, but sorts them in a potentially meaningless way.  The Current
implementation has already drifted; the currently accepted flavor looks
like;

/**
 * Known-fast version of strcasecmp(): ASCII case-folding, POSIX compliant
 * @param s1 The 1st string to compare
 * @param s2 The 2nd string to compare
 * @return integer greater than, equal to, or less than 0, depending on
 *         if s1 is lexicographically greater than, equal to, or less
 *         than s2 ignoring case.
 */
AP_DECLARE(int) ap_casecmpstr(const char *s1, const char *s2);

/**
 * Known-fast version of strncasecmp(): ASCII case-folding, POSIX compliant
 * @param s1 The 1st string to compare
 * @param s2 The 2nd string to compare
 * @param n  Maximum number of characters in the strings to compare
 * @return integer greater than, equal to, or less than 0, depending on
 *         if s1 is lexicographically greater than, equal to, or less
 *         than s2 ignoring case.
 */
AP_DECLARE(int) ap_casecmpstrn(const char *s1, const char *s2, apr_size_t n);


and is implemented here;

http://svn.apache.org/viewvc/httpd/httpd/trunk/server/util.c?view=markup&pathrev=1715736#l3175

Re: Provide our own impl of str[n]casecmp()?

Posted by Branko Čibej <br...@apache.org>.
On 21.11.2015 16:39, Yann Ylavic wrote:
> On Sat, Nov 21, 2015 at 12:59 PM, Branko Čibej <br...@apache.org> wrote:
>> On 21.11.2015 09:31, Graham Leggett wrote:
>>> On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>>>
>>>> Any objections to picking this up for APR 1.next/2.0?
>>>>
>>>> It seems that httpd isn't the only one who wants to be strict about
>>>> case-insensitive token string recognition, and non-POSIX char case
>>>> gets weird quickly.
>>> +1 to this.
>>>
>>> Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.
>> Does it matter that this implementation assumes that the runtime
>> encoding is a superset of ASCII? (FWIW, it doesn't even handle the
>> Unicode Latin-1 range).
> It doesn't matter IMHO, strcasecmp() is defined in the POSIX ("C")
> locale only, and this implementation is equivalent to any strcasecmp()
> in that locale (though strcasecmp() run in another locale could
> produce different results for chars >127).
>
> The goal would be an efficient implementation on all platforms, for
> ASCII text only (e.g. tokens), where '\xC9' ('É') would be different
> than '\xE9' ('é') but meh :p
>
> Maybe we could choose another name to avoid any confusion,
> apr_tokencmp() or apr_casecmpstr[n]() (à la cpystrn)?

+1 to apr_casecmpstr[n]() with a big fat warning in the docstring that
it works for ASCII only.

-- Brane


Re: Provide our own impl of str[n]casecmp()?

Posted by Yann Ylavic <yl...@gmail.com>.
On Sat, Nov 21, 2015 at 12:59 PM, Branko Čibej <br...@apache.org> wrote:
> On 21.11.2015 09:31, Graham Leggett wrote:
>> On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>>
>>> Any objections to picking this up for APR 1.next/2.0?
>>>
>>> It seems that httpd isn't the only one who wants to be strict about
>>> case-insensitive token string recognition, and non-POSIX char case
>>> gets weird quickly.
>> +1 to this.
>>
>> Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.
>
> Does it matter that this implementation assumes that the runtime
> encoding is a superset of ASCII? (FWIW, it doesn't even handle the
> Unicode Latin-1 range).

It doesn't matter IMHO, strcasecmp() is defined in the POSIX ("C")
locale only, and this implementation is equivalent to any strcasecmp()
in that locale (though strcasecmp() run in another locale could
produce different results for chars >127).

The goal would be an efficient implementation on all platforms, for
ASCII text only (e.g. tokens), where '\xC9' ('É') would be different
than '\xE9' ('é') but meh :p
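
A small sketch of the behaviour described above, for anyone who wants to
experiment with it: a 256-entry fold table in which only 'A'..'Z' are
remapped, so bytes above 127 always compare by their raw value. Unlike the
static table in the commit, this one is filled at runtime to keep the example
short; fold_table, init_fold_table() and sketch_casecmpstr() are hypothetical
names, and an ASCII execution character set is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fold table: identity for every byte except 'A'..'Z'. */
static unsigned char fold_table[256];

/* Fill the table once before any comparison is made. */
static void init_fold_table(void)
{
    int i;

    for (i = 0; i < 256; ++i) {
        fold_table[i] = (unsigned char)i;
    }
    for (i = 'A'; i <= 'Z'; ++i) {
        fold_table[i] = (unsigned char)(i + ('a' - 'A'));
    }
}

/* Sketch of a table-driven, locale-independent strcasecmp(). */
static int sketch_casecmpstr(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);
    for (;; ++p1, ++p2) {
        int c1 = fold_table[*p1];
        int c2 = fold_table[*p2];
        if (c1 != c2) {
            return c1 - c2;
        }
        if (c1 == '\0') {
            return 0;
        }
    }
}
```

With this table, "Keep-Alive" and "keep-alive" compare equal, while the
single bytes 0xC9 and 0xE9 stay distinct because neither is remapped.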

Maybe we could choose another name to avoid any confusion,
apr_tokencmp() or apr_casecmpstr[n]() (à la cpystrn)?


Regards,
Yann.

Re: Provide our own impl of str[n]casecmp()?

Posted by Branko Čibej <br...@apache.org>.
On 21.11.2015 09:31, Graham Leggett wrote:
> On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>
>> Any objections to picking this up for APR 1.next/2.0?
>>
>> It seems that httpd isn't the only one who wants to be strict about 
>> case-insensitive token string recognition, and non-POSIX char case
>> gets weird quickly.
> +1 to this.
>
> Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.

Does it matter that this implementation assumes that the runtime
encoding is a superset of ASCII? (FWIW, it doesn't even handle the
Unicode Latin-1 range).

-- Brane

Re: Provide our own impl of str[n]casecmp()?

Posted by Graham Leggett <mi...@sharp.fm>.
On 21 Nov 2015, at 12:11 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:

> Any objections to picking this up for APR 1.next/2.0?
> 
> It seems that httpd isn't the only one who wants to be strict about 
> case-insensitive token string recognition, and non-POSIX char case
> gets weird quickly.

+1 to this.

Ideally we should add it to APR, and then provide a convenience function in httpd that has the same implementation when the function in APR is missing, and use the APR function when present.
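
One way this could look, as a sketch only: a compile-time switch that maps a
local name onto the APR function when it exists and otherwise compiles a
private copy. HAVE_APR_CASECMPSTR is a hypothetical feature macro that a
build system would have to define, my_casecmpstr() is a hypothetical local
name, and apr_casecmpstr() is the name proposed in this thread, not a
confirmed APR API.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_APR_CASECMPSTR
/* Hypothetical: APR provides the function, so forward to it.
 * The APR header that declares it would need to be included here. */
#define my_casecmpstr(a, b) apr_casecmpstr((a), (b))
#else
/* Fallback: private ASCII-only implementation with the same contract.
 * Assumes an ASCII execution character set. */
static int my_casecmpstr(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);
    for (;; ++p1, ++p2) {
        int c1 = (*p1 >= 'A' && *p1 <= 'Z') ? *p1 + ('a' - 'A') : *p1;
        int c2 = (*p2 >= 'A' && *p2 <= 'Z') ? *p2 + ('a' - 'A') : *p2;
        if (c1 != c2) {
            return c1 - c2;
        }
        if (c1 == '\0') {
            return 0;
        }
    }
}
#endif
```

Callers would use my_casecmpstr() everywhere, so the switch to the APR
version needs no source changes once the macro is defined.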

Regards,
Graham