You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by William A Rowe Jr <wr...@rowe-clan.net> on 2015/11/23 18:01:57 UTC

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

On Mon, Nov 23, 2015 at 9:58 AM, <ji...@apache.org> wrote:

> Author: jim
> Date: Mon Nov 23 15:58:25 2015
> New Revision: 1715859
>
> URL: http://svn.apache.org/viewvc?rev=1715859&view=rev
> Log:
> we just worry about "equality" with this implementation...
> So it's not a "real" strcasecmp replacement.
>
> Modified:
>     httpd/httpd/trunk/include/httpd.h
>
> Modified: httpd/httpd/trunk/include/httpd.h
> URL:
> http://svn.apache.org/viewvc/httpd/httpd/trunk/include/httpd.h?rev=1715859&r1=1715858&r2=1715859&view=diff
>
> ==============================================================================
> --- httpd/httpd/trunk/include/httpd.h (original)
> +++ httpd/httpd/trunk/include/httpd.h Mon Nov 23 15:58:25 2015
> @@ -2442,9 +2442,8 @@ AP_DECLARE(int) ap_array_str_contains(co
>   * Known-fast version of strcasecmp(): ASCII case-folding, POSIX compliant
>   * @param s1 The 1st string to compare
>   * @param s2 The 2nd string to compare
> - * @return integer greater than, equal to, or less than 0, depending on
> - *         if s1 is lexicographically greater than, equal to, or less
> - *         than s2 ignoring case.
> + * @return 0 if s1 is lexicographically equal to s2 ignoring case;
> + *         non-0 otherwise.
>   */
>  AP_DECLARE(int) ap_casecmpstr(const char *s1, const char *s2);
>
> @@ -2453,9 +2452,8 @@ AP_DECLARE(int) ap_casecmpstr(const char
>   * @param s1 The 1st string to compare
>   * @param s2 The 2nd string to compare
>   * @param n  Maximum number of characters in the strings to compare
> - * @return integer greater than, equal to, or less than 0, depending on
> - *         if s1 is lexicographically greater than, equal to, or less
> - *         than s2 ignoring case.
> + * @return 0 if s1 is lexicographically equal to s2 ignoring case;
> + *         non-0 otherwise.
>   */
>  AP_DECLARE(int) ap_casecmpstrn(const char *s1, const char *s2, apr_size_t
> n);
>

Howso?  The implementation does provide ASCII numeric-alpha ordering.

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, Nov 24, 2015 at 10:11 AM, Eric Covener <co...@gmail.com> wrote:

> On Tue, Nov 24, 2015 at 11:07 AM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> > Well, we are sorting the entire ASCII so I guess we can drop "for
> > alpha-numerics only".
>
> Maybe it was fixed and I missed it, but didn't you point
> out that [] were not sorted right relative to alphas per
> POSIX strcasecmp?
>

I was pointing out that most implementations sort [] in between upper and
lower
case letters, and in the string-folded ordering, all characters are treated
as their
lower case equivalent for collation.

Unsure how POSIX defined this.  The EBCDIC ordering table used the same
lower-case folding as ASCII, so the sortation of all POSIX characters will
be
identical between our EBCDIC (extended) and ASCII implementations.

[I only just realized that original EBCDIC didn't include all the C
characters,
I only used the code page for 15 years for document imaging but always
coded in ANSI :-]

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, Nov 24, 2015 at 12:34 PM, Jim Jagielski <ji...@jagunet.com> wrote:

> At this stage I'm at the point of diminishing returns...
>
> I don't care what color we paint this bike shed.
>
> I'm just glad I was able to come up with something for
> people to nit-pick at and get all alpha-male about :P
>

Sorry Jim, even board members have to refrain from ad-hominem
attacks on technical discussions, which is what everyone else
has been enjoying.  Good to see so many eyes on this optimization
even if it turned out to be a protocol correctness fix, instead!

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Jim Jagielski <ji...@jaguNET.com>.
At this stage I'm at the point of diminishing returns...

I don't care what color we paint this bike shed.

I'm just glad I was able to come up with something for
people to nit-pick at and get all alpha-male about :P

> On Nov 24, 2015, at 1:24 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> 
> On Tue, Nov 24, 2015 at 11:53 AM, Jim Jagielski <ji...@jagunet.com> wrote:
> It's for these types of reasons that I suggest that
> instead of looking at this as some replacement for
> strcasecmp, we instead look at the function on how we
> actually *use* it. As far as I know, we simply use
> it to see if 2 strings are equal, ignoring ASCII case.
> I could be wrong, and I'm sure someone will point out
> if and where I am :) But even so, if *that* is the
> use case, then we are free to define the function
> (and create whatever name) however we want.
> 
> 
> Well, there is another option, reject API bloat and insist that
> users set LC_ALL="C" if they want httpd to behave correctly
> and performantly.  I don't suppose you tested the performance
> after turning off UTF-8 processing on your Mac?
> 
> 


Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Yann Ylavic <yl...@gmail.com>.
On Tue, Nov 24, 2015 at 7:24 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>
> Well, there is another option, reject API bloat and insist that
> users set LC_ALL="C" if they want httpd to behave correctly
> and performantly.

Which API bloat? If we really want to avoid misleading anybody let's
call the function ap_tokencmp[n](), no?
(but I think casecmpstr[n]() is already quite clear about that...).

Setting the locale to "C" is not always desirable, some modules need
to work in unicode (UTF-8 or whatever), let's simply provide them
helpers, and do the right thing on the HTTP side.

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, Nov 24, 2015 at 11:53 AM, Jim Jagielski <ji...@jagunet.com> wrote:

> It's for these types of reasons that I suggest that
> instead of looking at this as some replacement for
> strcasecmp, we instead look at the function on how we
> actually *use* it. As far as I know, we simply use
> it to see if 2 strings are equal, ignoring ASCII case.
> I could be wrong, and I'm sure someone will point out
> if and where I am :) But even so, if *that* is the
> use case, then we are free to define the function
> (and create whatever name) however we want.
>


Well, there is another option, reject API bloat and insist that
users set LC_ALL="C" if they want httpd to behave correctly
and performantly.  I don't suppose you tested the performance
after turning off UTF-8 processing on your Mac?

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Jim Jagielski <ji...@jaguNET.com>.
It's for these types of reasons that I suggest that
instead of looking at this as some replacement for
strcasecmp, we instead look at the function on how we
actually *use* it. As far as I know, we simply use
it to see if 2 strings are equal, ignoring ASCII case.
I could be wrong, and I'm sure someone will point out
if and where I am :) But even so, if *that* is the
use case, then we are free to define the function
(and create whatever name) however we want.

> On Nov 24, 2015, at 11:11 AM, Eric Covener <co...@gmail.com> wrote:
> 
> On Tue, Nov 24, 2015 at 11:07 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
>> Well, we are sorting the entire ASCII so I guess we can drop "for
>> alpha-numerics only".
> 
> Maybe it was fixed and I missed it, but didn't you point
> out that [] were not sorted right relative to alphas per
> POSIX strcasecmp?


Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Eric Covener <co...@gmail.com>.
On Tue, Nov 24, 2015 at 11:07 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> Well, we are sorting the entire ASCII so I guess we can drop "for
> alpha-numerics only".

Maybe it was fixed and I missed it, but didn't you point
out that [] were not sorted right relative to alphas per
POSIX strcasecmp?

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, Nov 24, 2015 at 9:46 AM, Eric Covener <co...@gmail.com> wrote:

> On Tue, Nov 24, 2015 at 10:41 AM, William A Rowe Jr <wr...@rowe-clan.net>
> wrote:
> > You are right, calling out "This implementation provides ASCII-ordering
> > of strings (EBCDIC text is also ASCII ordered on those platforms)"
> > might have been clearer?  Eric, what phrasing would make sense to you
> > as a user/maintainer of the httpd EBCDIC port?
>
> "Provides C/POSIX strcasecmp ordering, for alpha-numerics only"?
>

Well, we are sorting the entire ASCII so I guess we can drop "for
alpha-numerics only".

What we want to warn the user is that the sortation of extended characters
outside of
the normal C/POSIX range is undefined (not rejected), and that extended
characters
are not compared case-insensitively.

For extra confusion note the function says next to nothing in the XL
reference;

https://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/rsrncp.htm?lang=en

but this function goes into some of the dirty details...

https://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/risasc.htm

I found a nice summary of the state of EBCDIC for c programming over here;

http://www.longpelaexpertise.com/ezine/LostinTranslation1.php

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Eric Covener <co...@gmail.com>.
On Tue, Nov 24, 2015 at 10:41 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> You are right, calling out "This implementation provides ASCII-ordering
> of strings (EBCDIC text is also ASCII ordered on those platforms)"
> might have been clearer?  Eric, what phrasing would make sense to you
> as a user/maintainer of the httpd EBCDIC port?

"Provides C/POSIX strcasecmp ordering, for alpha-numerics only"?

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, Nov 24, 2015 at 9:28 AM, Jim Jagielski <ji...@jagunet.com> wrote:

> if the current EBCDIC validates the >< lex, then by all means
> update the docs to reflect that. You specifically said that
> "The implementation does provide ASCII numeric-alpha ordering."
>

You are right, calling out "This implementation provides ASCII-ordering
of strings (EBCDIC text is also ASCII ordered on those platforms)"
might have been clearer?  Eric, what phrasing would make sense to you
as a user/maintainer of the httpd EBCDIC port?


> The main reason I was suggesting this was that, really,
> it would make sense for the function to return true if
> they compare and 0 if not, which is, of course, not
> how strncasecmp does it (which is why we have the weird
> looking !strcasecmp() for code paths where they DO compare :) ).
> So by allowing that casecmpstr was not, in fact, a *replacement*
> implementation of strcasecmp(), we could define it to be more
> sensible ;)
>

That's possible, although I'm -.5 on losing the functionality for the
earlier reasons stated, and your interpretation is incorrect.  "cmp"
implies compare, lt, 0, gt valuations. "eq" implies equality tests.

Like you said, we aren't creating a replacement clib, so creating
> funcs for our own specific usecase does make some sense.
>

Sure, and the use case(s) can be broad enough to serve more than
one purpose.

If you really need a strict equality test, isn't

#define ap_strcaseequals(s1, s2) (!ap_strcasecmp(s1, s2)) sufficient?

(or choose match or identity but not cmp, please :)

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Jim Jagielski <ji...@jaguNET.com>.
if the current EBCDIC validates the >< lex, then by all means
update the docs to reflect that. You specifically said that
"The implementation does provide ASCII numeric-alpha ordering."

The main reason I was suggesting this was that, really,
it would make sense for the function to return true if
they compare and 0 if not, which is, of course, not
how strncasecmp does it (which is why we have the weird
looking !strcasecmp() for code paths where they DO compare :) ).
So by allowing that casecmpstr was not, in fact, a *replacement*
implementation of strcasecmp(), we could define it to be more
sensible ;)

Like you said, we aren't creating a replacement clib, so creating
funcs for our own specific usecase does make some sense.
> On Nov 24, 2015, at 10:15 AM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> 
> On Tue, Nov 24, 2015 at 6:35 AM, Jim Jagielski <ji...@jagunet.com> wrote:
> Yeah, but not, afaict, EBCDIC.
> 
> It would help for you to review the sources.  It wasn't a correct alpha sort
> until I put in the EBCDIC code table this weekend, now EBCDIC chars
> collate in ASCII order under this function.
>  
> And in our use case, we
> don't care (and never use) the greater/less-than functionality,
> 
> s/our/my/;s/we/I/; - projecting much?  Designing utility functions is about
> designing for utility, not one case.  If it is one case don't abstract it.  That's
> why util[_string].c has a lot of apparently duplicate functionality that we
> don't generally need under stdc '89 and APR, but we just hadn't gone back
> and mopped up yet.
>  
> just the equal to. This allows for possible other improvements/
> enhancements which might "break" the >< but doesn't affect
> how *we* use it.
> 
> Why would we break this?
> 
> Right now you can presume that if you are looking strictly for some
> encoding tokens 'chunked' 'deflate' 'gzip', you can actually walk the 
> token list in alpha order and find out what was missing or unexpected 
> in a slightly more efficient way. 
> 
> 


Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
On Tue, Nov 24, 2015 at 6:35 AM, Jim Jagielski <ji...@jagunet.com> wrote:

> Yeah, but not, afaict, EBCDIC.


It would help for you to review the sources.  It wasn't a correct alpha sort
until I put in the EBCDIC code table this weekend, now EBCDIC chars
collate in ASCII order under this function.


> And in our use case, we
> don't care (and never use) the greater/less-than functionality,
>

s/our/my/;s/we/I/; - projecting much?  Designing utility functions is about
designing for utility, not one case.  If it is one case don't abstract it.
That's
why util[_string].c has a lot of apparently duplicate functionality that we
don't generally need under stdc '89 and APR, but we just hadn't gone back
and mopped up yet.


> just the equal to. This allows for possible other improvements/
> enhancements which might "break" the >< but doesn't affect
> how *we* use it.


Why would we break this?

Right now you can presume that if you are looking strictly for some
encoding tokens 'chunked' 'deflate' 'gzip', you can actually walk the
token list in alpha order and find out what was missing or unexpected
in a slightly more efficient way.

Re: svn commit: r1715859 - /httpd/httpd/trunk/include/httpd.h

Posted by Jim Jagielski <ji...@jaguNET.com>.
Yeah, but not, afaict, EBCDIC. And in our use case, we
don't care (and never use) the greater/less-than functionality,
just the equal to. This allows for possible other improvements/
enhancements which might "break" the >< but doesn't affect
how *we* use it.
> On Nov 23, 2015, at 12:01 PM, William A Rowe Jr <wr...@rowe-clan.net> wrote:
> 
> On Mon, Nov 23, 2015 at 9:58 AM, <ji...@apache.org> wrote:
> Author: jim
> Date: Mon Nov 23 15:58:25 2015
> New Revision: 1715859
> 
> URL: http://svn.apache.org/viewvc?rev=1715859&view=rev
> Log:
> we just worry about "equality" with this implementation...
> So it's not a "real" strcasecmp replacement.
> 
> Modified:
>     httpd/httpd/trunk/include/httpd.h
> 
> Modified: httpd/httpd/trunk/include/httpd.h
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/include/httpd.h?rev=1715859&r1=1715858&r2=1715859&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/include/httpd.h (original)
> +++ httpd/httpd/trunk/include/httpd.h Mon Nov 23 15:58:25 2015
> @@ -2442,9 +2442,8 @@ AP_DECLARE(int) ap_array_str_contains(co
>   * Known-fast version of strcasecmp(): ASCII case-folding, POSIX compliant
>   * @param s1 The 1st string to compare
>   * @param s2 The 2nd string to compare
> - * @return integer greater than, equal to, or less than 0, depending on
> - *         if s1 is lexicographically greater than, equal to, or less
> - *         than s2 ignoring case.
> + * @return 0 if s1 is lexicographically equal to s2 ignoring case;
> + *         non-0 otherwise.
>   */
>  AP_DECLARE(int) ap_casecmpstr(const char *s1, const char *s2);
> 
> @@ -2453,9 +2452,8 @@ AP_DECLARE(int) ap_casecmpstr(const char
>   * @param s1 The 1st string to compare
>   * @param s2 The 2nd string to compare
>   * @param n  Maximum number of characters in the strings to compare
> - * @return integer greater than, equal to, or less than 0, depending on
> - *         if s1 is lexicographically greater than, equal to, or less
> - *         than s2 ignoring case.
> + * @return 0 if s1 is lexicographically equal to s2 ignoring case;
> + *         non-0 otherwise.
>   */
>  AP_DECLARE(int) ap_casecmpstrn(const char *s1, const char *s2, apr_size_t n);
> 
> Howso?  The implementation does provide ASCII numeric-alpha ordering.