You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Günther Gsenger <gu...@gmail.com> on 2007/10/22 21:03:25 UTC

svn commit: r573831

Hi,

sorry I can't reply directly to this thread  
http://mail-archives.apache.org/mod_mbox/httpd-dev/200709.mbox/%3c46E3BFE1.9050004@apache.org%3e  
- i only (re)subscribed to this list today after I found the thread.

> +/*
> + * Escapes a uri in a similar way as php's urlencode does.
> + * Based on ap_os_escape_path in server/util.c
> + */
> +static char *escape_uri(apr_pool_t *p, const char *path) {
> +    char *copy = apr_palloc(p, 3 * strlen(path) + 3);
> +    const unsigned char *s = (const unsigned char *)path;
> +    unsigned char *d = (unsigned char *)copy;
> +    unsigned c;
> +
> +    while ((c = *s)) {
> +        if (apr_isalnum(c) || c == '_') {
> +            *d++ = c;
> +        }
> +        else if (c == ' ') {
> +            *d++ = '+';
> +        }
> +        else {
> +            d = c2x(c, '%', d);
> +        }
> +        ++s;
> +    }
> +    *d = '\0';
> +    return copy;
> +}
> +
Ruediger Pluem:
> Does it make sense to duplicate code? Shouldn't this be placed in util.c?
It most likely should. I placed it there because I thought the patch would  
get a higher acceptance if it just touched mod_rewrite.

> +                    /* escape the backreference */
> +                    char *tmp2, *tmp;
> +                    tmp = apr_palloc(pool, span + 1);
> +                    strncpy(tmp, bri->source + bri->regmatch[n].rm_so,  
> span);
Ruediger Pluem:
> How about using apr_pstrndup instead?
Yes, I was not aware of that function.


André Malo:
> This spreads another uri escaper copy around. Why can't we take 
> ap_escape_uri? Without deep digging: what's the difference?
>Also I don't like the ' ' => '+' transition, which should not be applied  
> forpaths. It's safer to translate that always to %20, I guess.
The main difference is this escaping of ' ' to '+'. The reason for this is  
that while ' ' gets encoded as %20 in paths, it gets encoded as '+' in  
query strings (afaik for historic reasons). Therefore, languages which  
interpret the query string (like PHP) might get confused if they receive a  
%20 in the query string (or at least that was my concern).


André Malo:
> By the way, I'm wondering why nobody picked up the suggested use of the 
> escape rewrite map (or I overread it).
I don't know if that works but I developed the patch because it can be  
used in a directory-context and the RewriteMap can't.

I've created another patch, hopefully you'll like it better.

Regards,
Guenther

Re: svn commit: r573831

Posted by Nick Kew <ni...@webthing.com>.
On Tue, 23 Oct 2007 00:22:56 +0200
André Malo <nd...@perlig.de> wrote:

> 
> > > The main difference is this escaping of ' ' to '+'. The reason for
> > > this is that while ' ' gets encoded as %20 in paths, it gets
> > > encoded as '+' in query strings (afaik for historic reasons).
> > > Therefore, languages which interpret the query string (like PHP)
> > > might get confused if they receive a %20 in the query string (or
> > > at least that was my concern).
> >
> > That sounds plausible, but I'm not sure.  Anyone else?
> 
> Nah. Everyone just takes %hh and turns it into an octet. The special 
> handling takes place for the + sign only, on both sides.
> In fact, every script must be prepared to get a %20 from the client,
> as well.

You're right of course, and it's also so easy to implement
that it's unlikely to have been messed up in real life.

As I see it, we have two variants of the fix:
  (1) Günther Gsenger's patch, with + for space.
  (2) Use ap_escape_path_segment, with %20 for space.

My (slight) inclination is to apply (2), but if anyone has strong
views, I'm happy with (1).  If noone cares enough to reply here,
I'll go ahead and apply (2).

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: svn commit: r573831

Posted by André Malo <nd...@perlig.de>.
* Nick Kew wrote:

> > The main difference is this escaping of ' ' to '+'. The reason for
> > this is that while ' ' gets encoded as %20 in paths, it gets encoded
> > as '+' in query strings (afaik for historic reasons). Therefore,
> > languages which interpret the query string (like PHP) might get
> > confused if they receive a %20 in the query string (or at least that
> > was my concern).
>
> That sounds plausible, but I'm not sure.  Anyone else?

Nah. Everyone just takes %hh and turns it into an octet. The special 
handling takes place for the + sign only, on both sides.
In fact, every script must be prepared to get a %20 from the client, as 
well.

nd
-- 
> Rätselnd, was ein Anthroposoph mit Unterwerfung zu tun hat...
                    ^^^^^^^^^^^^
[...] Dieses Wort gibt so viele Stellen für einen Spelling Flame her, und
Du gönnst einem keine einzige.    -- Jean Claude und David Kastrup in dtl

Re: svn commit: r573831

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Nick Kew wrote:
> Günther Gsenger <gu...@gmail.com> wrote:
>> André Malo:
>>> This spreads another uri escaper copy around. Why can't we take 
>>> ap_escape_uri? Without deep digging: what's the difference?
>>> Also I don't like the ' ' => '+' transition, which should not be
>>> applied  
>>> forpaths. It's safer to translate that always to %20, I guess.
>> The main difference is this escaping of ' ' to '+'. The reason for
>> this is that while ' ' gets encoded as %20 in paths, it gets encoded
>> as '+' in query strings (afaik for historic reasons). Therefore,
>> languages which interpret the query string (like PHP) might get
>> confused if they receive a %20 in the query string (or at least that
>> was my concern).
> 
> That sounds plausible, but I'm not sure.  Anyone else?

I strongly expect every CGI consumer expects to see '+' where appropriate,
and changing this will alter a vast number of CGI scripts.  Not all of them
broken, of course, but some will be.

You can't touch QUERY_ARGS escaping, as it's well defined.  If the patch
does this it needs refinement.

Bill

Re: svn commit: r573831

Posted by Nick Kew <ni...@webthing.com>.
On Mon, 22 Oct 2007 21:03:25 +0200
Günther Gsenger <gu...@gmail.com> wrote:

> Hi,

Hi, thanks for revisiting this!

> Ruediger Pluem:
> > Does it make sense to duplicate code? Shouldn't this be placed in
> > util.c?
> It most likely should. I placed it there because I thought the patch
> would get a higher acceptance if it just touched mod_rewrite.

I think perhaps Rudiger's point is that the patch essentially
duplicates ap_escape_path_segment, except for ' ' --> '+'.

> André Malo:
> > This spreads another uri escaper copy around. Why can't we take 
> > ap_escape_uri? Without deep digging: what's the difference?
> >Also I don't like the ' ' => '+' transition, which should not be
> >applied  
> > forpaths. It's safer to translate that always to %20, I guess.
> The main difference is this escaping of ' ' to '+'. The reason for
> this is that while ' ' gets encoded as %20 in paths, it gets encoded
> as '+' in query strings (afaik for historic reasons). Therefore,
> languages which interpret the query string (like PHP) might get
> confused if they receive a %20 in the query string (or at least that
> was my concern).

That sounds plausible, but I'm not sure.  Anyone else?

> I've created another patch, hopefully you'll like it better.

Looks good, but it might be nice to avoid introducing another
new escape function.  If space vs + is the issue, perhaps we
could fix it by making the third arg to ap_os_escape_path an
enum, with an option to escape space --> +.

Just thinking out loud.  I won't apply anything yet, but please
bug me if it's still languishing here by the weekend.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/