Posted to dev@httpd.apache.org by Rodent of Unusual Size <Ke...@Golux.Com> on 1998/07/07 12:10:02 UTC

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Dean Gaudet wrote:
>
> Yeah, except apache doesn't enforce this in any way shape or form, so
> that doesn't explain this PR at all.

No, but it explains why we shouldn't bother to spend time on it.

> (Not enforcing it is very deliberate, Roy can elaborate.)

That would be good, I wish he would.  We close other PRs about
illegal characters in URLs.  Is it because the RFC1738 replacement
(that I don't think is an RFC yet) allows 8-bit characters?

By the way, I think some discussion or explanation is merited
before people unilaterally reverse other people's PR changes.
I wouldn't reverse one of yours without checking with you
first; I'd appreciate the same courtesy being extended by all to
all.  It just makes us look fractious and stupid.

Unless RFC1738 has been replaced (which AFAIK it hasn't yet),
the original closure message is technically correct - regardless
of whether Apache chooses to be more lenient than the RFC.
Otherwise we're encouraging bad behaviour.  Why don't you re-open
#800, then, since Marc said the same thing there?

#ken	P-)}

Ken Coar                    <http://Web.Golux.Com/coar/>
Apache Group member         <http://www.apache.org/>
"Apache Server for Dummies" <http://Web.Golux.Com/coar/ASFD/>

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Marc Slemko <ma...@worldgate.com>.

On Tue, 7 Jul 1998, Brian Behlendorf wrote:

> 
> At 04:59 PM 7/7/98 -0600, Marc Slemko wrote:
> >(general question) So, should we support unencoded spaces in URLs too? 
> >
> >We can do it by parsing left to right, then right to left, then taking
> >what is left in the middle.  People do complain about it.  Why is that
> >different or not?
> 
> Hell no.  Every piece of URL-parsing software I've seen expects there to be
> no whitespace within a URL.  Is there a server out there that actually
> accepts it?

Not that I'm aware of.  But, technically, it is illegal in just the same
manner as other things and can be easily generated (ie. most common
clients don't properly encode spaces, but require the user to do it).

I'm not necessarily seriously suggesting it, just saying that there isn't
that much difference between it and other things...
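
For reference, the encoding that clients are expected to apply to spaces and 8-bit characters can be sketched in Python. The paths are hypothetical, and this thread does not say which character set the client used for 'ö', so both the ISO-8859-1 and UTF-8 results are shown. This is an illustration only, not a description of what any particular browser does.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical paths, chosen only to show the two cases discussed here.
space_path = "/my file.html"
umlaut_path = "/b\u00f6cker/index.html"   # contains 'ö'

# A space becomes %20 regardless of character set.
encoded_space = quote(space_path)

# An 8-bit character depends on the (assumed) character set of the client.
encoded_latin1 = quote(umlaut_path, encoding="latin-1")
encoded_utf8 = quote(umlaut_path, encoding="utf-8")

print(encoded_space)    # /my%20file.html
print(encoded_latin1)   # /b%F6cker/index.html
print(encoded_utf8)     # /b%C3%B6cker/index.html

# Decoding yields raw octets; interpreting them is up to the receiver.
raw = unquote_to_bytes(encoded_latin1)
```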

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Brian Behlendorf <br...@hyperreal.org>.

At 04:59 PM 7/7/98 -0600, Marc Slemko wrote:
>(general question) So, should we support unencoded spaces in URLs too? 
>
>We can do it by parsing left to right, then right to left, then taking
>what is left in the middle.  People do complain about it.  Why is that
>different or not?

Hell no.  Every piece of URL-parsing software I've seen expects there to be
no whitespace within a URL.  Is there a server out there that actually
accepts it?

	Brian


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
pure chewing satisfaction                                  brian@apache.org
                                                        brian@hyperreal.org

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

Marc Slemko wrote:
> 
> (general question) So, should we support unencoded spaces in URLs too?

Huh, catch *me* voicing an opinion on this one! :->  Roy's
draft has become my touchstone - or will, once I locate it.
Whatever it says is fine with me.

#ken	P-)}

Ken Coar                    <http://Web.Golux.Com/coar/>
Apache Group member         <http://www.apache.org/>
"Apache Server for Dummies" <http://Web.Golux.Com/coar/ASFD/>

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 7 Jul 1998, Marc Slemko wrote:

> (general question) So, should we support unencoded spaces in URLs too? 

No.

> We can do it by parsing left to right, then right to left, then taking
> what is left in the middle.  People do complain about it.  Why is that
> different or not?

Because space is a punctuation character with special meaning within the
protocol.  Nothing above 127 has a special meaning within the protocol. 

Dean
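
The distinction can be sketched with a toy request-line splitter. This is a minimal Python illustration, not Apache's parsing code; the function name and the sample requests are made up for the example. A raw octet above 127 leaves the three fields intact, while a raw space adds a fourth.

```python
def split_request_line(line: bytes) -> list:
    """Split a request line into fields on single spaces.

    Hypothetical helper for illustration; works on raw octets so that
    8-bit characters pass through untouched.
    """
    return line.rstrip(b"\r\n").split(b" ")


# 0xF6 is 'ö' in ISO-8859-1 (an assumed encoding for this example).
with_umlaut = split_request_line(b"GET /b\xf6cker.html HTTP/1.0\r\n")
with_space = split_request_line(b"GET /my file.html HTTP/1.0\r\n")

print(len(with_umlaut))  # 3: method, URL, protocol version
print(len(with_space))   # 4: the space split the URL in two
```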

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Marc Slemko <ma...@worldgate.com>.

On Tue, 7 Jul 1998, Rodent of Unusual Size wrote:

> Dean Gaudet wrote:
> > 
> > Oh, I thought you just sent this to me privately.  So here's what I sent
> > Ken back privately.
> 
> I did, by accident.  Then I re-sent it to the list, where I meant it to go
> in the first place. :-)
> 
> > Because when 800 went by I hadn't read Roy's latest draft, and I hadn't
> > worked on the parsing code.
> 
> Hmm.  I thought the rule of thumb was that implementors should NOT hew
> to drafts.  Experimental RFCs, yes - but not drafts.

Good policy, but when you don't have anything else to refer to that makes
any sense at all, your options are limited...

(general question) So, should we support unencoded spaces in URLs too? 

We can do it by parsing left to right, then right to left, then taking
what is left in the middle.  People do complain about it.  Why is that
different or not?
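
A rough sketch of the "parse from both ends" idea, in Python rather than Apache's C, with a hypothetical function name. It takes the method from the left, the protocol version from the right, and treats everything in between as the URL. It assumes a protocol version is always present.

```python
def parse_request_line_lenient(line: bytes):
    """Return (method, url, version), tolerating raw spaces in the URL.

    Hypothetical helper: the first space ends the method, the last space
    starts the protocol version, and the middle is taken as the URL.
    """
    line = line.rstrip(b"\r\n")
    method, sep, rest = line.partition(b" ")       # left to right
    if not sep:
        raise ValueError("no space after method")
    url, sep, version = rest.rpartition(b" ")      # right to left
    if not sep:
        raise ValueError("no space before protocol version")
    return method, url, version


parsed = parse_request_line_lenient(b"GET /my file.html HTTP/1.0\r\n")
print(parsed)  # (b'GET', b'/my file.html', b'HTTP/1.0')

# Caveat: an HTTP/0.9-style request carries no version, so the tail of
# the URL is mistaken for one.
ambiguous = parse_request_line_lenient(b"GET /my file.html\r\n")
print(ambiguous)  # (b'GET', b'/my', b'file.html')
```

The second call shows one reason the approach is fragile: without a trailing protocol version, the parser cannot tell where the URL ends.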

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

Dean Gaudet wrote:
> 
> Oh, I thought you just sent this to me privately.  So here's what I sent
> Ken back privately.

I did, by accident.  Then I re-sent it to the list, where I meant it to go
in the first place. :-)

> Because when 800 went by I hadn't read Roy's latest draft, and I hadn't
> worked on the parsing code.

Hmm.  I thought the rule of thumb was that implementors should NOT hew
to drafts.  Experimental RFCs, yes - but not drafts.

> The reason that all characters are accepted is to avoid an English-centric
> URL space.  Apache has no reason to restrict the characters -- it only
> needs to know about the punctuation of URLs [/:?#].  We have no reason to
> enforce english only URLs on Norwegians for example, they should be free
> to use their characters within intranets, and even on the internet.  In
> fact we should support that usage.  It can make the difference between a
> site that makes sense to an end user, and a site that is confusing.

Oh, certainly - when the rules by which the Internet lives say it's
acceptable.  At the moment, however, our laxity is encouraging
behaviour that may or may not become acceptable - as Magnus has
shown.

Whatever.  I defer to our standards cop, of course.

#ken	P-)}

Ken Coar                    <http://Web.Golux.Com/coar/>
Apache Group member         <http://www.apache.org/>
"Apache Server for Dummies" <http://Web.Golux.Com/coar/ASFD/>

Re: general/2553: URL:s containing the character 'ö' gets trucated (See also PR 800)

Posted by Dean Gaudet <dg...@arctic.org>.

Oh, I thought you just sent this to me privately.  So here's what I sent
Ken back privately.

On Tue, 7 Jul 1998, Rodent of Unusual Size wrote:

> Dean Gaudet wrote:
> > 
> > Yeah, except apache doesn't enforce this in any way shape or form, so
> > that doesn't explain this PR at all.
> 
> No, but it explains why we shouldn't bother to spend time on it.

No it doesn't.

> > (Not enforcing it is very deliberate, Roy can elaborate.)
> 
> That would be good, I wish he would.  We close other PRs about
> illegal characters in URLs.  Is it because the RFC1738 replacement
> (that I don't think is an RFC yet) allows 8-bit characters?
>
> By the way, I think some discussion or explanation is merited
> before people unilaterally reverse other people's PR changes.
> I wouldn't reverse one of yours without checking with you
> first; I'd appreciate the same courtesy being extended by all to
> all.  It just makes us look fractious and stupid.
> 
> Unless RFC1738 has been replaced (which AFAIK it hasn't yet),
> the original closure message is technically correct - regardless
> of whether Apache chooses to be more lenient than the RFC.
> Otherwise we're encouraging bad behaviour.  Why don't you re-open
> #800, then, since Marc said the same thing there?

Because when 800 went by I hadn't read Roy's latest draft, and I hadn't
worked on the parsing code. 

The reason that all characters are accepted is to avoid an English-centric
URL space.  Apache has no reason to restrict the characters -- it only
needs to know about the punctuation of URLs [/:?#].  We have no reason to
enforce english only URLs on Norwegians for example, they should be free
to use their characters within intranets, and even on the internet.  In
fact we should support that usage.  It can make the difference between a
site that makes sense to an end user, and a site that is confusing. 

Dean
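
The point about punctuation can be illustrated with a small Python sketch. It is not Apache's parser: the function name and sample URL are invented, and it handles only '/', '?' and '#', leaving ':' (scheme and port) out for brevity. Because every delimiter is below 128, octets above 127 never collide with one and pass through unchanged.

```python
def split_on_punctuation(uri: bytes):
    """Split a path-style URI into (segments, query, fragment).

    Hypothetical helper: only the ASCII delimiters '#', '?' and '/' are
    inspected; every other octet, including 8-bit ones, is opaque.
    """
    rest, _, fragment = uri.partition(b"#")
    path, _, query = rest.partition(b"?")
    segments = path.split(b"/")
    return segments, query, fragment


# 0xF6 and 0xF8 are 'ö' and 'ø' in ISO-8859-1 (an assumed encoding).
segments, query, fragment = split_on_punctuation(b"/b\xf6cker/s\xf8k?q=1#top")

print(segments)  # [b'', b'b\xf6cker', b's\xf8k']
print(query)     # b'q=1'
print(fragment)  # b'top'
```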