Posted to dev@httpd.apache.org by Garey Mills <gm...@library.berkeley.edu> on 2002/07/02 22:42:25 UTC

URL parsing changed between 1.3.23 and 1.3.26?

Hi -

	Does anyone have any idea why the following URL would work in
1.3.23 and not work in 1.3.26?

<A 
HREF="http://sunsite2.berkeley.edu:4140/WebZ/html/urlwarn.html:sessionid=01-59496-1036819798:entityartTitle=:entityartAuthor=
:entityartJournal=UNIX Review:entityartNum=n. 11, :entityartVol=v. 13,
:entityartPage=p. 137 (1
pages):entityartNewLoc=http%3A%2F%2Fwww.melvyl.ucop.edu%2Fmw%2Fcgi-bin%2Fftsrv%3FCOMP%2B17507853?entityartDate=Oct,
1995:entitymyreccount=1:entityartTitle=AT&T Bell Laboratories. (the Plan 9
operating system for research and educational use)(Brief Article)(Product
Announcement)"><b>Art/Cit (Netscape only)</b></A>

The message I get from Apache 1.3.26 is:

Bad Request

Your browser sent a request that this server could not understand.

The request line contained invalid characters following the protocol
string.

And the error log shows this:

[Tue Jul  2 13:10:16 2002] [error] [client 128.32.238.84] request failed:
erroneous characters after protocol string: GET
/WebZ/html/urlwarn.html:sessionid=01-59496-1036819798:entityartTitle=:entityartAuthor=
:entityartJournal=UNIX Review:entityartNum=n. 11, :entityartVol=v. 13,
:entityartPage=p. 137 (1
pages):entityartNewLoc=http%3A%2F%2Fwww.melvyl.ucop.edu%2Fmw%2Fcgi-bin%2Fftsrv%3FCOMP%2B17507853:entityartDate=Oct,
1995?entitymyreccount=1:entityartTitle=AT&T Bell Laboratories. (the Plan 9
operating system for research and educational use)(Brief Article)(Product
Announcement) HTTP/1.0



NOTE: The URL will not work as it stands because it points to a
web application and needs to have a session established. But I guarantee
that it does work in 1.3.23 and not in 1.3.26, both having mod_ssl and 
a special module called 'mod_webz' enabled.


Garey Mills
Library Systems Office
UC Berkeley

PS. If your opinion is that this really should go to the users list, or
into a bug report, please let me know. I couldn't really see it fitting in
either place. 


Re: URL parsing changed between 1.3.23 and 1.3.26?

Posted by "Roy T. Fielding" <fi...@apache.org>.
> That's true.  But & is definitely the one used by convention.  (Maybe it's
> in the CGI spec?  Not sure on that one.)  And that doesn't change the fact
> that in this case ':' was used in place of both the '?' and the
> '&', which is definitely wrong.

No, it's just a different way of naming the path segment.  Any http
resource is free to construct its own namespace with the exception
that "/" and "?" have a reserved meaning *when* they are used.
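To illustrate Roy's point, here is a minimal Python sketch using the standard library's urllib.parse.urlsplit on a shortened, hypothetical variant of Garey's URL (with the space already encoded as %20). Because this shortened URL contains no "?", the colon-delimited fields all belong to the path and the query is empty. The split_segment_fields helper is hypothetical: it only shows how a resource could treat ":" and "=" inside its own last path segment as separators. It is an assumption for illustration, not a description of how mod_webz works.

```python
from urllib.parse import urlsplit

# Shortened, hypothetical variant of the URL from the original post,
# with the space encoded as %20 so that it is a legal URI.
url = ("http://sunsite2.berkeley.edu:4140/WebZ/html/urlwarn.html"
       ":sessionid=01-59496-1036819798:entityartJournal=UNIX%20Review")

parts = urlsplit(url)
print(parts.netloc)       # host and port
print(parts.path)         # the ':'-delimited fields are part of the path
print(repr(parts.query))  # empty, because there is no '?'


def split_segment_fields(path):
    """Hypothetical: treat ':' as a field separator and '=' as a name/value
    separator inside the last path segment. This is one possible private
    convention for a resource, not a rule of the URI syntax."""
    last_segment = path.rsplit("/", 1)[-1]
    name, *fields = last_segment.split(":")
    pairs = (field.partition("=") for field in fields)
    return name, {key: value for key, _, value in pairs}


print(split_segment_fields(parts.path))
```

Only "/" (and "?", when present) is interpreted by the generic syntax here; what the colons mean is entirely up to the resource.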

....Roy


Re: URL parsing changed between 1.3.23 and 1.3.26?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 2 Jul 2002, Aaron Bannert wrote:

> On Tue, Jul 02, 2002 at 05:08:17PM -0400, Cliff Woolley wrote:
> > You're supposed to encode a query string like so:
> > http://myserver.com/file.html?arg1=val1&arg2=val2&arg3=val3
>
> Is that actually part of the URI spec, or just a convention?  I was
> under the impression that the spec only says that args go after
> a ? and everything after is up to the interface implementor (mod
> URI-character-space).

That's true.  But & is definitely the one used by convention.  (Maybe it's
in the CGI spec?  Not sure on that one.)  And that doesn't change the fact
that in this case ':' was used in place of both the '?' and the
'&', which is definitely wrong.
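The "&"-separated name=value layout is the application/x-www-form-urlencoded convention that HTML forms use when they submit with GET. A minimal Python sketch with the standard library's urllib.parse.urlencode and parse_qs, reusing the example URL from this thread:

```python
from urllib.parse import parse_qs, urlencode

# Build the query string from name/value pairs; urlencode joins them with '&'.
query = urlencode({"arg1": "val1", "arg2": "val2", "arg3": "val3"})
url = "http://myserver.com/file.html?" + query
print(url)  # http://myserver.com/file.html?arg1=val1&arg2=val2&arg3=val3

# parse_qs splits on '&' and '=' again, returning a list of values per name.
print(parse_qs(query))  # {'arg1': ['val1'], 'arg2': ['val2'], 'arg3': ['val3']}
```

Nothing in the generic URI syntax forces this layout; a program is free to parse its own query however it likes, which is the point Aaron raises.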

From RFC 2396 on URIs:

3. URI Syntactic Components

...
      <scheme>://<authority><path>?<query>

   each of which, except <scheme>, may be absent from a particular URI.
   For example, some URI schemes do not allow an <authority> component,
   and others do not use a <query> component.

      absoluteURI   = scheme ":" ( hier_part | opaque_part )
...
      hier_part     = ( net_path | abs_path ) [ "?" query ]
      net_path      = "//" authority [ abs_path ]
      abs_path      = "/"  path_segments


3.4. Query Component

   The query component is a string of information to be interpreted by
   the resource.

      query         = *uric

   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.

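RFC 2396 also gives, in its Appendix B, a regular expression for splitting a URI reference into these components. A minimal Python sketch using that expression; the components helper and the example.org URL are hypothetical, chosen for illustration:

```python
import re

# Regular expression from RFC 2396, Appendix B. Group numbers:
# 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def components(uri):
    """Hypothetical helper: split a URI reference into its five parts.
    Every group is optional, so the pattern matches any string."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Only the first '?' starts the query; ':' and '=' before it stay in the path.
print(components("http://myserver.com/file.html?arg1=val1&arg2=val2&arg3=val3"))
print(components("http://example.org/page.html:a=1:b=2?c=3"))
```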

--Cliff



Re: URL parsing changed between 1.3.23 and 1.3.26?

Posted by Aaron Bannert <aa...@clove.org>.
On Tue, Jul 02, 2002 at 05:08:17PM -0400, Cliff Woolley wrote:
> You're supposed to encode a query string like so:
> 
> http://myserver.com/file.html?arg1=val1&arg2=val2&arg3=val3

Is that actually part of the URI spec, or just a convention?  I was
under the impression that the spec only says that args go after
a ? and everything after is up to the interface implementor (mod
URI-character-space).

-aaron

Re: URL parsing changed between 1.3.23 and 1.3.26?

Posted by Greg Ames <gr...@apache.org>.
Cliff Woolley wrote:

> You're supposed to encode a query string like so:
> 
> http://myserver.com/file.html?arg1=val1&arg2=val2&arg3=val3

Sure, but we also support path info, which can be used like a query string:

http://bugs.apache.org/index.cgi/full/3708

to get some Chinese spam, and Marc's subtle reply.
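Assuming index.cgi is a CGI script, that request would run /index.cgi with PATH_INFO set to /full/3708. A minimal Python sketch of the split, assuming the script's URL path is already known (Apache works it out while mapping the URL onto a file); split_path_info is a hypothetical helper:

```python
def split_path_info(uri_path, script_path):
    """Hypothetical helper: given the URL path of a request and the URL path
    of the script that handles it, return (SCRIPT_NAME, PATH_INFO) as a CGI
    environment would present them, or None if the script does not match."""
    if uri_path == script_path:
        return script_path, ""
    if uri_path.startswith(script_path + "/"):
        return script_path, uri_path[len(script_path):]
    return None


print(split_path_info("/index.cgi/full/3708", "/index.cgi"))  # ('/index.cgi', '/full/3708')
```

The script is then free to interpret /full/3708 however it likes, much as it would a query string.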

Greg

Re: URL parsing changed between 1.3.23 and 1.3.26?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 2 Jul 2002, Jerry Baker wrote:

> Garey Mills wrote:
> >
> > NOTE: The URL will not work as it stands because it points to a
> > web application and needs to have a session established. But I guarantee
> > that it does work in 1.3.23 and not in 1.3.26, both having mod_ssl and
> > a special module called 'mod_webz' enabled.
>
> The URL has spaces in it. That is a big no-no for one. URLs with spaces
> only work in IE even though the HTTP specification prohibits them.

Right.  It's strange though in a way -- shouldn't the browser
automatically encode the spaces?  Hmph.
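Percent-encoding the spaces is a mechanical step. A minimal Python sketch with the standard library's urllib.parse.quote, applied to a fragment of Garey's URL; the safe set is an assumption, chosen so that the application's own "/", ":", "=" and "," separators pass through unchanged:

```python
from urllib.parse import quote

# A fragment of the path from the original post, spaces and all.
raw = "/WebZ/html/urlwarn.html:entityartJournal=UNIX Review:entityartNum=n. 11, "

# Percent-encode everything except unreserved characters and the listed
# separators; each space becomes %20.
encoded = quote(raw, safe="/:=,")
print(encoded)  # /WebZ/html/urlwarn.html:entityartJournal=UNIX%20Review:entityartNum=n.%2011,%20
```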

For another thing -- what's with the colons?

> http://sunsite2.berkeley.edu:4140/WebZ/html/urlwarn.html:sessionid=01-59496-1036819798:entityartTitle=:entityartAuthor=
> :entityartJournal=UNIX Review:entityartNum=n. 11, :entityartVol=v. 13,
> :entityartPage=p. 137 (1   [snip]

You're supposed to encode a query string like so:

http://myserver.com/file.html?arg1=val1&arg2=val2&arg3=val3

Anyway, the reason this stopped working in 1.3.26 is that we're now more
strict with the request line: as soon as the server sees a space after the
URL, the only other thing on the line ought to be "HTTP/1.0" or "HTTP/1.1".
In your case, the URL itself has spaces in it, which, as Jerry pointed out,
is and always has been invalid.  You just somehow got away with it before.
Encode the spaces as %20 and it will work.
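A simplified Python sketch of the stricter check described above: the request line is split at its first two spaces, and whatever follows the URL must be exactly HTTP/1.0 or HTTP/1.1. This illustrates the behaviour only and is not Apache's actual C code; parse_request_line and BadRequest are hypothetical names.

```python
VALID_PROTOCOLS = {"HTTP/1.0", "HTTP/1.1"}


class BadRequest(ValueError):
    """Raised for a request line the server cannot understand (HTTP 400)."""


def parse_request_line(line):
    """Split 'METHOD SP URI SP PROTOCOL' and reject anything else."""
    method, _, rest = line.partition(" ")
    uri, _, protocol = rest.partition(" ")
    if not method or not uri:
        raise BadRequest("malformed request line")
    # Everything after the second space must be the protocol string alone,
    # so a space inside the URL leaves extra text in the protocol field.
    if protocol not in VALID_PROTOCOLS:
        raise BadRequest("erroneous characters after protocol string: " + line)
    return method, uri, protocol


for line in ("GET /urlwarn.html:entityartJournal=UNIX Review HTTP/1.0",
             "GET /urlwarn.html:entityartJournal=UNIX%20Review HTTP/1.0"):
    try:
        print(parse_request_line(line))
    except BadRequest as err:
        print("400 Bad Request:", err)
```

The first line is rejected because "Review HTTP/1.0" is left where the protocol should be; the encoded version parses cleanly. The sketch deliberately ignores HTTP/0.9-style requests that carry no protocol string at all.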


--Cliff


Re: URL parsing changed between 1.3.23 and 1.3.26?

Posted by Jerry Baker <je...@attbi.com>.
Garey Mills wrote:
> 
> NOTE: The URL will not work as it stands because it points to a
> web application and needs to have a session established. But I guarantee
> that it does work in 1.3.23 and not in 1.3.26, both having mod_ssl and
> a special module called 'mod_webz' enabled.

The URL has spaces in it. That is a big no-no for one. URLs with spaces
only work in IE even though the HTTP specification prohibits them.

-- 
Jerry Baker