Posted to dev@httpd.apache.org by Rob Hartill <ro...@imdb.com> on 1996/06/21 15:35:18 UTC
Re: Apache bug? Something else bug? (CGI script arguments)

Not acked.

I was going to tell him to use %26, but then how do you encode a & as
part of the data and not as a separator? Seems to be a can of worms. The
&amp; encoding advice sounds dubious.
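Rob's objection to %26 can be made concrete. This is a minimal sketch in Python, used purely for illustration (the thread itself concerns a Perl CGI script behind Apache). A literal & inside a value is already percent-encoded as %26, so a separator also sent as %26 would be byte-for-byte identical to data.

```python
from urllib.parse import urlencode, parse_qsl

# Form data whose value itself contains an ampersand.
fields = [("q", "A&B"), ("x", "1")]

# Standard encoding: '&' inside a value becomes %26, the separator stays '&'.
normal = urlencode(fields)
print(normal)                      # q=A%26B&x=1

# Hypothetical alternative: write the separator as %26 as well.
separator_as_26 = normal.replace("&", "%26")
print(separator_as_26)             # q=A%26B%26x=1

# A parser can no longer tell where one field ends and the next begins.
print(parse_qsl(normal))           # [('q', 'A&B'), ('x', '1')]
print(parse_qsl(separator_as_26))  # [('q', 'A&B&x=1')]
```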


>I'm not sure whose problem this is, but I decided to try asking you first,
>if only to be told it's definitely outside Apache's territory...
>
>A comment I saw recently prompted me to re-read the HTML 2.0 RFC, which
>confirmed the suggestion I'd seen (but never noticed on earlier readings of
>the standard...) that
>
>=====
>            NOTE - The URI from a query form submission can be
>            used in a normal anchor style hyperlink.
>            Unfortunately, the use of the `&' character to
>            separate form fields interacts with its use in SGML
>            attribute values as an entity reference delimiter.
>            For example, the URI `http://host/?x=1&y=2' must be
>            written `<a href="http://host/?x=1&#38;y=2"' or `<a
>            href="http://host/?x=1&amp;y=2">'.
>=====
>
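The RFC note describes two distinct layers of encoding. The following is a minimal Python sketch (illustrative only; the script in this thread was Perl), assuming the page generator already holds the finished query URI. The URI is HTML-escaped only where it is placed in an attribute value, and an HTML parser resolves &amp; back to & before the link is followed.

```python
import html

# The query URI exactly as a form submission would produce it.
uri = "http://host/?x=1&y=2"

# Escape it for use inside an SGML/HTML attribute value, as the RFC note
# requires: '&' becomes '&amp;' (quote=True also escapes '"' and "'").
href_value = html.escape(uri, quote=True)
anchor = '<a href="%s">next page</a>' % href_value
print(anchor)  # <a href="http://host/?x=1&amp;y=2">next page</a>

# Resolving the character references in the attribute value recovers the
# original URI, with a bare '&' as the field separator.
assert html.unescape(href_value) == uri
```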
>So I duly modified the CGI script that I was about to put into service so it
>would send out "&amp;" as the script argument separator within anchors in
>the HTML it generated, in accordance with the above statement. And it all
>seemed to work - while I was using Netscape to test it. When I tried a final
>check with lynx, it didn't - the script was not seeing anything after the
>first argument.
>
>Further investigation (including checking what was sent over the network,
>since the client intentionally hides what's going on...) showed that (a)
>Netscape was converting the &amp; it received in anchors to & for display
>AND sending it back as just "&" when requesting the document (which may be
>reasonable - it's then *not* a part of an HTML document...), while (b) lynx
>sent back exactly what it received (with &amp;). Clearly neither Apache nor
>the Perl 5 CGI-modules V2.5 facilities I used to access the arguments
>translated the &amp;. The lynx behaviour is seen even with the recent lynx
>V2.5.
>
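The lynx symptom follows directly from splitting the query string on '&' before anything decodes HTML entities. This is a minimal Python sketch using a hypothetical parse_query helper (not the Perl CGI module's actual code) that splits the raw QUERY_STRING on '&' and URL-decodes each piece. With lynx's request, the second field arrives under the name 'amp;y'.

```python
from urllib.parse import unquote_plus

def parse_query(qs):
    """Split a raw QUERY_STRING into (name, value) pairs.

    Hypothetical helper for illustration: it splits on '&' and
    URL-decodes each part, but never touches HTML entities.
    """
    pairs = []
    for part in qs.split("&"):
        if not part:
            continue
        name, _, value = part.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

# Netscape decoded the &amp; in the page, so the server sees a bare '&'.
netscape = parse_query("x=1&y=2")     # [('x', '1'), ('y', '2')]

# lynx sent the attribute text back verbatim, &amp; included.
lynx = parse_query("x=1&amp;y=2")     # [('x', '1'), ('amp;y', '2')]

# A script asking for 'y' therefore sees nothing after the first argument.
params = dict(lynx)
print(params.get("y"))                # None
```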
>I can't make up my mind whose "fault" this is (though I'll wait until I hear
>your opinion before saying what I currently think - not least because I've
>already changed my mind twice while investigating and then writing this
>message!):
>
> * my fault - I've misunderstood how it's all supposed to work...
> * lynx' fault - it should remove the HTML encoding (but not any URL encoding)
>   before sending the URL back as part of a request
> * Apache's fault - it should strip the HTML encoding before passing the query 
>   arguments to the CGI script
> * CGI-module's fault - it should strip the HTML encoding before passing the
>   query string to the user's perl script
>
>I did wonder if the recently added (according to the CHANGES file)
>QUERY_STRING_UNESCAPED CGI variable might have the "right" information in
>it, but neither the Perl CGI modules' "dump variables" facility nor dumping
>out all of %ENV showed a value for it, though the normal QUERY_STRING was
>set (and contained &amp;). That was with Apache 1.1b2 (and the CHANGES file
>read as though the new variable was added in 1.1b1), so I couldn't tell
>whether it would have helped.
>
>Whatever the explanation, this looks like a no-win situation if you cannot
>rely on clients returning the query argument separator consistently, since
>there's no way to be sure that something looking like an HTML entity name
>should be unencoded; it may already have been unencoded somewhere along the
>way. For example, if the query argument was itself the name of an entity the
>user wanted information about, sent as &amp;gt; and already converted to
>&gt;, it shouldn't then be converted to ">".
>
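The ambiguity described above appears only if entities are decoded more than once. This is a minimal Python sketch, assuming the data is URL-encoded before the URI is HTML-escaped; the script path "/cgi-bin/lookup" is hypothetical. An entity name used as data never appears as a literal "&gt;" in the query string, so a single decoding step by the client is unambiguous, and a second, speculative decoding step on the server is what would corrupt it.

```python
import html
from urllib.parse import urlencode, parse_qsl

# A query argument that is itself the name of an entity.
fields = [("entity", "&gt;"), ("lang", "en")]

# Layer 1: URL-encode the data. '&' and ';' inside a value become %26 and
# %3B, so only real separators remain as a literal '&'.
query = urlencode(fields)                 # 'entity=%26gt%3B&lang=en'

# Layer 2: HTML-escape the finished URI for the href attribute.
# "/cgi-bin/lookup" is a hypothetical script path.
href = html.escape("/cgi-bin/lookup?" + query)

# A client that decodes the attribute once recovers the URI exactly, and
# the data value survives untouched.
uri = html.unescape(href)
qs = uri.split("?", 1)[1]
print(parse_qsl(qs))                      # [('entity', '&gt;'), ('lang', 'en')]

# Decoding entities a second time, on the server, is what corrupts data.
value = dict(parse_qsl(qs))["entity"]     # '&gt;'
over_decoded = html.unescape(value)       # '>'
```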
>This sounds like a grey area (and probably one that no-one noticed existed
>until it was written into the RFC), so people just sent out URLs
>URL-encoded but not also HTML-encoded (as I certainly did; the rules were
>"clearly" different for URLs than for HTML in general :-).
>
>If you can offer any comments on where the problem lies and the right
>solution(s) - if any exist - I'd be very grateful. And apologies if in your
>view it's not an Apache problem... I'm still unsure, though I admit that my
>current view is that it's probably not Apache (but I might change my mind
>again).
>
>[The RFC does suggest a solution - 
>=====
>            HTTP server implementors, and in particular, CGI
>            implementors are encouraged to support the use of
>            `;' in place of `&' to save users the trouble of
>            escaping `&' characters this way.
>=====
>
>but since the Perl module uses &, changing to use ";" would mean either
>dropping the module (which would be a lot of work) or hacking it around
>(which isn't a good idea, and could also be a lot of work... not to mention
>breaking anything which relied on the old behaviour).]
>
>                                John Line
>-- 
>University of Cambridge WWW/gopher server manager account (usually John Line)
>Send general queries to the WWW or gopher administrator addresses -
>webmaster@ucs.cam.ac.uk or gopher-admin@ucs.cam.ac.uk.
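The RFC's ';' suggestion need not break existing links if the parsing side accepts both separators. This is a minimal Python sketch using a hypothetical parse_query helper (not part of the Perl CGI module or Apache). It splits on either '&' or ';' and drops any fragment that contains no '='.

```python
import re
from urllib.parse import unquote_plus

def parse_query(qs):
    """Split a QUERY_STRING on either '&' or ';' (hypothetical helper).

    New pages can use ';', which needs no HTML escaping, while older
    links that use '&' keep working. Fragments without '=' are dropped.
    """
    pairs = []
    for part in re.split(r"[&;]", qs):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # empty or '='-less fragment, e.g. the 'amp' of '&amp;'
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

new_style = parse_query("x=1;y=2")       # link written with ';'
old_style = parse_query("x=1&y=2")       # existing link written with '&'
lynx_style = parse_query("x=1&amp;y=2")  # literal &amp; sent back by lynx
encoded = parse_query("q=a%3Bb;x=1")     # ';' inside a value, sent as %3B
```

In this sketch all three of the first requests yield [('x', '1'), ('y', '2')]. The literal &amp; splits into 'x=1', 'amp' and 'y=2', and the '='-less 'amp' fragment is discarded. The trade-off is that a ';' inside a value must be sent as %3B, which standard URL encoding already does, as the last call shows.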


-- 
Rob Hartill (robh@imdb.com)
The Internet Movie Database (IMDb)  http://www.imdb.com/
           ...more movie info than you can poke a stick at.