Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@kiwi.ics.uci.edu> on 1998/02/21 05:06:43 UTC

Re: absoluteURIs suck

>I really don't understand the lameness regarding absoluteURIs in HTTP/1.1. 
>Suppose HTTP/1.2 comes out and dictates that absoluteURIs must be used for
>all requests (this is hinted at in RFC2068).  In order to interoperate
>with HTTP/1.1 servers all HTTP/1.2 clients will have to also include Host:
>headers.  This is a waste of bandwidth having to include the hostname
>twice. 

Such a concern was less significant than the probability that you might
be sending the absoluteURI request to an old proxy (that doesn't send Host
but will forward it if present) and the possibility that name-based
vhosts would fail due to inadequate implementation.  Much of the weird
wording in the spec is due to several IESG biggies' "opinion" of what
was sufficient to ensure implementation versus my own pleading to
ensure deployment.

>It could be fixed by relaxing HTTP/1.1 and requiring that the client MUST
>send either an absoluteURI or a relativeURI with a Host: header. 

That is one of the cases where the IESG insisted on something implementers
would be forced to implement rather than on something "sufficient".
The compromise I insisted on was that always sending Host would be
dependent on the exact version number (HTTP/1.1) rather than >=1.1.
That way, the requirement can be removed when it is no longer necessary,
which is about the same time that it will be possible to send a full URI
in a non-proxy request (a long long time from now).
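
To make the exact-version point concrete, here is a minimal sketch of a
server-side check, assuming a hypothetical check_host() helper and a
plain dict of headers (neither is Apache code).  The Host requirement
is tied to HTTP/1.1 exactly, so a later minor version, such as the
hypothetical HTTP/1.2 discussed in this thread, would not inherit it.

```python
def parse_version(token):
    """Turn a token such as "HTTP/1.1" into a (major, minor) tuple."""
    major, minor = token.split("/", 1)[1].split(".")
    return int(major), int(minor)


def check_host(version_token, headers):
    """Return 400 if a Host header is required but missing, else None.

    The test is `== (1, 1)`, not `>= (1, 1)`: the requirement is tied
    to the exact version so that it can be dropped later.
    """
    has_host = any(name.lower() == "host" for name in headers)
    if parse_version(version_token) == (1, 1) and not has_host:
        return 400
    return None
```

With this sketch, check_host("HTTP/1.1", {}) returns 400, while the same
call with "HTTP/1.0" or "HTTP/1.2" returns None.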

>Apache 1.2 and 1.3 are broken as far as forward compatibility with this
>hypothetical HTTP spec as well.  Consider: 
>
><VirtualHost 10.1.1.1>
>...
></VirtualHost>
>
>No NameVirtualHost in the config.  I consider the only correct way to
>implement this config is that *all requests* appearing at 10.1.1.1:80 will
>be served by that virtual host.  Right now if a request appears there with
>an absolute URI with a hostname that isn't listed *we will reject it*.
>This means we're not forward compatible with some lame HTTP version that
>doesn't exist but is threatened to exist. 
>
>Contrast this with the behaviour on a Host: header that we don't
>recognize... we just don't care about it, we serve what has been
>configured (a default server, or the ip-vhost).

Either the server is configured to deal with the global namespace, or
it ignores the global namespace assuming that every request it receives
is intended for it.  The requirements in HTTP/1.1 force the server to
recognize its own namespace as part of the global absoluteURI namespace,
thereby giving us "training wheels" for the day in which we always use
the global namespace.  In contrast, the deployment strategy of Host is
not moving toward the global namespace, and thus whether or not we actually
check the Host value for non-vhosts is a decision left to the implementers.

These training wheels have little usefulness on their own, but they do
make it easier to deploy real wheels eventually (just as persistent
connections are actually training wheels for multiplexed connections).

....Roy

Re: absoluteURIs suck

Posted by Dean Gaudet <dg...@arctic.org>.
Well ... damn.  I searched for "loop" several times in the RFC and didn't
see that.  Take back that rant then.

Dean

On Sat, 21 Feb 1998, Alexei Kosut wrote:

> On Sat, 21 Feb 1998, Dean Gaudet wrote:
> 
> > That doesn't avoid request loops.  That avoids self-loops, which
> > would be a reasonable requirement of any quality implementation.
> > A proxy has to do DNS lookups anyhow, so it doesn't bother me that
> > this requirement absolutely requires DNS to be implemented properly.
> > But it certainly doesn't avoid loops.  Nothing in the standard enforces
> > proxy loop avoidance.  Max-Forwards can't do it either, section 14.31:
> 
> Unless the sleep debt has been getting to me, proxy loop avoidance is
> one of the many magical uses of the Via header. From section 14.44 of RFC
> 2068: "The Via general-header field MUST be used by gateways and
> proxies... and is intended to be used for... avoiding request loops."
> 
> Looks like it would work fine to me.
> 
> -- Alexei Kosut <ak...@stanford.edu> <http://www.stanford.edu/~akosut/>
>    Stanford University, Class of 2001 * Apache <http://www.apache.org> *
> 
> 
> 


Re: absoluteURIs suck

Posted by Alexei Kosut <ak...@leland.Stanford.EDU>.
On Sat, 21 Feb 1998, Dean Gaudet wrote:

> That doesn't avoid request loops.  That avoids self-loops, which
> would be a reasonable requirement of any quality implementation.
> A proxy has to do DNS lookups anyhow, so it doesn't bother me that
> this requirement absolutely requires DNS to be implemented properly.
> But it certainly doesn't avoid loops.  Nothing in the standard enforces
> proxy loop avoidance.  Max-Forwards can't do it either, section 14.31:

Unless the sleep debt has been getting to me, proxy loop avoidance is
one of the many magical uses of the Via header. From section 14.44 of RFC
2068: "The Via general-header field MUST be used by gateways and
proxies... and is intended to be used for... avoiding request loops."

Looks like it would work fine to me.
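
A rough sketch of how a proxy might use Via for this, assuming the
"protocol received-by [comment]" element layout from RFC 2068 and
hypothetical helper names.  It is deliberately naive: it assumes that
comments contain no commas.

```python
def via_hosts(via_value):
    """Extract the received-by token from each element of a Via value."""
    hosts = []
    for element in via_value.split(","):
        parts = element.split()
        # parts[0] is the received protocol, parts[1] the received-by
        # host or pseudonym; anything after that is an optional comment.
        if len(parts) >= 2:
            hosts.append(parts[1].lower())
    return hosts


def is_loop(via_value, my_name):
    """True if this proxy's own name already appears in the Via chain."""
    return my_name.lower() in via_hosts(via_value)
```

For the value "1.0 fred, 1.1 nowhere.com (Apache/1.1)", a proxy that
calls itself nowhere.com would see is_loop() return True and could
refuse to forward the request.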

-- Alexei Kosut <ak...@stanford.edu> <http://www.stanford.edu/~akosut/>
   Stanford University, Class of 2001 * Apache <http://www.apache.org> *



Re: absoluteURIs suck

Posted by Dean Gaudet <dg...@arctic.org>.
rant rant.

On Fri, 20 Feb 1998, Roy T. Fielding wrote:

> >I really don't understand the lameness regarding absoluteURIs in HTTP/1.1. 
> >Suppose HTTP/1.2 comes out and dictates that absoluteURIs must be used for
> >all requests (this is hinted at in RFC2068).  In order to interoperate
> >with HTTP/1.1 servers all HTTP/1.2 clients will have to also include Host:
> >headers.  This is a waste of bandwidth having to include the hostname
> >twice. 
> 
> Such a concern was less significant than the probability that you might
> be sending the absoluteURI request to an old proxy (that doesn't send Host
> but will forward it if present) and the possibility that name-based
> vhosts would fail due to inadequate implementation.  Much of the weird
> wording in the spec is due to several IESG biggies' "opinion" of what
> was sufficient to ensure implementation versus my own pleading to
> ensure deployment.

And folks wonder why the standards processes are sneered
at when they refuse to listen to the folks implementing
the protocols.  But I suppose if I'm going to complain about
bandwidth I've got a lot more issues with HTTP than just this.
See <ftp://koobera.math.uic.edu/www/sarcasm/modest-proposal.txt>.

> >Apache 1.2 and 1.3 are broken as far as forward compatibility with this
> >hypothetical HTTP spec as well.  Consider: 
> >
> ><VirtualHost 10.1.1.1>
> >...
> ></VirtualHost>
> >
> >No NameVirtualHost in the config.  I consider the only correct way to
> >implement this config is that *all requests* appearing at 10.1.1.1:80 will
> >be served by that virtual host.  Right now if a request appears there with
> >an absolute URI with a hostname that isn't listed *we will reject it*.
> >This means we're not forward compatible with some lame HTTP version that
> >doesn't exist but is threatened to exist. 
> >
> >Contrast this with the behaviour on a Host: header that we don't
> >recognize... we just don't care about it, we serve what has been
> >configured (a default server, or the ip-vhost).
> 
> Either the server is configured to deal with the global namespace, or
> it ignores the global namespace assuming that every request it receives
> is intended for it.

This is not a requirement of RFC2068.  And it's not a requirement of
Apache either.  Section 1.3:

    ...any server may act as an origin server, proxy, gateway, or tunnel,
    switching behavior based on the nature of each request.

> The requirements in HTTP/1.1 force the server to
> recognize its own namespace as part of the global absoluteURI namespace,
> thereby giving us "training wheels" for the day in which we always use
> the global namespace.  In contrast, the deployment strategy of Host is
> not moving toward the global namespace, and thus whether or not we actually
> check the Host value for non-vhosts is a decision left to the implementers.

I'm happy not checking Host for ip-vhosts; in fact, that's what's
implemented now, except that we use Host for self-redirects when
UseCanonicalName is off.

My complaint is absoluteURI.  Let me examine the possibilities for
recognizing a domain name as your own server (so that you can
determine if it's a proxy request or an origin request):

- DNS lookup, trust the IP returned, and if it matches the local
    connection IP then say "that's me".  This is a seriously broken
    method -- not only for performance, but for denial of service
    and security reasons.  Apache currently DOES implement this,
    and I want to get rid of it.  Doing a forward lookup is bad
    because attackers control forward lookups.

- Exhaustive listing of servernames.  Apache tries these lists first
    before doing the DNS lookup.  But it is impossible to exhaustively
    list servernames precisely because it is the client that decides
    how a hostname maps to an ip address, not the server.
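
The second option can be sketched as follows.  This is an illustration
only, not Apache's check_fulluri: classify_request() and the example
hostnames are hypothetical.  It decides between origin and proxy
handling purely from a configured name list, with no DNS lookup.

```python
from urllib.parse import urlsplit


def _norm(name):
    """Lower-case a hostname and drop a trailing root dot."""
    return name.rstrip(".").lower()


def classify_request(request_uri, local_names):
    """Return "origin" or "proxy" for a Request-URI, without using DNS."""
    if not request_uri.lower().startswith("http://"):
        return "origin"  # abs_path form (or "*"): always for this server
    host = urlsplit(request_uri).hostname
    known = {_norm(name) for name in local_names}
    if host is not None and _norm(host) in known:
        return "origin"
    return "proxy"
```

The weakness described above shows up immediately: a client that sends
http://www/ is classified as a proxy request unless the bare name www
was also listed, because only the client knows how it resolved www.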

Insert my rant here about how name-vhosts are an inaccurate protocol.
I'd love something like this inserted into the standard:

    Note that the server may resolve DNS in a different manner than
    the client.  For example, a client mdma.chem.happy.edu requesting
    the unqualified hostname www may resolve it to www.chem.happy.edu.
    If the server happens to be managed by the CS dept, it may resolve
    an unqualified www as www.cs.happy.edu.  A client making a request
    to www may then not reach the correct origin server.  To avoid
    problems such as this, clients SHOULD fully qualify all domain names
    specified by the user.

It's completely trivial to do this with the standard unix gethostbyname()
API... the h_name field of struct hostent is fully qualified on any
reasonable system (i.e. not sunos4 or 5 using NIS with poor host maps).
(of course Sun managed to make their servers ubiquitous in academia,
which not only means that academics always say apache has bad performance,
but more relevantly means that the clients really have no idea what the
heck the FQDN is ... 'cause NIS sucks.)
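
A client-side sketch of that idea, written in Python rather than C.  It
assumes that socket.gethostbyname_ex(), whose first return value is the
primary host name (the counterpart of h_name), gives a qualified name
on the system in question.  The fully_qualify() helper and its
injectable resolve parameter are hypothetical.

```python
import socket


def fully_qualify(name, resolve=socket.gethostbyname_ex):
    """Return the resolver's canonical name for `name`.

    `resolve` must return (canonical_name, aliases, addresses), as
    socket.gethostbyname_ex does.  If the canonical name contains no
    dot (the poorly configured case), the input is returned unchanged.
    """
    canonical, _aliases, _addresses = resolve(name)
    return canonical if "." in canonical else name
```

A client would call fully_qualify() on whatever the user typed and put
the result in the Request-URI or Host header.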

Not that it matters now.

At any rate, fixing check_fulluri in Apache is going to require an
extra field in the request_rec.  When running check_fulluri, Apache has
no idea whether it's handling an origin request or a proxy request.

rant rant.

Dean

P.S. rant rant:  Section 5.1.2 says:

    In order to avoid request loops, a proxy MUST be able to recognize
    all of its server names, including any aliases, local variations,
    and the numeric IP address.

That doesn't avoid request loops.  That avoids self-loops, which
would be a reasonable requirement of any quality implementation.
A proxy has to do DNS lookups anyhow, so it doesn't bother me that
this requirement absolutely requires DNS to be implemented properly.
But it certainly doesn't avoid loops.  Nothing in the standard enforces
proxy loop avoidance.  Max-Forwards can't do it either, section 14.31:

    The Max-Forwards header field SHOULD be ignored for all other methods
    defined by this specification and for any extension methods for which
    it is not explicitly referred to as part of that method definition.

Thank god a proxy can freely insert a header:

X-Loop: a.b.c.d

and detect loops.
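
A minimal sketch of that approach, assuming headers are held as a list
of (name, value) pairs and that each proxy has some unique identifier.
X-Loop is a private header, and the function and exception names here
are hypothetical.

```python
class LoopDetected(Exception):
    """Raised when a request carrying our own marker comes back to us."""


def add_loop_marker(headers, proxy_id):
    """Return a copy of `headers` with our X-Loop marker appended.

    Raises LoopDetected if a marker with our own identifier is already
    present, meaning the request has passed through this proxy before.
    """
    for name, value in headers:
        if name.lower() == "x-loop" and value.strip() == proxy_id:
            raise LoopDetected(proxy_id)
    return headers + [("X-Loop", proxy_id)]
```

Each proxy calls add_loop_marker() before forwarding.  Markers added by
other proxies are left untouched, so a loop through several hops is
still caught when the request returns to a proxy that has seen it.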