Posted to dev@httpd.apache.org by Brian Behlendorf <br...@organic.com> on 1996/04/24 22:37:31 UTC

this host crap

is mighty interesting.  Some of the bizarre Host: headers I've been sent:

HYPERREAL.COM
WWW.APACHE.ORG
  (okay, not that bizarre, but a reminder that we need to account for 
  capitalization)

bong.com:80
hyperreal.com:80
taz.hyperreal.com:80
  (a reminder that we need to deal with port #'s - in this case I'd 
  prefer to send back a 302 Location: (or is it 303?) removing the 
  redundant :80 from the URL)

vrml.wired.com.
www.apache.org.
www.hyperreal.com.

  (a reminder that it looks like we'll have to handle ending-periods too)
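
A minimal sketch of how the three cases so far (capitalization, an
explicit port, a trailing period) could be folded into one canonical
form before matching. It is Python for illustration only; the name
normalize_host and its default_port parameter are hypothetical, nothing
here is taken from the Apache source, and IPv6 literals are ignored.

```python
def normalize_host(raw, default_port=80):
    """Return (hostname, port) for a raw Host: value, or None if unusable."""
    host, sep, port_text = raw.strip().partition(":")
    # Hostnames are case-insensitive, and one trailing dot is the DNS root.
    if host.endswith("."):
        host = host[:-1]
    host = host.lower()
    if not host:
        return None
    # Assumption: a missing or empty port means the default port.
    if not sep or port_text == "":
        return host, default_port
    if not (port_text.isascii() and port_text.isdigit()):
        return None
    return host, int(port_text)
```

With this, "WWW.APACHE.ORG", "www.apache.org:80" and "www.apache.org."
all come out as the same (hostname, port) pair, so no redirect is needed
just to compare them.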

Truly fucked up:

fgw
   (sent by "NCSA_Mosaic/2.7b3 (X11;meson 4_52 mips)" and 
   "NCSA_Mosaic/2.7b3 (X11;krypton 4_52 mips)" when accessing 
   ecstasy.org, all coming from relay4.oleane.net)

hyperreal.com:

   (is this legal?)

hyperreal.com:70
 
   (this is coming from a lot of different UA's and hosts, all the UA's 
   have "via proxy gateway  CERN-HTTPD/3.0 libwww/2.17" attached, I 
   suppose the CERN proxy has a problem with "http://hyperreal.com:70", 
   huh?)

images.bianca.com

  (previously noted, I'm working with Dean on this one.)

linux
linux.cis.nctu.edu.tw
www.ukweb.com

  These are the most bizarre.  The actual logfile entries claim they are 
  coming from "linux.cis.nctu.edu.tw" and "www.ukweb.com" respectively, 
  and they appear to be robotic in nature, yet they have the User-Agent 
  set to valid Mozilla user-agents (like "Mozilla/3.0b2 (X11; I; Linux 
  1.3.94 i586)", sometimes i486), sometimes going via a proxy server, 
  sometimes not.  I'm almost wondering if this is a bug of some sort - 
  I've sent mail to Mark and to the .tw mirror maintainer (this is a 
  mirror I had not been informed of, and isn't on our pages yet) to see 
  if it's really from them, or if some sort of corruption from the 
  Referer: field is coming in somehow.


www-cache.funet.fi

  Apparently the cache at www-cache.funet.fi (which doesn't appear to 
  identify itself in the user-agent header, maybe it does in the 
  Forwarded: header, I don't know) decided to add a "Host: 
  www-cache.funet.fi".  The browser which sent this was 
  NCSA_Mosaic/2.7b3 (X11;IRIX 5.3 IP19).
  Maybe it was the browser, but I saw lots of other NCSA_Mosaic/2.7b3 
  X11's which appeared to handle proxies without a problem. No, wait - 
  this was also the cause of another bogus Host: header, "fgw".  I 
  haven't seen any requests from NCSA_Mosaic/2.7b4 through a proxy yet, so
  maybe this is a bug in XMosaic.  You folks at NCSA want to look at this?

www.sandbox.net

  Okay, so both "Mozilla/2.01Gold (Win95; I)" and "Mozilla/2.0 
  (Macintosh; I; 68K)" sent this erroneously - the URL in these cases was 
  (get ready)
  
  http://www.sandbox.net/cyberhunt2/prot-bin/webfilter/www.lycos.com:80/cgi-bin/pursuit?query=faberge+eggs

  and 

  http://www.sandbox.net/cyberhunt2/prot-bin/webfilter/www.lycos.com/cgi-bin/nph-randurl/cgi-bin/largehostpursuit1.html?query=relic&maxhits=20

  This is a protected service so I can't see what type of response these 
  people really got - both requests came from "www.tracer.com".  Maybe 
  Netscape 2.0 doesn't change the value of the Host: header after a 302 
  or 303 redirect?

www.webville.com

  6 bogus requests were made, all from the same remote host and with the same
  client (Mozilla/2.0 (Win16; I)), with the referer being
  "http://www.webville.com/oak/Marco-25/archive.html".  Looking at that 
  page, there are references to hyperreal in addition to lots of other 
  places, but I don't see anything that should explicitly trigger such a 
  bogus request.  


From all of these, I get the feeling that handling bogus Host: headers is 
going to be an interesting situation.  Since the migration path will not 
be smooth, one option I'd like to have is to be able to, on the absence 
of a Host: header or the existence of a bogus one, return an error, 
something like "Malformed Request".  Roy will no doubt have opinions on 
this.  :)
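
A rough sketch of what such an option might look like, in Python for
illustration only. The function check_host, the set served_hosts, the
choice of status 400 and the exact reason strings are all assumptions;
the message above only asks for "something like" an error.

```python
import re

# Letters, digits, hyphens and dots, an optional trailing dot, an optional port.
_HOST_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.?(:\d*)?$")

def check_host(headers, served_hosts):
    """Return (status, reason) given request headers keyed by lowercase name."""
    value = headers.get("host")
    if value is None:
        return 400, "Malformed Request: missing Host header"
    value = value.strip()
    if not _HOST_RE.match(value):
        return 400, "Malformed Request: unparsable Host header"
    # Compare on the bare lowercase name, ignoring port and trailing dot.
    name = value.split(":", 1)[0].rstrip(".").lower()
    if name not in served_hosts:
        return 400, "Malformed Request: unknown host " + name
    return 200, "OK"
```

Under these assumptions a request carrying "Host: fgw" would be refused
as an unknown host, while "Host: WWW.APACHE.ORG:80" would pass on a
server that serves www.apache.org.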

Forward where appropriate.

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  |  We're hiring!  http://www.organic.com/Home/Info/Jobs/


Re: this host crap

Posted by Mark J Cox <ma...@ukweb.com>.
> www.ukweb.com
> 
>   These are the most bizarre.  The actual logfile entries claim they are 
>   coming from "linux.cis.nctu.edu.tw" and "www.ukweb.com" respectively, 

www.ukweb.com runs Apache 1.1b1 and has "ProxyPass" set up to map "/apache"
to "www.apache.org/".  

It looks like the proxy module is passing on the "Host" header.  I've not
checked the proxy code but it seems the most logical explanation. 
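
If that explanation holds, the fix would be for the proxy to replace
the client's Host value with the upstream server's name before
forwarding. A minimal Python sketch of that idea follows; it is not the
Apache proxy module's code, and upstream_headers is a hypothetical name.

```python
from urllib.parse import urlsplit

def upstream_headers(client_headers, upstream_url):
    """Copy client headers for forwarding, with Host set to the upstream's."""
    target = urlsplit(upstream_url)
    host = target.hostname or ""
    # Keep an explicit port only if the upstream URL names one.
    if target.port is not None:
        host = "%s:%d" % (host, target.port)
    # Drop whatever Host the client sent, whatever its capitalization.
    forwarded = {name: value for name, value in client_headers.items()
                 if name.lower() != "host"}
    forwarded["Host"] = host
    return forwarded
```

For the mapping described above, a request that arrived with
"Host: www.ukweb.com" would be forwarded with "Host: www.apache.org".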

Mark


Re: this host crap

Posted by Dean Gaudet <dg...@hotwired.com>.
In article <Pi...@fully.organic.com> you write:
>fgw
[...]

Not that this is necessarily the case for this one... but something to
worry about.  Netscape doesn't ever send/use FQDNs when the user types
in an unqualified name.  i.e. try "http://www/" and you'll just get
"Host: www".  It doesn't treat cookies from "www" as if they come from
"www.fqdn.com".  It doesn't share www authentication with www.fqdn.com.
The list is probably longer...

I haven't checked any other client's behaviour in this respect.

It's great trying to train people to use fqdns when they're sorta
comfortable with adding CNAMEs for single letter host names 'cause they
think it's a great timesaver.  I don't recall how many "bug reports"
and "server problems" I've had to "debug" related to this.

Dean

Re: this host crap

Posted by Alexei Kosut <ak...@nueva.pvt.k12.ca.us>.
On Wed, 24 Apr 1996, Brian Behlendorf wrote:

> HYPERREAL.COM
> WWW.APACHE.ORG
>   (okay, not that bizarre, but a reminder that we need to account for 
>   capitalization)

We do.

> bong.com:80
> hyperreal.com:80
> taz.hyperreal.com:80
>   (a reminder that we need to deal with port #'s - I'd prefer to in this 

We do.

>   case send back a 302 Location: (or is it 303?) removing the redundant 
>   :80 URL)

Bad idea. According to the HTTP/1.1 spec, the Host header is not
entirely related to the URL the client typed in. Namely, it's
host:port, with port defaulting to 80. So a client would be perfectly
within its rights sending "Host: www.apache.org:80", even if the user
typed in or was linked to "http://www.apache.org/".
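
To make the point concrete, here is a small Python sketch (illustrative
only; host_matches_url is a hypothetical helper, and the default of 80
assumes plain http) that treats a Host value with and without the
default port as naming the same server as a given URL.

```python
from urllib.parse import urlsplit

def host_matches_url(host_header, url, default_port=80):
    """True if the Host value names the same host and port as the URL."""
    target = urlsplit(url)
    wanted = (target.hostname, target.port or default_port)
    name, _, port_text = host_header.strip().partition(":")
    if port_text and not (port_text.isascii() and port_text.isdigit()):
        return False
    # An absent port in the header means the default port.
    port = int(port_text) if port_text else default_port
    return (name.lower(), port) == wanted
```

Both "www.apache.org" and "www.apache.org:80" match
"http://www.apache.org/" here, which is why a redirect that only strips
":80" would gain nothing.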

> vrml.wired.com.
> www.apache.org.
> www.hyperreal.com.
> 
>   (a reminder that it looks like we'll have to handle ending-periods too)

Rgph. We don't do that. I suppose we could. I'll think about it.

> hyperreal.com:
> 
>    (is this legal?)

I don't think so. But we handle it correctly.

> hyperreal.com:70
>  
>    (this is coming from a lot of different UA's and hosts, all the UA's 
>    have "via proxy gateway  CERN-HTTPD/3.0 libwww/2.17" attached, I 
>    suppose the CERN proxy has a problem with "http://hyperreal.com:70", 
>    huh?)

But, we already knew that, yes... ?

> linux
> linux.cis.nctu.edu.tw
> www.ukweb.com
> 
>   These are the most bizarre.  The actual logfile entries claim they are 
>   coming from "linux.cis.nctu.edu.tw" and "www.ukweb.com" respectively, 
>   and they appear to be robotic in nature, yet they have the User-Agent 
>   set to valid Mozilla user-agents (like Mozilla/3.0b2 (X11; I; Linux 
>   1.3.94 i586), sometimes i486)) sometimes going via a proxy server, 
>   sometimes not.  I'm almost wondering if this is a bug of some sort - 
>   I've sent mail to Mark and to the .tw mirror maintainer (this is a 
>   mirror I had not been informed of, and isn't on our pages yet) to see 
>   if it's really from them, or if some sort of corruption from the 
>   Referer: field is coming in somehow.

I sure hope not... Could be someone was poring over a spec, came
across Host and misinterpreted it to mean the browser's hostname... I
hope not.

> www-cache.funet.fi
> 
>   Apparently the cache at www-cache.funet.fi (which doesn't appear to 
>   identify itself in the user-agent header, maybe it does in the 
>   Forwarded: header, I don't know) decided to add a "Host: 
>   www-cache.funet.fi".  The browser which sent this was NCSA_Mosaic/2.7b3 (X11;IRIX 5.3 IP19)
>   Maybe it was the browser, but I saw lots of other NCSA_Mosaic/2.7b3 
>   X11's which appeared to handle proxies without a problem. No, wait - 
>   this was also the cause of another bogus Host: header, "fgw".  I 
>   haven't seen any requests from NCSA_Mosaic/2.7b4 through a proxy yet, so
>   maybe this is a bug in XMosaic.  You folks at NCSA want to look at this?

Here's my bet: NCSA Mosaic 2.7b3, when talking to a proxy, sends a
Host header with the proxy's name. This could explain
www-cache.funet.fi, the .tw and ukweb ones, and even fgw - if it's an
internal name of a proxy.

> www.sandbox.net
> 
>   Okay, so both "Mozilla/2.01Gold (Win95; I)" and "Mozilla/2.0 
>   (Macintosh; I; 68K)" sent this erroneously - the URL in these cases was 
>   (get ready)
>   
>   http://www.sandbox.net/cyberhunt2/prot-bin/webfilter/www.lycos.com:80/cgi-bin/pursuit?query=faberge+eggs

Hmm.

>   http://www.sandbox.net/cyberhunt2/prot-bin/webfilter/www.lycos.com/cgi-bin/nph-randurl/cgi-bin/largehostpursuit1.html?query=relic&maxhits=20

Hmm hmm.

>   This is a protected service so I can't see what type of response these 
>   people really got - both requests came from "www.tracer.com".  Maybe 
>   Netscape 2.0 doesn't change the value of the Host: header after a 302 
>   or 303 redirect?

Could be. Or it could be a Netscape clone...

> www.webville.com
> 
>   6 bogus requests were made, all from the same remote host and with the same
>   client (Mozilla/2.0 (Win16; I)), with the referer being
>   "http://www.webville.com/oak/Marco-25/archive.html".  Looking at that 
>   page, there are references to hyperreal in addition to lots of other 
>   places, but I don't see anything that should explicitly trigger such a 
>   bogus request.  

Don't have a clue about that one.

> From all of these, I get the feeling that handling bogus Host: headers is 
> going to be an interesting situation.  Since the migration path will not 
> be smooth, one option I'd like to have is to be able to, on the absence 
> of a Host: header or the existence of a bogus one, return an error, 
> something like "Malformed Request".  Roy will no doubt have opinions on 
> this.  :)

This is not necessary. Malformed headers, if they don't pass muster, are
treated like they didn't exist... just make all your servers
VirtualHosts, and make the "main" server just a page that says "hey,
you, get a browser that supports Host: correctly."
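
A minimal Python sketch of that arrangement, for illustration only.
The name pick_server and the plain dict of virtual hosts are
assumptions; this is not how Apache's own matching code is written.

```python
def pick_server(host_header, virtual_hosts, main_server):
    """Return the matching virtual host, or the main server as a fallback."""
    if host_header:
        # Match on the bare lowercase name, ignoring port and trailing dot.
        name = host_header.strip().split(":", 1)[0].rstrip(".").lower()
        if name in virtual_hosts:
            return virtual_hosts[name]
    # Missing or unrecognised Host: treated as if no header had been sent.
    return main_server
```

With every real site listed in virtual_hosts, a request with no Host
header, or with something like "fgw", lands on main_server, which can
then serve the "get a better browser" page.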

If you want them separately, that's something different.

-- 
________________________________________________________________________
Alexei Kosut <ak...@nueva.pvt.k12.ca.us>    
URL: http://www.nueva.pvt.k12.ca.us/~akosut/  
Lefler on IRC, DALnet <http://www.dal.net/>