Posted to users@httpd.apache.org by Thomas Gabrielsen <ap...@arton.no> on 2004/06/04 20:44:56 UTC

[users@httpd] Robots.txt and virtualhost

Hi

I have a question about robots.txt files and virtual hosts. I have one
main host and one name-based virtual host, both sharing one IP address.
Should I add something special to the robots.txt file in the main
directory to point robots to the virtual host's directory? I ask because
I get MANY more requests for robots.txt in the main directory than in
the virtual host's. The main host has been up for a long time, while the
virtual host has only been up for a week, so that could be the reason. I
just wanted to ask you fine people to be sure :)

Thanks,
Thomas Gabrielsen




---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Robots.txt and virtualhost

Posted by Thomas Gabrielsen <ap...@arton.no>.
Ah, I see! Thanks for clearing that up.

Thomas



Re: [users@httpd] Robots.txt and virtualhost

Posted by Jon Berg <jo...@turtlemeat.com>.
Hi,

There is no way to query Apache and find all the virtual hosts
on an IP. There is a way to do this with DNS: dnslocator.com.
Or I guess if Google saved every IP that it gets when doing
a DNS lookup, it could merge these together and find them. But I don't
see any point in doing that.

So a new virtual host is like a new "host" on the Internet to
a search bot.
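
To illustrate the point above, here is a minimal Python sketch of how a
crawler would typically work out which robots.txt to fetch: it keeps the
scheme and host name of the page URL and always asks for /robots.txt on
that host. The function name robots_url and the example.com / example.net
host names are assumptions made up for this illustration, and real
crawlers may behave differently in detail.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host[:port]; path, query and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical host names: a main host and a virtual host on the same IP.
main = robots_url("http://www.example.com/docs/index.html")
vhost = robots_url("http://vhost.example.net/shop/item?id=3")

print(main)
print(vhost)
```

Because the two URLs differ by host name, each host needs its own
robots.txt in its own document root; nothing in the main host's file can
point a robot at the virtual host.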



Jon Berg.
----
http://tubegator.com



Re: [users@httpd] Robots.txt and virtualhost

Posted by Eugene Lee <li...@fsck.net>.
: > > : if someone comes to the machine via the IP address it serves a
: > > : simple page that tells this is a virtual host and then gives a
: > > : short description of all the sites that are hosted along with a
: > > : link to each site.

At this point, I think it is pointless to support any protocol version
older than HTTP/1.1.


-- 
Eugene Lee



Re: [users@httpd] Robots.txt and virtualhost

Posted by Thomas Gabrielsen <ap...@arton.no>.
> if someone comes to the machine via the IP
> > : address it serves a simple page that tells this is a virtual host and
> > : then gives a short description of all the sites that are hosted along
> > : with a link to each site.
> > :
> >
> > Yes, this is good for very old browsers that doesnt support host headers
> > too.
> >
> > Thomas
>
> Actually, no, it isn't. Not unless the links look like
> http://111.222.333.444/virthost1/
>

If an old browser doesn't understand Host headers, it will go to the default
host of the web server. If there's a list of all the domains there, the user
can then click on a link him/herself.
I know I've read several articles about this, but I won't bother to find them
now because ...


>
> Has there even been such a browser since Mosaic?
>

... I don't think so.

But I think many webmasters still consider it good practice to keep such
a list in the main host directory.


Thomas





Re: [users@httpd] Robots.txt and virtualhost

Posted by Tim Burden <Ti...@Burden.ca>.
 if someone comes to the machine via the IP
> : address it serves a simple page that tells this is a virtual host and
> : then gives a short description of all the sites that are hosted along
> : with a link to each site.
> :
>
> Yes, this is good for very old browsers that doesnt support host headers
> too.
>
> Thomas

Actually, no, it isn't. Not unless the links look like
http://111.222.333.444/virthost1/

If they look like http://www.virthost1.com, the old browsers still can't
follow them.

Has there even been such a browser since Mosaic?
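
A small Python sketch of why that is, assuming a server that picks the
site purely from the Host header and falls back to a default when the
header is missing. This only imitates the idea of name-based virtual
hosting; it is not how Apache is implemented. The function names and the
document root paths are made up for the example, and the header parsing
is deliberately naive (it does not handle IPv6 literals).

```python
from typing import Dict, Optional


def build_request(path: str, host: Optional[str] = None) -> str:
    """Build a minimal GET request; with no host it is HTTP/1.0 style."""
    if host is None:
        # An old client: no Host header at all.
        return f"GET {path} HTTP/1.0\r\n\r\n"
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


def select_vhost(request: str, vhosts: Dict[str, str], default: str) -> str:
    """Return the document root chosen for a raw request."""
    for line in request.split("\r\n")[1:]:
        if not line:
            break  # blank line ends the header section
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            hostname = value.strip().split(":")[0].lower()  # drop any port
            return vhosts.get(hostname, default)
    return default  # no Host header: the default (main) host answers


# Hypothetical document roots.
VHOSTS = {"www.virthost1.com": "/var/www/virthost1"}
DEFAULT = "/var/www/main"

old_client = build_request("/")
new_client = build_request("/", "www.virthost1.com")

print(select_vhost(old_client, VHOSTS, DEFAULT))
print(select_vhost(new_client, VHOSTS, DEFAULT))
```

The old client ends up at the default host whichever name it resolved to
reach the IP, so a link to http://www.virthost1.com on the default page
just brings it back to the same page.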




Re: [users@httpd] Robots.txt and virtualhost

Posted by Thomas Gabrielsen <ap...@arton.no>.


: I thought I would chime in on this.  I don't claim that what I did is
: the best thing but I thought it is worth mentioning.  I set up my
: virtual host machine so that if someone comes to the machine via the IP
: address it serves a simple page that tells this is a virtual host and
: then gives a short description of all the sites that are hosted along
: with a link to each site.
:

Yes, this is also good for very old browsers that don't support Host
headers.

Thomas





Re: [users@httpd] Robots.txt and virtualhost

Posted by Chris W <1q...@cox.net>.
Thomas Gabrielsen wrote:

>Wow! Nine exclamation marks! I guess you have a great knowledge on this
>subject then ;)
>So I dare to ask you a cuple of other questions:
>- When a robots crawling the net, is it scanning internet for which ip's
>hosting webservers, and then again finds what domains are hosted there? If
>thats the case, how do they do that? Does webservers (in this case: Apache)
>give out that information about which domain is hosted when querying the IP
>(mainhost).
>
I thought I would chime in on this.  I don't claim that what I did is
the best thing, but I thought it was worth mentioning.  I set up my
virtual host machine so that if someone comes to the machine via the IP
address, it serves a simple page that says this is a virtual host
server and then gives a short description of all the sites that are
hosted, along with a link to each site.
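
A rough Python sketch of how such a default page could be generated. The
function name, the site list and the descriptions are all invented for
the example; this is not the page actually used on the machine described
above.

```python
from html import escape
from typing import List, Tuple


def render_site_index(sites: List[Tuple[str, str]]) -> str:
    """Render a plain HTML page listing (hostname, description) pairs."""
    items = "\n".join(
        f'  <li><a href="http://{escape(host, quote=True)}/">{escape(host)}</a>: '
        f"{escape(desc)}</li>"
        for host, desc in sites
    )
    return (
        "<html><body>\n"
        "<p>This address serves several name-based virtual hosts:</p>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>\n"
    )


# Hypothetical hosted sites.
page = render_site_index([
    ("site-a.example", "Calculator fan pages"),
    ("site-b.example", "Gift lists & wish lists"),
])
print(page)
```

The output would be saved as the index page of the default host, the one
that answers requests made by IP address.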

-- 
Chris W

Bring Back the HP 15C
http://hp15c.org

Not getting the gifts you want?  The Wish Zone can help.
http://thewishzone.com





Re: [users@httpd] Robots.txt and virtualhost

Posted by Thomas Gabrielsen <ap...@arton.no>.
Wow! Nine exclamation marks! I guess you have great knowledge of this
subject then ;)
So I dare to ask you a couple of other questions:
- When a robot crawls the net, does it scan the Internet for IPs
hosting web servers, and then find which domains are hosted there? If
that's the case, how do they do that? Do web servers (in this case, Apache)
give out information about which domains are hosted when the IP
(main host) is queried?
If so, the newer virtual host should have about the same number of
visits as the main host by now, so this can't be the case.....!?

Can you (or anybody) tell me more about how robots work, or where I can read
more about it? Especially Googlebot.

Thanks,
Thomas



Re: [users@httpd] Robots.txt and virtualhost

Posted by Zoe Ballz <zo...@zoeballz.tv>.
Hiya

You are right - that's exactly the reason !!!!!!!!!

Zoe

http://zoeballz.net - Website Hosting for the Transgendered Community