Posted to dev@httpd.apache.org by Nathan Ollerenshaw <na...@valuecommerce.ne.jp> on 2003/03/13 12:27:30 UTC

Advanced Mass Hosting Module

Resending this to this list as I got no response on users list.

Currently, we are using flat config files generated by our website
provisioning software to support our mass hosted customers. The reason
for doing it this way, and not using the mod_vhost_alias module is
because we need to be able to turn on/off CGI, PHP, Java, shtml etc on
a per vhost basis. We need the power that having a distinct
<VirtualHost> directive for each site gives you.

Is there a better way?

What I have in mind is a module that fits in with our current LDAP
based infrastructure. Currently, LDAP services our mail users, and I
would like to see the Apache mass hosting configuration held in LDAP as
well. In this way, we can just scale by adding more apache servers,
mounting the shared docroot and pointing them to the LDAP server.

The LDAP entry would look something like this:

# www.example.com, base
dn: uid=www.example.com, o=base
siteGidNumber: 10045
siteUidNumber: 10045
objectClass: top
objectClass: apacheVhost
serverName: www.example.com
serverAlias: example.com
serverAlias: another.example.com
docRoot: /data/web/04/09/example.com/www
vhostStatus: enabled
phpStatus: enabled
shtmlStatus: enabled
cgiStatus: enabled
dataOutSoftLimit: 1000000 (in bytes per month)
dataOutHardLimit: 10000000
dataInSoftLimit: 1000000
dataInHardLimit: 10000000
dataThrottleRate: 1000000 (in bits/sec)
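
As a rough illustration of how a consumer might read an entry like the one above, here is a minimal Python sketch (Python only for brevity; the proposed module itself would be written in C). The function name parse_vhost_entry is hypothetical, and the sketch assumes the parenthesised unit notes are human-readable annotations rather than part of the stored value.

```python
def parse_vhost_entry(text):
    """Parse 'attribute: value' lines into a dict of lists (attributes may repeat)."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, _, value = line.partition(":")
        value = value.strip()
        # Assumption: a trailing " (...)" is a note for the reader, not data.
        if value.endswith(")") and " (" in value:
            value = value[: value.rindex(" (")].strip()
        entry.setdefault(name.strip(), []).append(value)
    return entry


sample = """\
# www.example.com, base
dn: uid=www.example.com, o=base
serverName: www.example.com
serverAlias: example.com
serverAlias: another.example.com
phpStatus: enabled
dataOutSoftLimit: 1000000 (in bytes per month)
"""
entry = parse_vhost_entry(sample)
```

Keeping every attribute as a list means multi-valued attributes such as serverAlias need no special case.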

Then, as a request came in, the imaginary mod_advanced_masshosting
module would first check to see if it had the information about the
domain already cached in memory (to avoid hitting LDAP for every HTTP
request, which would be a Bad Idea) and then if not, it would grab the
entry from LDAP, cache it, and service the incoming requests.
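
The lookup order described above (memory first, LDAP only on a miss) could be sketched roughly as follows. This is an illustration under assumptions, not the module's design: the class name VhostCache, the fetch callable and the 300-second lifetime are all hypothetical, and this version lives inside a single process.

```python
import time


class VhostCache:
    """Cache-aside lookup: consult memory first, fall back to a directory query."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable: hostname -> config or None
        self._ttl = ttl_seconds    # assumed lifetime; the thread does not specify one
        self._clock = clock
        self._entries = {}         # hostname -> (expiry_time, config)

    def get(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[0] > now:
            return cached[1]
        config = self._fetch(hostname)   # the expensive LDAP round trip
        # Misses (None) are cached too, so unknown hosts do not hit LDAP every time.
        self._entries[hostname] = (now + self._ttl, config)
        return config


calls = []

def fake_ldap_fetch(hostname):
    """Stand-in for a real directory query; records each call."""
    calls.append(hostname)
    return {"docRoot": "/data/web/04/09/example.com/www"}

cache = VhostCache(fake_ldap_fetch)
cache.get("www.example.com")
cache.get("www.example.com")   # served from memory; no second fetch
```

An expiry time is one simple way to let changes made in LDAP show up without a restart, at the cost of a delay of up to the chosen lifetime.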

The cache itself would need to be shared among the actual child apache
processes somehow.

In addition to these features, the module would keep track of the
amount of data transferred in & out for each vhost and apply a
soft/hard limit when the limits defined in the LDAP entry were reached.
The amount of actual data transferred would periodically be written to
either a GDBM file or even to an LDAP entry (not sure what is best -
probably LDAP for consistency) and the data would also need to be
shared among any servers in a cluster somehow.

This would enable ISPs to bill on a per vhost basis fairly accurately,
and limit abusive sites.
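
A minimal sketch of the accounting idea, assuming (the post does not say) that the soft limit only triggers a warning or throttle while the hard limit stops service. The names TransferCounter and limit_state are hypothetical, and persistence and cluster-wide sharing are left out.

```python
def limit_state(bytes_used, soft_limit, hard_limit):
    """Classify a vhost's transfer total against its soft and hard limits."""
    if bytes_used >= hard_limit:
        return "blocked"   # assumption: the hard limit stops service
    if bytes_used >= soft_limit:
        return "warned"    # assumption: the soft limit only warns or throttles
    return "ok"


class TransferCounter:
    """Per-vhost running total of bytes sent, held in memory only."""

    def __init__(self):
        self.bytes_out = {}

    def record(self, hostname, nbytes):
        self.bytes_out[hostname] = self.bytes_out.get(hostname, 0) + nbytes
        return self.bytes_out[hostname]


counter = TransferCounter()
counter.record("www.example.com", 600000)
total = counter.record("www.example.com", 500000)
state = limit_state(total, soft_limit=1000000, hard_limit=10000000)
```

With the limits from the example entry, 1,100,000 bytes sent puts the site past its soft limit but well under the hard one.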

Now, I've looked around for something like this, and as far as I can
see, there isn't anything that does vhosting quite like this, except
for the commercial systems out there such as Zeus.

Do people think this is a good approach?

Will another method give me what I want? (LDAP is not a dependency,
just a nice-to-have)

Finally, I am thinking about starting an Open Source project to write
this module. My C is pretty primitive right now, though I have got
simple LDAP lookup code working already (just not in Apache, yet).

Would anyone else see this as a worthwhile project for Apache?

It certainly would solve our problems, but it sometimes feels like I'm
trying to fix a simple problem with something very heavy - though
implemented correctly, I don't think performance will be a problem.

Comments gratefully received :)

Regards,

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

If you think nobody cares if you're alive, try missing a
couple of car payments.


Re: Advanced Mass Hosting Module

Posted by Tony Finch <do...@dotat.at>.
> > Resending this to this list as I got no response on users list.

Sorry, I missed the original version of this post.

> > Currently, we are using flat config files generated by our website
> > provisioning software to support our mass hosted customers. The reason
> > for doing it this way, and not using the mod_vhost_alias module is
> > because we need to be able to turn on/off CGI, PHP, Java, shtml etc on
> > a per vhost basis. We need the power that having a distinct
> > <VirtualHost> directive for each site gives you.
> >
> > Is there a better way?

The mod_vhost_alias way came from a heritage of very basic web site
provisioning, with little change in architecture since 1996. The
model was abusing the filesystem as a database -- we were using
permissions on users' home directories to record if they had been
barred or had exceeded their quota. We also abused the DNS as a
database, which is where UseCanonicalName DNS came from.

From a more recent perspective this is foolish (or at least naive).

> > In addition to these features, the module would keep track of the
> > amount of data transferred in & out for each vhost and apply a
> > soft/hard limit when the limits defined in the LDAP entry were reached.
> > The amount of actual data transferred would periodically be written to
> > either a GDBM file or even to an LDAP entry (not sure what is best -
> > probably LDAP for consistency) and the data would also need to be
> > shared among any servers in a cluster somehow.
> > This would enable ISPs to bill on a per vhost basis fairly accurately,
> > and limit abusive sites.

This part of it should be separate from the vhosting side of things.
How you provision a web site is independent of how you accumulate stats
on it. It's a logging module, which is naturally separate from a
URI->filename mapping module -- though a proper vhosting module needs
to hook into the DirectoryWalk side of things to do permissions.
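
To illustrate keeping the accounting on the logging side, here is a small sketch that totals bytes per vhost from access-log lines. It assumes a custom log format of "%v %b" (server name, then response bytes, where %b writes "-" when no body bytes were sent); the function name bytes_per_vhost is hypothetical.

```python
from collections import defaultdict


def bytes_per_vhost(log_lines):
    """Sum response bytes per virtual host from '<vhost> <bytes>' log lines."""
    totals = defaultdict(int)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore lines that do not match the assumed format
        vhost, sent = parts
        # "-" means no body bytes were sent for that request
        totals[vhost] += 0 if sent == "-" else int(sent)
    return dict(totals)


totals = bytes_per_vhost([
    "www.example.com 5120",
    "www.example.com -",
    "another.example.com 300",
])
```

Because this only reads logs, it can run outside the web server and be replaced without touching how sites are provisioned.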

> > Will another method give me what I want? (LDAP is not a dependency,
> > just a nice-to-have)

Clever application of .htaccess files, <Directory> sections containing
AllowOverride directives, etc. *may* be good enough, but it's a very
blunt tool.

Sounds like you're aiming for something good. Lots of people have asked
me for database-driven mod_vhost_alias (which misses the point, but)
so there is a clear need. Don't worry too much about the project
management side of things -- just write the code and the docs and publish
it, then keep polishing and answering emails.

Tony.
-- 
f.a.n.finch  <do...@dotat.at>  http://dotat.at/
BERWICK ON TWEED TO WHITBY: SOUTHEAST 2 OR 3, INCREASING 4 PERHAPS 5. FAIR.
MODERATE OR GOOD. SLIGHT, INCREASING MODERATE LATER.

Re: Advanced Mass Hosting Module

Posted by Nathan Ollerenshaw <na...@valuecommerce.ne.jp>.
On Friday, March 14, 2003, at 09:00 AM, Tim Nagel wrote:

> I would also love to see such a module available, and I'm very willing
> to contribute in any way I can; however, I'm skill-less in the C arena :(

Learn C, and you're on the team!

> Good luck.
>
> Tim

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

I'm your blubber boy you should rub me
The sun beat me down too viciously
I fell into the ground to what I used to be
I've melted away I'm nothing again


Re: Advanced Mass Hosting Module

Posted by Tim Nagel <ti...@merkworx.com>.
I would also love to see such a module available, and I'm very willing to
contribute in any way I can; however, I'm skill-less in the C arena :(

Good luck.

Tim


Re: Advanced Mass Hosting Module

Posted by Graham Leggett <mi...@sharp.fm>.
Nathan Ollerenshaw wrote:

> What I have in mind is a module that fits in with our current LDAP
> based infrastructure. Currently, LDAP services our mail users, and I
> would like to see the Apache mass hosting configuration held in LDAP as
> well. In this way, we can just scale by adding more apache servers,
> mounting the shared docroot and pointing them to the LDAP server.

I had this on the cards quite a while ago, but have not got around to 
actually finishing it off.

The idea was a separate tool which would generate flat apache config 
files based on LDAP queries. The reason for the flat files was so the 
server could still restart and work even if the LDAP server was down. 
"Kicking" a server could be as simple as accessing a special URL, which 
recreates the flat config files and gracefully restarts the server.
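
A rough sketch of the generation step, rendering one <VirtualHost> block from a record. The function name render_vhost is hypothetical, the attribute names are borrowed from the example entry in the original post, and the *:80 address is an assumption. PHP handling is left out because it depends on how PHP is installed.

```python
def render_vhost(entry):
    """Render one <VirtualHost> block from a dict of directory attributes."""
    lines = [
        "<VirtualHost *:80>",
        f"    ServerName {entry['serverName']}",
    ]
    for alias in entry.get("serverAlias", []):
        lines.append(f"    ServerAlias {alias}")
    lines.append(f"    DocumentRoot {entry['docRoot']}")

    # Map per-site feature switches onto Options flags.
    options = []
    if entry.get("cgiStatus") == "enabled":
        options.append("+ExecCGI")
    if entry.get("shtmlStatus") == "enabled":
        options.append("+Includes")

    lines.append(f"    <Directory {entry['docRoot']}>")
    lines.append(f"        Options {' '.join(options) or 'None'}")
    lines.append("    </Directory>")
    lines.append("</VirtualHost>")
    return "\n".join(lines) + "\n"


config_text = render_vhost({
    "serverName": "www.example.com",
    "serverAlias": ["example.com", "another.example.com"],
    "docRoot": "/data/web/04/09/example.com/www",
    "cgiStatus": "enabled",
    "shtmlStatus": "disabled",
})
```

Since the output is an ordinary config file, the server can start from the last generated copy even when the directory is unreachable.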

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."


Re: Advanced Mass Hosting Module

Posted by Ian Holsman <ia...@apache.org>.
Thomas Eibner wrote:
> On Sat, Mar 15, 2003 at 01:00:18AM +0900, Nathan Ollerenshaw wrote:
> 
>>On Saturday, March 15, 2003, at 12:02 AM, Thomas Eibner wrote:
>>
>>
>>>On Thu, Mar 13, 2003 at 08:27:30PM +0900, Nathan Ollerenshaw wrote:
>>>
>>>>Resending this to this list as I got no response on users list.
>>>>
>>>>Currently, we are using flat config files generated by our website
>>>>provisioning software to support our mass hosted customers. The reason
>>>>for doing it this way, and not using the mod_vhost_alias module is
>>>>because we need to be able to turn on/off CGI, PHP, Java, shtml etc on
>>>>a per vhost basis. We need the power that having a distinct
>>>><VirtualHost> directive for each site gives you.
>>>>
>>>>Is there a better way?

I don't know of a specific virtual host hook, but if there isn't there might be a need for it.

I guess you need to have someplace which calls your module's hook *before* the server definition 
gets set, and allows you to run a pre-config/post-config & followup merge for all the modules 
currently loaded on the first time the server-name is loaded into memory, and then pass the 
resulting server-config down to the rest of the hooks.

This should make it possible for you to do anything in your module that the plaintext v-host 
one could do.

--Ian



Re: Advanced Mass Hosting Module

Posted by Nathan Ollerenshaw <na...@valuecommerce.ne.jp>.
On Saturday, March 15, 2003, at 01:13 AM, Thomas Eibner wrote:
> On Sat, Mar 15, 2003 at 01:00:18AM +0900, Nathan Ollerenshaw wrote:
>> I wasn't thinking of anything radical. Just have a hook to set the
>> handler for a particular document (if it matches .php or .php4) to the
>> PHP module if it's allowed to, and serve it as a normal document if
>> not. Etc.
>>
>> I've not had a great delve in the hooks but nothing has suggested in
>> what I've looked at that it's not possible.
>
> I'm not sure if it's as simple as you describe. What is to stop a user
> from placing a .htaccess file in a directory giving himself ability to
> give the right content type to execute a php script for instance?
> If you want suexec to work too, there might be further complications.
> (Just thinking out loud here) :)

You bring up a valid point, but I was thinking more of sbox. That's what
we use currently (because suexec didn't fit our model) and it works
great. Though, there seems to be a bug where it's poisoning the
environment ...

At any rate, if I'm interfering around the URI-to-filename translation 
phase first, I should be able to minimise any problems with .htaccess 
files. But, I don't know, I don't fully understand all the phases that 
I can interfere with just yet :)

There are other phases I've not really looked at as well which I could 
hook into to do extra sanity checks, I guess. But, I think, get the 
thing basically working, then narrow down all the annoying security 
holes it will make, eh?

>> I really need to get a proof-of-concept working; maybe this weekend if
>> my other half gives me a 'allowed to use computer' note for the 
>> teacher.
>
> What would you consider a proof-of-concept? I have my code lurking on
> some machine in cvs if you want to take a look at it.

If my feeble coding skills are up to it :) I've requested a new sf.net 
project, so in a couple of days I should be able to put up my hacky 
bits of code.

Really, I only started programming C with a vengeance about a week ago. 
I'm an old perl hacker, and never felt a need to use C. So fear my 
code. Expect apache to segfault. ;)

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

I'm your blubber boy you should rub me
The sun beat me down too viciously
I fell into the ground to what I used to be
I've melted away I'm nothing again


Re: Advanced Mass Hosting Module

Posted by Thomas Eibner <ei...@mnmailhost.bridge.com>.
On Sat, Mar 15, 2003 at 01:00:18AM +0900, Nathan Ollerenshaw wrote:
> On Saturday, March 15, 2003, at 12:02 AM, Thomas Eibner wrote:
> 
> >On Thu, Mar 13, 2003 at 08:27:30PM +0900, Nathan Ollerenshaw wrote:
> >>Resending this to this list as I got no response on users list.
> >>
> >>Currently, we are using flat config files generated by our website
> >>provisioning software to support our mass hosted customers. The reason
> >>for doing it this way, and not using the mod_vhost_alias module is
> >>because we need to be able to turn on/off CGI, PHP, Java, shtml etc on
> >>a per vhost basis. We need the power that having a distinct
> >><VirtualHost> directive for each site gives you.
> >>
> >>Is there a better way?
> >
> >I once started a project to do this from a database, I eventually
> >stopped as I couldn't figure out a nice way to enable/disable
> >php,cgi,whatever on demand. Serving virtualhosts from documentroots
> >you pull out of a database is no big deal.
> 
> I wasn't thinking of anything radical. Just have a hook to set the 
> handler for a particular document (if it matches .php or .php4) to the 
> PHP module if it's allowed to, and serve it as a normal document if 
> not. Etc.
> 
> I've not had a great delve in the hooks but nothing has suggested in 
> what I've looked at that it's not possible.

I'm not sure if it's as simple as you describe. What is to stop a user
from placing a .htaccess file in a directory giving himself ability to
give the right content type to execute a php script for instance? 
If you want suexec to work too, there might be further complications.
(Just thinking out loud here) :)
 
> I really need to get a proof-of-concept working; maybe this weekend if 
> my other half gives me a 'allowed to use computer' note for the teacher.

What would you consider a proof-of-concept? I have my code lurking on some
machine in cvs if you want to take a look at it. 


Re: Advanced Mass Hosting Module

Posted by Nathan Ollerenshaw <na...@valuecommerce.ne.jp>.
On Saturday, March 15, 2003, at 12:02 AM, Thomas Eibner wrote:

> On Thu, Mar 13, 2003 at 08:27:30PM +0900, Nathan Ollerenshaw wrote:
>> Resending this to this list as I got no response on users list.
>>
>> Currently, we are using flat config files generated by our website
>> provisioning software to support our mass hosted customers. The reason
>> for doing it this way, and not using the mod_vhost_alias module is
>> because we need to be able to turn on/off CGI, PHP, Java, shtml etc on
>> a per vhost basis. We need the power that having a distinct
>> <VirtualHost> directive for each site gives you.
>>
>> Is there a better way?
>
> I once started a project to do this from a database, I eventually
> stopped as I couldn't figure out a nice way to enable/disable
> php,cgi,whatever on demand. Serving virtualhosts from documentroots
> you pull out of a database is no big deal.

I wasn't thinking of anything radical. Just have a hook to set the 
handler for a particular document (if it matches .php or .php4) to the 
PHP module if it's allowed to, and serve it as a normal document if 
not. Etc.
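
The decision described above might look like the following sketch. It only models the choice itself; the function name choose_handler and the handler labels "php" and "default" are hypothetical, and it does nothing about overrides from .htaccess files.

```python
import os

# Extensions named in the post
PHP_EXTENSIONS = {".php", ".php4"}


def choose_handler(filename, php_enabled):
    """Pick a handler label for a file, honouring the per-vhost PHP switch."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in PHP_EXTENSIONS and php_enabled:
        return "php"
    return "default"   # served as a normal document


handler = choose_handler("/data/web/04/09/example.com/www/index.php", php_enabled=True)
```

Note that with PHP switched off, a .php file is served as a normal document, which exposes its source. That is what the post describes, but a real deployment may prefer to refuse the request.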

I've not had a great delve in the hooks but nothing has suggested in 
what I've looked at that it's not possible.

I really need to get a proof-of-concept working; maybe this weekend if 
my other half gives me a 'allowed to use computer' note for the teacher.

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

"You can't be a Real Country unless you have a BEER and an airline -
it helps if you have some kind of a football team or some nuclear
weapons, but at the very least you need a BEER." - Frank Zappa



Re: Advanced Mass Hosting Module

Posted by Thomas Eibner <ei...@mnmailhost.bridge.com>.
On Thu, Mar 13, 2003 at 08:27:30PM +0900, Nathan Ollerenshaw wrote:
> Resending this to this list as I got no response on users list.
> 
> Currently, we are using flat config files generated by our website
> provisioning software to support our mass hosted customers. The reason
> for doing it this way, and not using the mod_vhost_alias module is
> because we need to be able to turn on/off CGI, PHP, Java, shtml etc on
> a per vhost basis. We need the power that having a distinct
> <VirtualHost> directive for each site gives you.
> 
> Is there a better way?

I once started a project to do this from a database, but I eventually
stopped because I couldn't figure out a nice way to enable/disable PHP,
CGI, or whatever on demand. Serving virtualhosts from documentroots you
pull out of a database is no big deal.


Re: Advanced Mass Hosting Module

Posted by David Burry <db...@tagnet.org>.
You and someone else said the same thing.  I currently have a setup where we
run several hundred vhosts (all individually specified) without issue, I'll
have to remember this if it ever grows to thousands.  Thanks.  With the lack
of a more powerful vhost-alias type thing, I'll probably have to vhost-alias
all the "standard" bare bones configs, and list out the anomalies
separately....

Dave

----- Original Message -----
From: "Mads Toftum" <ma...@toftum.dk>
To: <de...@httpd.apache.org>
Sent: Friday, March 14, 2003 12:55 AM
Subject: Re: Advanced Mass Hosting Module


> On Thu, Mar 13, 2003 at 04:55:19PM -0800, David Burry wrote:
> > These are neat ideas.  At a few companies I've worked for we already do
> > similar things but we have scripts that generate the httpd.conf files
> > and distribute them out to the web servers and gracefully restart.
> > Adding a new web server machine to the mix is as simple as adding the
> > host name to the distribution script.
> >
> This only works when you have a limited number of vhosts - if you were
> to run thousands of vhosts on each machine, then mod_vhost_alias
> (or mod_rewrite) is currently the only way to go. A module like this
> could provide a nice compromise between the flexibility of using
> httpd.conf to specify each vhost and the "speed" of vhost_alias.
>
> vh
>
> Mads Toftum
> --
> `Darn it, who spiked my coffee with water?!' - lwall
>


Re: Advanced Mass Hosting Module

Posted by Mads Toftum <ma...@toftum.dk>.
On Thu, Mar 13, 2003 at 04:55:19PM -0800, David Burry wrote:
> These are neat ideas.  At a few companies I've worked for we already do
> similar things but we have scripts that generate the httpd.conf files
> and distribute them out to the web servers and gracefully restart.
> Adding a new web server machine to the mix is as simple as adding the
> host name to the distribution script.
> 
This only works when you have a limited number of vhosts - if you were
to run thousands of vhosts on each machine, then mod_vhost_alias
(or mod_rewrite) is currently the only way to go. A module like this
could provide a nice compromise between the flexibility of using 
httpd.conf to specify each vhost and the "speed" of vhost_alias.

vh

Mads Toftum
-- 
`Darn it, who spiked my coffee with water?!' - lwall


Re: Advanced Mass Hosting Module

Posted by Nathan Ollerenshaw <na...@valuecommerce.ne.jp>.
On Friday, March 14, 2003, at 10:15 AM, Zac Stevens wrote:

> On Thu, Mar 13, 2003 at 04:55:19PM -0800, David Burry wrote:
>> These are neat ideas.  At a few companies I've worked for we already
>> do similar things but we have scripts that generate the httpd.conf
>> files and distribute them out to the web servers and gracefully
>> restart.  Adding a new web server machine to the mix is as simple as
>> adding the host name to the distribution script.
>
> I've done the same in the past.  It works fine, but becomes unwieldy
> when you're talking about thousands of sites per server.  Graceful
> restarts also take a nontrivial amount of time in this environment.

Even a few hundred sites are now taking an inordinate time to do a 
graceful - our config is on NFS, with a separate file for each site - a 
design decision that I am beginning to regret... I did some testing, 
but I didn't account for the fact that I'd be loading the configs over 
NFS. Not great.

>> What you're talking about doing sounds like a lot more complexity to
>> achieve a similar thing, and more complexity means there's a lot more
>> that can go wrong.  For instance, what are you going to do if the LDAP
>> server is down, are many not-yet-cached virtual hosts just going to
>> fail?
>
> Redundant LDAP servers?  Or even pluggable backends - keep a DBM-format
> copy on the local filesystem as a backup.  I imagine many people would
> be happy with a default vhost specified in the config, which could
> display an "Ooops! Something's broken!" page.

We use redundancy everywhere; the backend LDAP is no exception to this
rule.

The main reason for LDAP is because we have a front-end provisioning 
system that creates accounts for FTP and Email in LDAP, it would be 
nice to keep the website configurations in there too, without the 
provisioning system having to write apache config files.

You're right, of course. Some form of graceful failure would be needed, 
but it would probably be a 'Temporarily Unavailable' error with a 
custom error page in Japanese and English (most of our customers are 
Japanese).

> In my experience, the 80:20 rule definitely applies here - and I would
> be inclined to suggest the ratio is even more severe.  That is, more
> than 80% of the vhosts contribute less than 20% of the load.  While the
> dynamic reconfiguration afforded by this proposal is a big win, I'm
> more impressed with the opportunity to minimise the amount of wasted
> resources in large environments.

This equates with my experience too. It irks me that apache spends a 
large amount of time and memory holding the configuration for a bunch 
of sites that only get hit maybe once a day (when the owner loads the 
page to see if the hit counter has increased - HAH!)

> I'm interested to hear whether this is feasible for development against
> 2.0, as I don't believe the current architecture allows for plugging in
> this sort of functionality as a 3rd-party module.

I was looking at implementing it in the URI-to-filename translation
phase. Any memory malloc'd for an in-memory cache would only be
accessible by that particular child, but that would not be so bad for a
v1.0 implementation of the module.

In the future, we might look at shmem or something like that. Even a DB 
file held on a ramdisk might be acceptable (if a little perverse).

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

In the days, When we were swinging form the trees
I was a monkey, Stealing honey from a swarm of bees
I could taste, I could taste you even then
And I would chase you down the wind


Re: Advanced Mass Hosting Module

Posted by Zac Stevens <zt...@cryptocracy.com>.
On Thu, Mar 13, 2003 at 04:55:19PM -0800, David Burry wrote:
> These are neat ideas.  At a few companies I've worked for we already do
> similar things but we have scripts that generate the httpd.conf files
> and distribute them out to the web servers and gracefully restart.
> Adding a new web server machine to the mix is as simple as adding the
> host name to the distribution script.

I've done the same in the past.  It works fine, but becomes unwieldy when
you're talking about thousands of sites per server.  Graceful restarts also
take a nontrivial amount of time in this environment.

> What you're talking about doing sounds like a lot more complexity to
> achieve a similar thing, and more complexity means there's a lot more
> that can go wrong.  For instance, what are you going to do if the LDAP
> server is down, are many not-yet-cached virtual hosts just going to
> fail?  

Redundant LDAP servers?  Or even pluggable backends - keep a DBM-format
copy on the local filesystem as a backup.  I imagine many people would be
happy with a default vhost specified in the config, which could display an
"Ooops! Something's broken!" page.

In my experience, the 80:20 rule definitely applies here - and I would be
inclined to suggest the ratio is even more severe.  That is, more than 80% 
of the vhosts contribute less than 20% of the load.  While the dynamic 
reconfiguration afforded by this proposal is a big win, I'm more impressed 
with the opportunity to minimise the amount of wasted resources in large
environments.
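
One way to act on that skew is to bound the in-memory set so that only recently used vhosts stay loaded. The sketch below uses least-recently-used eviction; the class name LruVhostCache, the size limit and the host names are hypothetical.

```python
from collections import OrderedDict


class LruVhostCache:
    """Keep at most max_entries vhost configs, evicting the least recently used."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._entries = OrderedDict()

    def get(self, hostname):
        if hostname not in self._entries:
            return None
        self._entries.move_to_end(hostname)      # mark as most recently used
        return self._entries[hostname]

    def put(self, hostname, config):
        self._entries[hostname] = config
        self._entries.move_to_end(hostname)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)    # drop the least recently used


lru = LruVhostCache(max_entries=2)
lru.put("busy.example.com", {"docRoot": "/a"})
lru.put("quiet.example.com", {"docRoot": "/b"})
lru.get("busy.example.com")                       # busy site is touched again
lru.put("new.example.com", {"docRoot": "/c"})     # evicts quiet.example.com
```

Sites that are hit once a day then cost memory only briefly, while busy sites stay resident.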

I'm interested to hear whether this is feasible for development against
2.0, as I don't believe the current architecture allows for plugging in
this sort of functionality as a 3rd-party module.



Zac

Re: Advanced Mass Hosting Module

Posted by Nathan Ollerenshaw <na...@valuecommerce.ne.jp>.
On Friday, March 14, 2003, at 09:55 AM, David Burry wrote:

> These are neat ideas.  At a few companies I've worked for we already do
> similar things but we have scripts that generate the httpd.conf files
> and distribute them out to the web servers and gracefully restart.
> Adding a new web server machine to the mix is as simple as adding the
> host name to the distribution script.

Yup. Not too dissimilar to what we use right now. We have a shared NFS 
filesystem mounted on all the apache servers with a single level tree 
of config files, one per domain. Apache just includes the base 
directory.

This sucks, performance wise. Convenience wise, it's great.

The NFS server is a High Availability setup, so that's cool. And even if
I was worried about the NFS going away and the server not being able to
read its configs, the point is moot - the NFS server also holds the
docs.

> What you're talking about doing sounds like a lot more complexity to
> achieve a similar thing, and more complexity means there's a lot more
> that can go wrong.  For instance, what are you going to do if the LDAP

Normally, I'd agree. But like what was mentioned before, you have to 
load thousands, or if you're really lucky, tens of thousands of virtual 
hosts into your apache daemon. Eventually what happens is the apache 
daemon starts using an inordinate amount of ram just to load all those 
configurations into memory, and reloading takes an age.

At least with 1.3, I saw a massive memory usage when loading 5,000 
virtualhosts in a test. I am not sure about 2.0.

Besides. I don't want to have to keep restarting my apache daemon 
*every time* someone wants to enable/disable php on their site. It 
ruins the uptime! ;)

> server is down, are many not-yet-cached virtual hosts just going to
> fail?  In our scenario it's solved simply and easily by the generation
> script simply failing and nothing being copied (but at least the web
> servers keep working fine with the last config revision, so not
> many/any end user web surfers will notice the outage).

Have more than one LDAP server :) This is easy to do, LDAP allows for 
it, and as long as the client software is smart (stops trying to use a 
borked LDAP server) you won't even notice the failure of a back-end 
LDAP slave.
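
A "smart" client of the kind described could be sketched like this: try the servers in order and remember which ones failed so they are skipped next time. The names lookup_with_failover and DirectoryUnavailable, the query callable and the server names are all hypothetical; a real client would also re-check failed servers after a while.

```python
class DirectoryUnavailable(Exception):
    """Raised when no configured directory server could answer."""


def lookup_with_failover(servers, query, hostname, down=None):
    """Try each server in order, skipping ones already marked as down."""
    down = set() if down is None else down
    for server in servers:
        if server in down:
            continue
        try:
            return query(server, hostname)
        except ConnectionError:
            down.add(server)   # stop using this server until it is re-checked
    raise DirectoryUnavailable(hostname)


def fake_query(server, hostname):
    """Stand-in for a real LDAP search; the first server is unreachable."""
    if server == "ldap1.example.com":
        raise ConnectionError("unreachable")
    return {"serverName": hostname}


down = set()
result = lookup_with_failover(
    ["ldap1.example.com", "ldap2.example.com"], fake_query, "www.example.com", down
)
```

When every server is down the caller gets one clear exception, which is the point where a "Temporarily Unavailable" page would be served.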

Besides, LDAP is much-maligned. I've been running LDAP in production 
systems for a long time now, and I've never had one just up and die on 
me.

The ability to store all your configuration data in one place overrides 
the inconvenience of having to manage another set of servers.

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

In the days, When we were swinging form the trees
I was a monkey, Stealing honey from a swarm of bees
I could taste, I could taste you even then
And I would chase you down the wind


RE: Advanced Mass Hosting Module

Posted by David Burry <db...@tagnet.org>.
These are neat ideas.  At a few companies I've worked for we already do
similar things but we have scripts that generate the httpd.conf files
and distribute them out to the web servers and gracefully restart.
Adding a new web server machine to the mix is as simple as adding the
host name to the distribution script.

What you're talking about doing sounds like a lot more complexity to
achieve a similar thing, and more complexity means there's a lot more
that can go wrong.  For instance, what are you going to do if the LDAP
server is down?  Are many not-yet-cached virtual hosts just going to
fail?  In our scenario it's solved simply and easily by the generation
script simply failing and nothing being copied (but at least the web
servers keep working fine with the last config revision, so not many/any
end user web surfers will notice the outage).
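
The "generation fails, nothing is copied" behaviour can be sketched roughly as follows in Python. This is not the actual script in use; render_vhost, install_config and the two record keys are hypothetical names chosen for the example, and a real setup would presumably also run something like "apachectl configtest" before a graceful restart.

```python
import os
import tempfile


def render_vhost(site):
    # site: dict with the (assumed) keys "serverName" and "docRoot".
    return (
        "<VirtualHost *:80>\n"
        f"    ServerName {site['serverName']}\n"
        f"    DocumentRoot {site['docRoot']}\n"
        "</VirtualHost>\n"
    )


def install_config(sites, target):
    """Render all sites to a temp file, then swap it in with one rename.

    If rendering raises (bad record, data source down), the temp file is
    removed and the existing target is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".vhosts-")
    try:
        with os.fdopen(fd, "w") as out:
            for site in sites:
                out.write(render_vhost(site))
        os.replace(tmp, target)  # atomic when on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Because the rename is the last step, the web servers only ever see either the previous complete file or the new complete file.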

Dave

-----Original Message-----
From: Nathan Ollerenshaw [mailto:nathan@valuecommerce.ne.jp] 
Sent: Thursday, March 13, 2003 3:28 AM
To: dev@httpd.apache.org
Subject: Advanced Mass Hosting Module


Resending this to this list as I got no response on users list.

Currently, we are using flat config files generated by our website
provisioning software to support our mass hosted customers. The reason
for doing it this way, and not using the mod_vhost_alias module is
because we need to be able to turn on/off CGI, PHP, Java, shtml etc on a
per vhost basis. We need the power that having a distinct <VirtualHost>
directive for each site gives you.

Is there a better way?

What I have in mind is a module that fits in with our current LDAP based
infrastructure. Currently, LDAP services our mail users, and I would
like to see the Apache mass hosting configuration held in LDAP as well.
In this way, we can just scale by adding more apache servers, mounting
the shared docroot and pointing them to the LDAP server.

The LDAP entry would look something like this:

# www.example.com, base
dn: uid=www.example.com, o=base
siteGidNumber: 10045
siteUidNumber: 10045
objectClass: top
objectClass: apacheVhost
serverName: www.example.com
serverAlias: example.com
serverAlias: another.example.com
docRoot: /data/web/04/09/example.com/www
vhostStatus: enabled
phpStatus: enabled
shtmlStatus: enabled
cgiStatus: enabled
dataOutSoftLimit: 1000000 (in bytes per month)
dataOutHardLimit: 10000000
dataInSoftLimit: 1000000
dataInHardLimit: 10000000
dataThrottleRate: 1000000 (in bits/sec)

Then, as a request came in, the imaginary mod_advanced_masshosting
module would first check to see if it had the information about the
domain already cached in memory (to avoid hitting LDAP for every HTTP
request, which would be a Bad Idea) and then if not, it would grab the
entry from LDAP, cache it, and service the incoming requests.

The cache itself would need to be shared among the actual child apache
processes somehow.
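
The cache-then-LDAP lookup described above could look roughly like this Python sketch. It is in-process only, so it does not address sharing between child processes. The fetch callable stands in for the real LDAP query, and the VhostCache name, the 300 second TTL and the stale-entry fallback are all assumptions made for the example.

```python
import time


class VhostCache:
    """Per-hostname cache with a TTL and a stale fallback."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self.fetch = fetch    # hypothetical: hostname -> dict, may raise
        self.ttl = ttl
        self.clock = clock
        self.entries = {}     # hostname -> (expires_at, record)

    def get(self, hostname):
        now = self.clock()
        cached = self.entries.get(hostname)
        if cached and cached[0] > now:
            return cached[1]          # fresh hit, no directory traffic
        try:
            record = self.fetch(hostname)
        except ConnectionError:
            if cached:
                return cached[1]      # directory down: serve stale entry
            raise                     # never seen this host; cannot serve
        self.entries[hostname] = (now + self.ttl, record)
        return record
```

Serving an expired entry when the directory is unreachable means that only hosts never seen before fail during an outage.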

In addition to these features, the module would keep track of the amount
of data transferred in & out for each vhost and apply a soft/hard limit
when the limits defined in the LDAP entry were reached. The amount of
actual data transferred would periodically be written to either a GDBM
file or even to an LDAP entry (not sure what is best - probably LDAP for
consistency) and the data would also need to be shared among any servers
in a cluster somehow.
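
A minimal Python sketch of the soft/hard limit check, using the dataOutSoftLimit and dataOutHardLimit values from the example entry. What happens at each limit is not specified above, so the "throttled" and "blocked" states here are assumptions, as is the TransferMeter name. Persisting the counters and sharing them across a cluster are left out.

```python
class TransferMeter:
    """Count bytes for one vhost and report which limit, if any, is hit."""

    def __init__(self, soft_limit, hard_limit):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.used = 0  # bytes transferred in the current billing period

    def record(self, nbytes):
        # Add the size of one response and return the resulting state.
        self.used += nbytes
        return self.state()

    def state(self):
        if self.used >= self.hard_limit:
            return "blocked"      # assumed meaning of the hard limit
        if self.used >= self.soft_limit:
            return "throttled"    # assumed meaning of the soft limit
        return "ok"


meter = TransferMeter(soft_limit=1000000, hard_limit=10000000)
meter.record(999999)   # still under the soft limit
meter.record(1)        # reaches the soft limit
```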

This would enable ISPs to bill on a per vhost basis fairly accurately,
and limit abusive sites.

Now, I've looked around for something like this, and as far as I can
see, there isn't anything that does vhosting quite like this, except for
the commercial systems out there such as Zeus.

Do people think this is a good approach?

Will another method give me what I want? (LDAP is not a dependency, just
a nice-to-have)

Finally, I am thinking about starting an Open Source project to write
this module. My C is pretty primitive right now, though I have got
simple LDAP lookup code working already (just not in Apache, yet).

Would anyone else see this as a worthwhile project for Apache?

It certainly would solve our problems, but it sometimes feels like I'm
trying to fix a simple problem with something very heavy - though
implemented correctly, I don't think performance will be a problem.

Comments gratefully received :)

Regards,

Nathan.

-- 
Nathan Ollerenshaw - Systems Engineer - Shared Hosting
ValueCommerce Japan - http://www.valuecommerce.ne.jp

If you think nobody cares if you're alive, try missing a
couple of car payments.