Posted to dev@httpd.apache.org by Martin Pool <mb...@wistful.humbug.org.au> on 1998/12/30 10:47:38 UTC

hashing virtual host names

I was looking through src/main/http_vhost.c the other day and saw that
IP-based virtual hosts are indexed by a hash, but name-based virtual
hosts are not.  I should think most servers these days have
considerably more name-based vhosts than IP addresses: would it be
worth adding a hash on name to make check_hostalias() go faster?

-- 
Martin Pool

Re: hashing virtual host names

Posted by Martin Pool <mb...@wistful.humbug.org.au>.
On Wed, Dec 30, 1998 at 07:18:54PM -0800, Dean Gaudet wrote:
> Yep.  You need multiple hash tables, they'd replace the name_chains in
> http_vhost.c.

I'd already registered that, but hadn't thought about wildcards.
Thank you for the notes, Dean.

> Since you've got some time at config-time to build a nice data structure
> you can probably use something fancy to handle the last case.  Try taking
> the domain names in reverse and sticking them into a search tree... I
> bet with a little thought you can solve the second problem.

I'll give it some thought.  As you say, if there's no sufficiently
general and simple way to do it with hashing it can always fall back
to linear search.

> Finally -- I'm not sure how critical this is, because we do provide
> the tools (via UseCanonicalName and mod_rewrite) to host zillions of
> vhosts without a single addition to the config files.  I think there
> have been some posts describing how to do this... (and someone was going to
> write a special module so you didn't have to use the full
> mod_rewrite).

The only reason we [=server101.com] are using virtual hosts right now
is that we want to run scripts within each using the suEXEC User
directive, and that requires a VirtualHost scope.  Perhaps we'd be
better off with rewrite rules and using something based on cgiwrap.

On the other hand, I imagine many people _will_ configure Apache with
lots of virtual hosts because that's the obvious way to do it.  (I'm
imagining a ${middle-brow computer magazine} reporter setting up
Apache and IIS with 100 vhosts without really reading the manual.)  It
seems like it'd be nice if it performed well when used in the obvious
way.

-- 
Martin Pool

Re: hashing virtual host names

Posted by Dean Gaudet <dg...@arctic.org>.
Yep.  You need multiple hash tables, they'd replace the name_chains in
http_vhost.c.

The difficulty is choosing the hashing function because we allow wildcards
in ServerAlias.  My suggestion is to support folks that do the expected
things, such as:

    <VirtualHost 10.1.1.1>
        ServerName blah.dom
        ServerAlias *.blah.dom
        ...
    </VirtualHost>

i.e. support wildcards that match one or more domain components, followed
by some fixed components.  Folks using other weirdnesses can live with
linear lookup.

Also, because we didn't really plan for this you've got to worry about
this:

    <VirtualHost 10.1.1.1>
        ServerName foo.blah.dom
        ...
    </VirtualHost>

    <VirtualHost 10.1.1.1>
        ServerName blah.dom
        ServerAlias *.blah.dom
        ...
    </VirtualHost>

Where foo.blah.dom goes to the special vhost, and the rest of blah.dom
goes to a wildcard vhost.  Again, you can just say "too bad" for the
folks needing this, and they get linear lookup as well.  But a little
creativity with the data structure can solve this one too I think.
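
A minimal sketch of the two-table idea, not the actual http_vhost.c code:
one hash table for exact names and one for "*.suffix" aliases, keyed by
the suffix.  Every name here (name_table, table_add, table_lookup,
demo_lookup, the integer vhost_id) is hypothetical, and host names are
assumed to be lower-cased already.  Anything that is not an exact name or
a leading "*." wildcard is assumed to stay on the existing linear chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 64

/* One name -> vhost mapping.  vhost_id is a hypothetical stand-in for
 * whatever per-vhost record the server would really point at. */
struct name_entry {
    const char *key;
    int vhost_id;
    struct name_entry *next;
};

struct name_table {
    struct name_entry *exact[NBUCKETS];   /* "blah.dom" */
    struct name_entry *suffix[NBUCKETS];  /* "*.blah.dom", keyed by "blah.dom" */
};

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Config time: file the pattern under the exact or the suffix table.
 * The caller owns the storage for e. */
static void table_add(struct name_table *t, struct name_entry *e,
                      const char *pattern, int vhost_id)
{
    int wild = strncmp(pattern, "*.", 2) == 0;
    struct name_entry **slot;

    assert(t != NULL && e != NULL);
    e->key = wild ? pattern + 2 : pattern;
    e->vhost_id = vhost_id;
    slot = wild ? &t->suffix[hash_name(e->key)] : &t->exact[hash_name(e->key)];
    e->next = *slot;
    *slot = e;
}

static int bucket_find(const struct name_entry *e, const char *key)
{
    for (; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->vhost_id;
    return -1;
}

/* Request time: try the exact name, then each suffix after a dot, longest
 * first.  Returns -1 when the caller should fall back to linear search. */
static int table_lookup(const struct name_table *t, const char *host)
{
    const char *dot;
    int id = bucket_find(t->exact[hash_name(host)], host);

    for (dot = strchr(host, '.'); id < 0 && dot != NULL;
         dot = strchr(dot + 1, '.'))
        id = bucket_find(t->suffix[hash_name(dot + 1)], dot + 1);
    return id;
}

/* The two-vhost configuration above: vhost 1 is foo.blah.dom, vhost 2 is
 * blah.dom plus *.blah.dom. */
static int demo_lookup(const char *host)
{
    struct name_table t = {{0}};
    struct name_entry e[3];

    table_add(&t, &e[0], "foo.blah.dom", 1);
    table_add(&t, &e[1], "blah.dom", 2);
    table_add(&t, &e[2], "*.blah.dom", 2);
    return table_lookup(&t, host);
}
```

With this setup demo_lookup("foo.blah.dom") returns 1, while
demo_lookup("bar.blah.dom") and demo_lookup("blah.dom") return 2.  Note
the sketch always prefers an exact name over a wildcard, which may not
match what an order-dependent linear scan would pick.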

Since you've got some time at config-time to build a nice data structure
you can probably use something fancy to handle the last case.  Try taking
the domain names in reverse and sticking them into a search tree... I
bet with a little thought you can solve the second problem.
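
One way the reversed-name search tree might look, as a sketch only: each
node holds one domain component, and names are inserted right to left, so
"foo.blah.dom" becomes the path dom -> blah -> foo.  All identifiers
(label_node, tree_add, tree_lookup, demo_tree_lookup, the integer ids)
are made up for illustration, names are assumed to be lower-cased, and
only leading "*." wildcards are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct label_node {
    char *label;                 /* one domain component, e.g. "blah" */
    int exact_id;                /* vhost for the name ending here, or -1 */
    int wild_id;                 /* vhost for "*." + that name, or -1 */
    struct label_node *child;    /* first child: next component to the left */
    struct label_node *sibling;
};

/* Nodes live as long as the config, so this sketch never frees them. */
static struct label_node *node_new(const char *label, size_t len)
{
    struct label_node *n = calloc(1, sizeof *n);

    assert(n != NULL);
    n->label = malloc(len + 1);
    assert(n->label != NULL);
    memcpy(n->label, label, len);
    n->label[len] = '\0';
    n->exact_id = n->wild_id = -1;
    return n;
}

static struct label_node *child_find(const struct label_node *parent,
                                     const char *label, size_t len)
{
    struct label_node *c;

    for (c = parent->child; c != NULL; c = c->sibling)
        if (strlen(c->label) == len && memcmp(c->label, label, len) == 0)
            return c;
    return NULL;
}

/* Config time: walk the components right to left, creating nodes. */
static void tree_add(struct label_node *root, const char *pattern, int id)
{
    int wild = strncmp(pattern, "*.", 2) == 0;
    const char *name = wild ? pattern + 2 : pattern;
    const char *end = name + strlen(name);
    struct label_node *n = root;

    while (end > name) {
        const char *start = end;
        struct label_node *c;

        while (start > name && start[-1] != '.')
            start--;
        c = child_find(n, start, (size_t)(end - start));
        if (c == NULL) {
            c = node_new(start, (size_t)(end - start));
            c->sibling = n->child;
            n->child = c;
        }
        n = c;
        end = (start > name) ? start - 1 : name;
    }
    if (wild)
        n->wild_id = id;
    else
        n->exact_id = id;
}

/* Request time: descend right to left, remembering the deepest wildcard
 * passed while at least one component was still left to match. */
static int tree_lookup(const struct label_node *root, const char *host)
{
    const char *end = host + strlen(host);
    const struct label_node *n = root;
    int best = -1;

    while (end > host) {
        const char *start = end;

        while (start > host && start[-1] != '.')
            start--;
        if (n->wild_id >= 0)
            best = n->wild_id;
        n = child_find(n, start, (size_t)(end - start));
        if (n == NULL)
            return best;
        end = (start > host) ? start - 1 : host;
    }
    return n->exact_id >= 0 ? n->exact_id : best;
}

/* Same two vhosts as the example config: 1 = foo.blah.dom, 2 = the rest. */
static int demo_tree_lookup(const char *host)
{
    struct label_node *root = node_new("", 0);

    tree_add(root, "foo.blah.dom", 1);
    tree_add(root, "blah.dom", 2);
    tree_add(root, "*.blah.dom", 2);
    return tree_lookup(root, host);
}
```

Here demo_tree_lookup("foo.blah.dom") returns 1 and
demo_tree_lookup("x.foo.blah.dom") returns 2, because the walk falls back
to the nearest enclosing wildcard when no deeper node matches.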

Finally -- I'm not sure how critical this is, because we do provide
the tools (via UseCanonicalName and mod_rewrite) to host zillions of
vhosts without a single addition to the config files.  I think there
have been some posts describing how to do this... (and someone was going to
write a special module so you didn't have to use the full mod_rewrite).
Most of these push the "hashing" function into the filesystem, where
it's either already solved, or easier to solve.
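
To illustrate what pushing the lookup into the filesystem could mean, here
is a rough sketch that turns a Host name into a directory path, so the
per-host lookup becomes an ordinary directory lookup.  The function name
host_to_docroot and the "/www/hosts" base directory are assumptions made
for this example; this is not how mod_rewrite itself is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes "<base>/<lower-cased host>" into out.  Returns 0 on success, or
 * -1 if the host contains unexpected characters or the path won't fit. */
static int host_to_docroot(const char *base, const char *host,
                           char *out, size_t outlen)
{
    size_t i, n = strlen(host);
    int written;
    char *p;

    if (n == 0 || host[0] == '.')
        return -1;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)host[i];

        /* Only letters, digits, '-' and '.'; in particular no '/'. */
        if (!(isalnum(c) || c == '-' || c == '.'))
            return -1;
        if (c == '.' && host[i + 1] == '.')
            return -1;  /* no empty components, so no ".." */
    }
    written = snprintf(out, outlen, "%s/%s", base, host);
    if (written < 0 || (size_t)written >= outlen)
        return -1;
    /* Lower-case only the host part so Foo.Blah.Dom and foo.blah.dom
     * share one directory. */
    for (p = out + strlen(base) + 1; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return 0;
}
```

Called with base "/www/hosts" and host "Foo.Blah.Dom", this fills the
buffer with "/www/hosts/foo.blah.dom"; adding a vhost then means creating
a directory rather than editing the config.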

Dean

On Wed, 30 Dec 1998, Martin Pool wrote:

> I was looking through src/main/http_vhost.c the other day and saw that
> IP-based virtual hosts are indexed by a hash, but name-based virtual
> hosts are not.  I should think most servers these days have
> considerably more name-based vhosts than IP addresses: would it be
> worth adding a hash on name to make check_hostalias() go faster?
> 
> -- 
> Martin Pool
>