
Posted to dev@httpd.apache.org by Marc Slemko <ma...@worldgate.com> on 1997/07/27 07:38:38 UTC

making logresolve faster

While the best answer to the question of how to make logresolve faster is
"cp /bin/cat logresolve", some people have an odd idea that they need
reverse lookups for the people in suits or because they like having fun
looking at hostnames.

With that idea in mind, I looked at logresolve.  It took 370 seconds to
handle a 10,000-line logfile on a P120 with 96 MB of RAM and a decent net
connection.  My named cache was, of course, cleared before each test.

I then fiddled with resolver options so it timed out after 1 second (this
resulted in missed lookups for only 75 hosts; I haven't tried increasing it
to 2 or 3 seconds to see the tradeoff), didn't retry, and used persistent TCP
connections instead of UDP.  That brought it down to 212 seconds.

I then decided that since it was obviously network limited, I should try
multiple lookups at once.  I wrote a driver program that forked a bunch of
logresolves and passed one line of the logfile to the first, the next to
the second, next to the third, etc.  Very simplistic model.  Note that it
doesn't always produce a logfile in exactly the right order.  It also
doesn't do some things it should, like keeping the history hash in the
parent process and having the child processes only do lookups.  Running 60
children at once, I got the time down to 20 seconds.  At this point I was
becoming CPU limited, so increasing the number of child processes wouldn't
help.  Going from 370 seconds to 20 seconds is a decent speedup.
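A minimal sketch of that round-robin driver, assuming Python threads stand in for the forked logresolve children and that `resolve` is any callable mapping an address string to a hostname.  `dispatch_round_robin` and `resolve_line` are hypothetical names, not logresolve code.  Each worker keeps its own private cache and records results as they complete, which is where the ordering problem comes from.

```python
import threading
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str], str]  # assumption: ip string -> hostname string


def resolve_line(line: str, resolve: Resolver, cache: Dict[str, str]) -> str:
    # The first space-separated field of a common-log-format line is the client address.
    ip, sep, rest = line.partition(" ")
    if ip not in cache:
        cache[ip] = resolve(ip)
    return cache[ip] + sep + rest


def dispatch_round_robin(lines: List[str], n_workers: int,
                         resolve: Resolver) -> List[Tuple[int, str]]:
    """Worker k handles lines k, k+n, k+2n, ...; results are kept in completion order."""
    done: List[Tuple[int, str]] = []
    lock = threading.Lock()

    def worker(k: int) -> None:
        cache: Dict[str, str] = {}  # private cache, like each separate child process
        for i in range(k, len(lines), n_workers):
            out = resolve_line(lines[i], resolve, cache)
            with lock:
                done.append((i, out))  # completion order, not input order

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Sorting the returned (line number, line) pairs restores the input order, but only once every worker has finished, so the output can no longer be streamed as it is produced.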

It would be interesting to compare a threaded approach, but I can't do
multiple simultaneous threaded lookups.  It could also be optimized a bit
more.  logresolve was running on the same machine as the name server.

Note that my current implementation is nasty on the cache; if I have 60
child processes, and have a bunch of entries for the same host one after
another, then they will be distributed to different child processes with
different caches so it will ask the nameserver again.  Fixing this could
help cut down the CPU usage.
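To make the cache cost concrete, a small simulation (the hypothetical `count_queries`, assuming line i goes to worker i mod n exactly as in the round-robin model) counts how many lookups reach the nameserver with per-worker caches versus one shared cache.  It ignores the nameserver's own cache, so it counts queries sent to named, not network round trips.

```python
from typing import List, Set


def count_queries(ips: List[str], n_workers: int, shared_cache: bool) -> int:
    """Count nameserver queries when line i is handled by worker i % n_workers."""
    caches: List[Set[str]] = [set()] if shared_cache else [set() for _ in range(n_workers)]
    queries = 0
    for i, ip in enumerate(ips):
        cache = caches[0] if shared_cache else caches[i % n_workers]
        if ip not in cache:  # cache miss: this worker has to ask the nameserver
            cache.add(ip)
            queries += 1
    return queries
```

With 60 workers and 60 consecutive lines from the same host, the private caches issue 60 queries where a single shared cache issues 1.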

Does anyone have any other suggestions for speeding things up?  I should give
it a try with logresolve v2, which implements things like an on-disk DBM
cache of results.  For that one I would really have to move the cache code
into the parent; that takes effort, but it could eliminate the out-of-order
behavior without being too expensive.
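A rough sketch of a parent-side on-disk cache, using Python's standard `dbm` module as a stand-in.  This is not logresolve v2's code or file format, neither of which appears in the thread; `DbmCache` is a hypothetical helper, and it never expires entries.

```python
import dbm
from typing import Callable


class DbmCache:
    """Persistent ip -> hostname cache owned by the parent process (hypothetical helper)."""

    def __init__(self, path: str, resolve: Callable[[str], str]) -> None:
        self.db = dbm.open(path, "c")  # "c": open read/write, creating the database if missing
        self.resolve = resolve

    def lookup(self, ip: str) -> str:
        key = ip.encode("utf-8")
        hit = self.db.get(key)  # dbm stores and returns bytes
        if hit is not None:
            return hit.decode("utf-8")
        name = self.resolve(ip)  # only misses reach the resolver
        self.db[key] = name.encode("utf-8")
        return name

    def close(self) -> None:
        self.db.close()
```

Because the parent owns the cache, it sees every address in input order, and a hit survives across runs as long as the database file is kept.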

Re: making logresolve faster

Posted by Dean Gaudet <dg...@arctic.org>.

On Sun, 27 Jul 1997, Marc Slemko wrote:

> > Look at, I think it's called, "libar" or something like that, it's with
> > contrib in bind.  It's an asynch resolver... no need for threads, I think
> > it's select event model. 
> 
> I didn't look hard, but it seems to me that it queues up requests and
> executes them sequentially, not simultaneously.
> 
> Same problem as using threaded DNS resolution; some systems have calls to
> do it, but they still serialize the lookups.

Bleh.  What a sad state of affairs.

I didn't look too hard at arlib either; I think it might do what we want,
but it's old and appears to play lots of ptr<->int games.

Dean

Re: making logresolve faster

Posted by Marc Slemko <ma...@worldgate.com>.

On Sat, 26 Jul 1997, Dean Gaudet wrote:

> 
> Ideally the hash wouldn't be kept on disk.  Or you should at least make
> some effort at timing out your cache entries.

I'm not a particularly big fan of the keep-on-disk club, largely because
my nameserver is up for weeks or months anyway, and it handles expiration
better than any cache code I would write.  I do see the attraction
of it though.

> 
> Look at, I think it's called, "libar" or something like that, it's with
> contrib in bind.  It's an asynch resolver... no need for threads, I think
> it's select event model. 

I didn't look hard, but it seems to me that it queues up requests and
executes them sequentially, not simultaneously.

Same problem as using threaded DNS resolution; some systems have calls to
do it, but they still serialize the lookups.

> 
> Dean
> 
> On Sat, 26 Jul 1997, Marc Slemko wrote:
> 
> > I then decided that since it was obviously network limited, I should try
> > multiple lookups at once.
>

Re: making logresolve faster

Posted by Dean Gaudet <dg...@arctic.org>.

Two passes: one pass to get the unique set of IP addresses to resolve, then
a pass to resolve them in parallel.  This deals with the ordering issues as
well.  Too bad it won't work as a pipe anymore though ...
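Dean's two-pass idea could be sketched as follows, assuming the whole log is already in memory as a list (which is why it can't act as a pipe) and that a `concurrent.futures` thread pool stands in for the parallel lookups.  `two_pass_resolve` is a hypothetical name.  Whether blocking lookups really overlap depends on the platform resolver, the same limitation Marc describes for threaded lookups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def two_pass_resolve(lines: List[str], resolve: Callable[[str], str],
                     workers: int = 60) -> List[str]:
    # Pass 1: the unique set of client addresses (dict.fromkeys keeps first-seen order).
    unique = list(dict.fromkeys(line.partition(" ")[0] for line in lines))
    # Resolve every unique address exactly once, with up to `workers` lookups in flight.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names: Dict[str, str] = dict(zip(unique, pool.map(resolve, unique)))
    # Pass 2: rewrite the log in its original order.
    out = []
    for line in lines:
        ip, sep, rest = line.partition(" ")
        out.append(names[ip] + sep + rest)
    return out
```

Ordering comes for free because the output pass walks the original lines, and each address is looked up once no matter how many lines it appears on.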

If you want it to work as a pipe, and retain ordering (imho ordering is a
nice thing) then you'll have to do some queueing in the parent.  Pass just
ip addresses to the children, let them pass back success/failure.  Poke
the success/failure into your queue (and into a hash) and flush as much as
you can.
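A sketch of that pipe-friendly variant, assuming threads from a `concurrent.futures` pool play the children and that `resolve` returns a hostname or `None` on failure (in which case this sketch keeps the numeric address); `resolve` is assumed not to raise.  `ordered_resolve` is a hypothetical name.  The futures play the role of the parent's hash, and the deque is the queue that gets flushed from its head.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Dict, Iterable, Iterator, Optional, Tuple


def ordered_resolve(lines: Iterable[str],
                    resolve: Callable[[str], Optional[str]],
                    workers: int = 60) -> Iterator[str]:
    """Yield resolved lines in input order while lookups run concurrently."""
    lookups: Dict[str, "Future[Optional[str]]"] = {}  # the parent's hash: ip -> lookup
    queue: Deque[Tuple[str, str, str]] = deque()      # lines waiting to be written

    def flush(wait: bool) -> Iterator[str]:
        # Write out as many lines from the head of the queue as are already resolved.
        while queue:
            ip, sep, rest = queue[0]
            fut = lookups[ip]
            if not wait and not fut.done():
                return
            queue.popleft()
            yield (fut.result() or ip) + sep + rest  # failure: keep the address

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in lines:
            ip, sep, rest = line.partition(" ")
            if ip not in lookups:  # only unseen addresses are handed to a worker
                lookups[ip] = pool.submit(resolve, ip)
            queue.append((ip, sep, rest))
            yield from flush(wait=False)
        yield from flush(wait=True)  # end of input: block for the stragglers
```

A run of hits from one host costs a single lookup, and a slow lookup only holds back the lines queued behind it.  The hash here grows without bound and never expires entries.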

Ideally the hash wouldn't be kept on disk.  Or you should at least make
some effort at timing out your cache entries.
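One simple way to time out cache entries is a fixed lifetime, sketched here with a hypothetical `TTLCache` class.  A real resolver honors each record's own TTL, which this sketch does not; the fixed lifetime is an assumption chosen for simplicity, and the clock is injectable so expiry can be tested.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """ip -> hostname cache whose entries expire a fixed number of seconds after insertion."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        name, expires_at = entry
        if self.clock() >= expires_at:  # stale: drop it so the caller looks it up again
            del self._entries[ip]
            return None
        return name

    def put(self, ip: str, name: str) -> None:
        self._entries[ip] = (name, self.clock() + self.ttl)
```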

Look at, I think it's called, "libar" or something like that, it's with
contrib in bind.  It's an asynch resolver... no need for threads, I think
it's select event model. 
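arlib's own interface isn't shown in this thread, so the following only illustrates the same select-event idea and is not arlib's API: every PTR query goes out at once on a single UDP socket, and one select loop matches replies to queries by their 16-bit DNS ID.  All function names are hypothetical.  There is no retry or TCP fallback, the sequential IDs are predictable (easy to spoof), and `server` must be given as a numeric address because replies are matched against it.

```python
import ipaddress
import selectors
import socket
import struct
import time
from typing import Dict, Iterable, Optional, Tuple


def encode_name(name: str) -> bytes:
    # DNS wire format: length-prefixed labels ending in a zero byte.
    return b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".") if p) + b"\x00"


def build_ptr_query(qid: int, ip: str) -> bytes:
    # Header: ID, flags (RD set), QDCOUNT=1; question: QTYPE=PTR (12), QCLASS=IN (1).
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = encode_name(ipaddress.ip_address(ip).reverse_pointer)
    return header + qname + struct.pack("!HH", 12, 1)


def read_name(msg: bytes, off: int) -> Tuple[str, int]:
    # Returns (name, offset just past the name), following compression pointers.
    labels, end, hops = [], None, 0
    while True:
        n = msg[off]
        if n & 0xC0 == 0xC0:
            if end is None:
                end = off + 2
            hops += 1
            if hops > 32:
                raise ValueError("DNS compression loop")
            off = ((n & 0x3F) << 8) | msg[off + 1]
        elif n == 0:
            return ".".join(labels), (off + 1 if end is None else end)
        else:
            labels.append(msg[off + 1:off + 1 + n].decode("ascii"))
            off += 1 + n


def parse_ptr_response(msg: bytes) -> Tuple[int, Optional[str]]:
    qid, _flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!HHHHHH", msg)
    off = 12
    for _ in range(qdcount):
        _, off = read_name(msg, off)
        off += 4  # QTYPE + QCLASS
    for _ in range(ancount):
        _, off = read_name(msg, off)
        rtype, _rclass, _ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        off += 10
        if rtype == 12:  # first PTR record wins
            return qid, read_name(msg, off)[0]
        off += rdlen
    return qid, None  # NXDOMAIN, no PTR record, etc.


def resolve_many(ips: Iterable[str], server: Tuple[str, int] = ("127.0.0.1", 53),
                 timeout: float = 1.0) -> Dict[str, Optional[str]]:
    """Send every PTR query at once on one UDP socket, then select() for the replies."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    outstanding: Dict[int, str] = {}
    results: Dict[str, Optional[str]] = {}
    for qid, ip in enumerate(dict.fromkeys(ips), start=1):  # sketch: fewer than 65536 addresses
        outstanding[qid] = ip
        results[ip] = None
        sock.sendto(build_ptr_query(qid, ip), server)
    deadline = time.monotonic() + timeout
    try:
        while outstanding:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # anything still outstanding stays None
            for _key, _events in sel.select(remaining):
                msg, addr = sock.recvfrom(4096)
                if addr != server:
                    continue
                qid, name = parse_ptr_response(msg)
                ip = outstanding.pop(qid, None)
                if ip is not None:
                    results[ip] = name
    finally:
        sel.close()
        sock.close()
    return results
```

The whole batch waits at most `timeout` seconds in total, rather than once per address, which is the property that makes this approach attractive compared with a loop of blocking lookups.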

Dean

On Sat, 26 Jul 1997, Marc Slemko wrote:

> I then decided that since it was obviously network limited, I should try
> multiple lookups at once.