You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Brian Pane <bp...@pacbell.net> on 2002/05/06 03:39:50 UTC

Fixing BrowserMatch: multi-string search algorithms?

Given a set of n BrowserMatch directives (as in,
for example, the default config that we ship with
httpd-2.0), mod_setenvif will do O(n) regex comparisons
on every request.

I want to fix that.

The best solution I've thought of so far is to
fix the (very common) special case in which the
BrowserMatch pattern doesn't really require a
regex match.  If we detect during config processing
that agiven pattern only requires a simple substring
match, then we can use a Boyer-Moore search instead
of a regex search.

That would speed things up, but the implementation
would still take O(n) searches for n BrowserMatch
directives.  I'd really rather have a solution in
which the request processing cost doesn't grow
linearly with the number of config directives.

Does anyone know of an efficient algorithm for comparing
an input string against a set of multiple pattern strings?

Thanks,
--Brian



Re: Fixing BrowserMatch: multi-string search algorithms?

Posted by Greg Stein <gs...@lyra.org>.
On Mon, May 06, 2002 at 07:24:47PM -0500, Scott Lamb wrote:
> Scott Lamb wrote:
> > I suggest creating a new regular expression which is each of the ones in 
> > the list separated by '|'. It will use regular expression alternatives 
> > to match multiple patterns with a single state machine. And add almost 
> > no code.
> 
> Oops, I forgot to mention that I think this is possible only where you 
> are doing the same thing with each match. I.e., these three in the 
> standard config could be combined in this way:
> 
>      BrowserMatch "RealPlayer 4\.0" force-response-1.0
>      BrowserMatch "Java/1\.0" force-response-1.0
>      BrowserMatch "JDK/1\.0" force-response-1.0
> 
> but not with anything that does something other than just 
> force-response-1.0.

Well, you could use the combined regex as an initial test on whether
anything even needs to happen. Then you can iterate to determine the
specific action.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Fixing BrowserMatch: multi-string search algorithms?

Posted by Scott Lamb <sl...@slamb.org>.
Scott Lamb wrote:
> I suggest creating a new regular expression which is each of the ones in 
> the list separated by '|'. It will use regular expression alternatives 
> to match multiple patterns with a single state machine. And add almost 
> no code.

Oops, I forgot to mention that I think this is possible only where you 
are doing the same thing with each match. I.e., these three in the 
standard config could be combined in this way:

     BrowserMatch "RealPlayer 4\.0" force-response-1.0
     BrowserMatch "Java/1\.0" force-response-1.0
     BrowserMatch "JDK/1\.0" force-response-1.0

but not with anything that does something other than just 
force-response-1.0.

-- 
Scott Lamb



Re: Fixing BrowserMatch: multi-string search algorithms?

Posted by Scott Lamb <sl...@slamb.org>.
Brian Pane wrote:
> Given a set of n BrowserMatch directives (as in,
> for example, the default config that we ship with
> httpd-2.0), mod_setenvif will do O(n) regex comparisons
> on every request.
> 
> I want to fix that.

[...]

> Does anyone know of an efficient algorithm for comparing
> an input string against a set of multiple pattern strings?

I suggest creating a new regular expression which is each of the ones in 
the list separated by '|'. It will use regular expression alternatives 
to match multiple patterns with a single state machine. And add almost 
no code.

--
Scott Lamb


Re: Fixing BrowserMatch: multi-string search algorithms?

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 06:39:50PM -0700, Brian Pane wrote:
> Does anyone know of an efficient algorithm for comparing
> an input string against a set of multiple pattern strings?

Well, there seems to be a lot of work related to approximate
string matching.  I don't think that'd work.

But, the guy who came up with the BNDM algorithm we used in
mod_include has this paper:

Fast Multipattern Search Algorithms for Intrusion Detection
Josué Kuri and Gonzalo Navarro
http://www.dcc.uchile.cl/~gnavarro/ps/spire00.1.ps.gz

This paper also looks interesting (has some C fragments):

A Fast Multiple String-Pattern Matching Algorithm
Sun Kim and Yanggon Kim
http://bio.informatics.indiana.edu/sunkim/papers/multi/multi2.ps

I just gave those a quick look-see, but the Related Works sections
may point you in other directions.  -- justin