You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Brian Pane <bp...@pacbell.net> on 2002/05/06 03:39:50 UTC
Fixing BrowserMatch: multi-string search algorithms?
Given a set of n BrowserMatch directives (as in,
for example, the default config that we ship with
httpd-2.0), mod_setenvif will do O(n) regex comparisons
on every request.
I want to fix that.
The best solution I've thought of so far is to
fix the (very common) special case in which the
BrowserMatch pattern doesn't really require a
regex match. If we detect during config processing
that agiven pattern only requires a simple substring
match, then we can use a Boyer-Moore search instead
of a regex search.
That would speed things up, but the implementation
would still take O(n) searches for n BrowserMatch
directives. I'd really rather have a solution in
which the request processing cost doesn't grow
linearly with the number of config directives.
Does anyone know of an efficient algorithm for comparing
an input string against a set of multiple pattern strings?
Thanks,
--Brian
Re: Fixing BrowserMatch: multi-string search algorithms?
Posted by Greg Stein <gs...@lyra.org>.
On Mon, May 06, 2002 at 07:24:47PM -0500, Scott Lamb wrote:
> Scott Lamb wrote:
> > I suggest creating a new regular expression which is each of the ones in
> > the list separated by '|'. It will use regular expression alternatives
> > to match multiple patterns with a single state machine. And add almost
> > no code.
>
> Oops, I forgot to mention that I think this is possible only where you
> are doing the same thing with each match. I.e., these three in the
> standard config could be combined in this way:
>
> BrowserMatch "RealPlayer 4\.0" force-response-1.0
> BrowserMatch "Java/1\.0" force-response-1.0
> BrowserMatch "JDK/1\.0" force-response-1.0
>
> but not with anything that does something other than just
> force-response-1.0.
Well, you could use the combined regex as an initial test on whether
anything even needs to happen. Then you can iterate to determine the
specific action.
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
Re: Fixing BrowserMatch: multi-string search algorithms?
Posted by Scott Lamb <sl...@slamb.org>.
Scott Lamb wrote:
> I suggest creating a new regular expression which is each of the ones in
> the list separated by '|'. It will use regular expression alternatives
> to match multiple patterns with a single state machine. And add almost
> no code.
Oops, I forgot to mention that I think this is possible only where you
are doing the same thing with each match. I.e., these three in the
standard config could be combined in this way:
BrowserMatch "RealPlayer 4\.0" force-response-1.0
BrowserMatch "Java/1\.0" force-response-1.0
BrowserMatch "JDK/1\.0" force-response-1.0
but not with anything that does something other than just
force-response-1.0.
--
Scott Lamb
Re: Fixing BrowserMatch: multi-string search algorithms?
Posted by Scott Lamb <sl...@slamb.org>.
Brian Pane wrote:
> Given a set of n BrowserMatch directives (as in,
> for example, the default config that we ship with
> httpd-2.0), mod_setenvif will do O(n) regex comparisons
> on every request.
>
> I want to fix that.
[...]
> Does anyone know of an efficient algorithm for comparing
> an input string against a set of multiple pattern strings?
I suggest creating a new regular expression which is each of the ones in
the list separated by '|'. It will use regular expression alternatives
to match multiple patterns with a single state machine. And add almost
no code.
--
Scott Lamb
Re: Fixing BrowserMatch: multi-string search algorithms?
Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 06:39:50PM -0700, Brian Pane wrote:
> Does anyone know of an efficient algorithm for comparing
> an input string against a set of multiple pattern strings?
Well, there seems to be a lot of work related to approximate
string matching. I don't think that'd work.
But, the guy who came up with the BNDM algorithm we used in
mod_include has this paper:
Fast Multipattern Search Algorithms for Intrusion Detection
Josué Kuri and Gonzalo Navarro
http://www.dcc.uchile.cl/~gnavarro/ps/spire00.1.ps.gz
This paper also looks interesting (has some C fragments):
A Fast Multiple String-Pattern Matching Algorithm
Sun Kim and Yanggon Kim
http://bio.informatics.indiana.edu/sunkim/papers/multi/multi2.ps
I just gave those a quick look-see, but the Related Works sections
may point you in other directions. -- justin