You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Sander Striker <st...@apache.org> on 2001/09/05 13:15:14 UTC

RE: [PATCH] Potential replacement for find_start_sequence...

> From: Sascha Schumann [mailto:sascha@schumann.cx]
> On Wed, 5 Sep 2001, Sander Striker wrote:
> 
>>> I'm not totally sure I'm sold on this approach being better.  But,
>>> I'm not sure that it is any worse either.  Don't have time to
>>> benchmark this right now.  I'm going to throw it to the wolves and
>>> see what you think.
>>
>> Me neither.  Rabin-Karp introduces a lot of * and %.
>> I'll try Boyer-Moore with precalced tables for '<!--#' and '--->'.
> 
>     Well, there are more advanced algorithms than BM available
>     today which are even easier to implement that the original BM
>     algo.

Easier to implement?  BM is so simple, I can't imagine something
being simpler.
 
>     I'd suggest looking at BNDM which combines the advantages of
>     bit-parallelism (shift-and/-or algorithms) and suffix
>     automata (BM-style).
> 
>     http://citeseer.nj.nec.com/navarro01fast.html
> 
>     To give you an idea on how a bndm implementation looks like,
>     I'm appending an unpolished implementation I did some time
>     ago which includes a test-case for locating '<!--#'.

Indeed the LOC is low.  Very cool!  Considering that we can
precal a bndm_t for '<!--#' and '--->'.
It is certainly more advanced.

>     - Sascha                                     Experience IRCG
>       http://schumann.cx/                http://schumann.cx/ircg

Ian, since you were doing the mod_include.c optimizations anyway,
can you check this out?

Btw, do we have any string matching code in apr or apr-util?  I
can see the applicability of decent string matchers in other parts
than mod_include.

Sander


RE: [PATCH] Potential replacement for find_start_sequence...

Posted by Sascha Schumann <sa...@schumann.cx>.
> Btw, do we have any string matching code in apr or apr-util?  I
> can see the applicability of decent string matchers in other parts
> than mod_include.

    Note that BNDM inherits the limit of shift-and/or algos --
    they only work well for patterns which are not longer than
    the number of bits representable by a machine word
    (sizeof(int)<<3).  In an application like Apache, long search
    patterns are uncommon, but a generic string-matching facility
    would usually need to handle these.

    - Sascha                                     Experience IRCG
      http://schumann.cx/                http://schumann.cx/ircg


RE: [PATCH] Potential replacement for find_start_sequence...

Posted by Sascha Schumann <sa...@schumann.cx>.
> Btw, do we have any string matching code in apr or apr-util?  I
> can see the applicability of decent string matchers in other parts
> than mod_include.

    Note that BNDM inherits the limit of shift-and/or algos --
    they only work well for patterns which are not longer than
    the number of bits representable by a machine word
    (sizeof(int)<<3).  In an application like Apache, long search
    patterns are uncommon, but a generic string-matching facility
    would usually need to handle these.

    - Sascha                                     Experience IRCG
      http://schumann.cx/                http://schumann.cx/ircg