You are viewing a plain text version of this content. The canonical link for it is here.
Posted to general@jakarta.apache.org by Talin <ta...@brodia.com> on 2000/10/11 19:47:15 UTC

Offtopic: Need algorithm for culling RE's

This is only slightly offtopic, as it relates to the RegExp package which is
technically part of the Jakarta project.

What I want to do is the reverse of the normal usage of regular expressions:
Instead of using a single RE to search thousands of static strings, I want
to use a single static string to search thousands of REs. I don't want to
have to execute each RE individually, what I want to do is quickly cull out
the RE's that can't possibly match, and then only execute some smaller set
of REs.

Now, this could be done by combining all of the RE expressions into one
giant RE and then compile it - but I also want to be able to dynamically add
and delete RE's from the set of RE's being searched.

What I need is some sort of discrimination net for REs. Has anyone seen an
algorithm or implementation like this? I would prefer to use the existing RE
package if possible, but I would happily re-implement the entire regular
expression matching code if that's what's needed to get this to work
efficiently.

(Think of this as a kind of string-based expert system, where "rules" are
defined by REs that have an associated action. You want to be able to add
and delete rules dynamically. More specifically, what I'm trying to build is
a web template system that allows "fuzzy binding" of templates - so for
example I can have a template named
"/web/projects/${projectname}-home/index.page" and it will match any string
of the form "/web/projects/xxx-home/index.page" and the variable
"$projectname" will be bound to "xxx".)