Posted to dev@httpd.apache.org by Jim Riggs <ap...@riggs.me> on 2013/04/25 16:35:26 UTC

New RewriteMap Help/Suggestions

I am in the process of preparing a patch to add a new RewriteMap type and could use some input from all of you on the best implementation. What I am creating is basically a clone of the txt map type, except that each line is a regexp followed by a replacement (with potential back-references).

"You can just do those as RewriteRules...no need for a map," you say. True, except that I am looking at an automated, application-created list, and I don't really want the application to be writing out configuration files or .htaccess files. I would be much more comfortable with the app writing out a map file that has limited functionality and scope. Plus, it would be easy for the app to just create 'regexp replacement' lines at build time.

So, I have created a crude, working proof-of-concept of this. It basically copies all of the functionality of the txt maps, including the cache, but in the lookup_map_regexpfile() function, it compiles the regexp for each line, attempts a match, and returns the backref-substituted replacement. (This pair gets cached.) This works beautifully as is, but it is horribly inefficient to have to compile the REs every time we come in with a new key/URL. So, I am thinking of precompiling all of them, and I see three options:

1. Precompile and store all of the REs at config load time.
2. Compile and store all of the REs the first time we hit lookup_map_regexpfile() or when the map file is updated.
3. Compile and store each RE as we read through the map file in lookup_map_regexpfile() until a match is found and bail (full list will be built over time).

#1 is nice, because all of the work is done up front and will be fast from then on. The problem, though, is that I would like this map to reload/refresh if the map file gets changed like the other types do. #2 and #3 solve this. With #2 I worry about performance of compiling everything if the map file gets updated and we get a thundering herd. With #3 there is some coordination to manage with respect to which lines have been compiled and which ones haven't.
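
For illustration, the lookup described above (a list of 'regexp replacement' lines, first match wins, back-references substituted into the replacement) could look roughly like the following once every RE is compiled a single time, as in options #1 and #2. This is a standalone sketch under stated assumptions, not the proof-of-concept patch: it uses POSIX regcomp()/regexec() so that it builds outside of httpd, it assumes $N as the back-reference syntax, and the names re_map_entry, re_map_compile_line(), re_map_lookup() and re_map_free() are made up for the example.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 10   /* $0 (whole match) through $9 */

/* One precompiled map line. Hypothetical type, not part of mod_rewrite. */
typedef struct {
    regex_t re;
    char *replacement;
} re_map_entry;

/* Split "regexp replacement" at the first run of blanks and compile the
 * regexp. Returns 0 on success, -1 on a malformed line or a bad pattern. */
int re_map_compile_line(const char *line, re_map_entry *out)
{
    size_t plen = strcspn(line, " \t");
    const char *repl = line + plen + strspn(line + plen, " \t");
    size_t rlen = strcspn(repl, "\r\n");   /* drop a trailing newline */
    char *pattern;
    int rc;

    if (plen == 0 || rlen == 0)
        return -1;
    pattern = malloc(plen + 1);
    out->replacement = malloc(rlen + 1);
    if (pattern == NULL || out->replacement == NULL) {
        free(pattern);
        free(out->replacement);
        return -1;
    }
    memcpy(pattern, line, plen);
    pattern[plen] = '\0';
    memcpy(out->replacement, repl, rlen);
    out->replacement[rlen] = '\0';

    rc = regcomp(&out->re, pattern, REG_EXTENDED);
    free(pattern);
    if (rc != 0) {
        free(out->replacement);
        return -1;
    }
    return 0;
}

/* Copy tmpl to out, expanding $0..$9 from the match offsets in m.
 * Returns 0 on success, -1 if out is too small. */
static int expand_backrefs(const char *tmpl, const char *key,
                           const regmatch_t *m, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            const regmatch_t *g = &m[p[1] - '0'];
            p++;                            /* consume the digit */
            if (g->rm_so < 0)               /* group did not take part */
                continue;
            size_t len = (size_t)(g->rm_eo - g->rm_so);
            if (o + len >= outsz)
                return -1;
            memcpy(out + o, key + g->rm_so, len);
            o += len;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *p;
        }
    }
    out[o] = '\0';
    return 0;
}

/* Walk the precompiled entries in file order; the first match wins.
 * Returns 1 on a match, 0 on no match, -1 if out is too small. */
int re_map_lookup(const re_map_entry *map, size_t n, const char *key,
                  char *out, size_t outsz)
{
    regmatch_t m[MAX_GROUPS];

    for (size_t i = 0; i < n; i++) {
        if (regexec(&map[i].re, key, MAX_GROUPS, m, 0) == 0)
            return expand_backrefs(map[i].replacement, key, m,
                                   out, outsz) == 0 ? 1 : -1;
    }
    return 0;
}

/* Release one entry, e.g. before recompiling after a map file update. */
void re_map_free(re_map_entry *e)
{
    regfree(&e->re);
    free(e->replacement);
}
```

With a map line such as '^/old/([0-9]+)$ /new/item/$1', looking up the key /old/42 yields /new/item/42. The compile step is the expensive part, which is why the question of when it runs matters; the lookup itself only calls regexec() on already-compiled entries.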

Does anyone have thoughts as to:

1. When/how should the map REs be compiled/precompiled? One of the options above or something else?
2. Where should the compiled REs be stored: in an existing pool or a new one?

Thanks for any input.

- Jim


Re: New RewriteMap Help/Suggestions

Posted by Yehuda Katz <ye...@ymkatz.net>.
On Thu, Apr 25, 2013 at 10:35 AM, Jim Riggs <ap...@riggs.me> wrote:

> So, I have created a crude, working proof-of-concept of this. It basically
> copies all of the functionality of the txt maps, including the cache, but
> in the lookup_map_regexpfile() function, it compiles the regexp for each
> line, attempts a match, and returns the backref-substituted replacement.
> (This pair gets cached.) This works beautifully as is, but it is horribly
> inefficient to have to compile the REs every time we come in with a new
> key/URL. So, I am thinking of precompiling all of them, and I see three
> options:
>
> 1. Precompile and store all of the REs at config load time.
>
1a. Precompile and store all of the REs at config load time or when the map
file is updated.

> 2. Compile and store all of the REs the first time we hit
> lookup_map_regexpfile() or when the map file is updated.
> 3. Compile and store each RE as we read through the map file in
> lookup_map_regexpfile() until a match is found and bail (full list will be
> built over time).
>
> #1 is nice, because all of the work is done up front and will be fast from
> then on. The problem, though, is that I would like this map to
> reload/refresh if the map file gets changed like the other types do. #2 and
> #3 solve this. With #2 I worry about performance of compiling everything if
> the map file gets updated and we get a thundering herd. With #3 there is
> some coordination to manage with respect to which lines have been compiled
> and which ones haven't.
>
I think #3 is not a great idea for the same reason you mentioned.

I have actually seen the problem that you mention in #2 in a live
environment with a (poorly-designed) custom module. Each request tries to
clear the cached results and build them again, very quickly overloading the
server.
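
One common way to avoid that kind of pile-up is to let exactly one caller rebuild while everyone else keeps using the old compiled table. The following is a minimal sketch of such a guard using C11 atomics. It is an assumption for illustration only: the names rebuild_try_begin() and rebuild_end() are hypothetical, it does not use APR, and it only coordinates threads inside a single process (separate child processes would each hold their own flag).

```c
#include <stdatomic.h>

/* Set while some thread is recompiling the map. */
static atomic_flag rebuild_in_progress = ATOMIC_FLAG_INIT;

/* Returns 1 if the caller has won the right to rebuild, 0 if another
 * thread is already rebuilding and the caller should keep using the
 * old table instead of starting a second rebuild. */
int rebuild_try_begin(void)
{
    /* test_and_set returns the previous value: false means we set it. */
    return !atomic_flag_test_and_set(&rebuild_in_progress);
}

/* Called by the winning thread once the new table is in place. */
void rebuild_end(void)
{
    atomic_flag_clear(&rebuild_in_progress);
}
```

A request that notices a stale map would call rebuild_try_begin(); on 0 it serves from the existing table, on 1 it recompiles, swaps the table in, and calls rebuild_end().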

You could potentially use something like ap_hook_monitor to watch the file
for changes, paired with 1a (not sure how much load that might add). My
usual Apache module reference (Nick Kew's Apache Modules Book, which I
keep on my office bookshelf) mentions it briefly (pages 67, 268, 337).
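
The change check that such a periodic callback would run can be sketched on its own. The version below is only an illustration under stated assumptions: it uses plain POSIX stat() rather than APR, compares modification time and size, and the names map_stamp and map_file_changed() are hypothetical. How it would be registered with ap_hook_monitor is deliberately not shown.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* What we remembered about the map file at the last (re)compile. */
typedef struct {
    int    valid;   /* 0 until the first successful stat() */
    time_t mtime;
    off_t  size;
} map_stamp;

/* Returns 1 if the file differs from the remembered stamp (and updates
 * the stamp), 0 if it is unchanged, -1 if the file cannot be stat()ed. */
int map_file_changed(const char *path, map_stamp *seen)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (seen->valid && seen->mtime == st.st_mtime
        && seen->size == st.st_size)
        return 0;
    seen->valid = 1;
    seen->mtime = st.st_mtime;
    seen->size = st.st_size;
    return 1;
}
```

Only a return value of 1 would trigger the recompile from 1a, so the cost on the periodic path is a single stat() call per map file.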

- Y