Posted to users@httpd.apache.org by "Zembower, Kevin" <kz...@jhuccp.org> on 2006/10/05 21:19:44 UTC

[users@httpd] Help with rewrite for errors?

I have a number of documents stored as HTML files at URLs like these:

www.popline.org/docs/0784/045796.html
www.popline.org/docs/0429/209471.html
www.popline.org/docs/0003/690206.html

For most of these records, the link is now broken (as it is in these
three examples), because Google still lists the old files.

However, in these three cases, the original document can be found by
removing the 4-digit directory and the '.html' extension, like so:

www.popline.org/docs/045796
www.popline.org/docs/209471
www.popline.org/docs/690206

Because of the nature of our system, these resolve correctly.
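
The mapping itself can be written as a single pattern. A minimal sketch,
assuming server or virtual-host context (so the pattern starts with a
slash, like the rules listed further down); it captures the six-digit
document number and drops the directory and the extension. It is not yet
conditional on a 404:

```apache
# Sketch only: unconditional mapping, for illustration.
#   /docs/0784/045796.html  ->  /docs/045796
# With no flags, later rules see the rewritten URL.
RewriteRule ^/docs/[0-9]{4}/([0-9]{6})\.html$ /docs/$1
```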

Can anyone help me with a set of RewriteRules that will, whenever a 404
error is generated, transform the URL as indicated and resubmit it?

Here are the current Rewrite rules in my system:
   RewriteEngine on
   RewriteLog /var/www/popline/logs/rewrite.log
   #Turn off rewritelog with level 0. 2 is useful/normal.
   RewriteLogLevel 0
   RewriteRule ^/docs$  /docs/index.html
   RewriteRule ^/docs/$ /docs/index.html
   # If this matches, don't do any rewriting
   RewriteRule ^/docs/index.*                   -       [L]
   # If this matches, don't do any rewriting, so error pages come up correctly
   RewriteRule ^/error/.*                       -       [L]
   # If this matches, don't do any rewriting, so error pages come up correctly
   RewriteRule ^/404.shtml                      -       [L]
   # If this matches, don't do any rewriting. For Google sitemap program
   RewriteRule ^/docs/sitemap.*                 -       [L]
   # If this matches, don't do any rewriting
   RewriteRule ^/docs/[0-9]{4}/[0-9]{6}\.html   -       [L]
   # Note that in RewriteRule below, must use %3F for '?' after
   # 'icswppro.dll'. '?' has special meaning in Rewrite substitutions.
   RewriteRule ^/docs/([0-9]{6})$ http://db.jhuccp.org/ics-wpd/exec/icswppro.dll?BU=http://db.jhuccp.org/ics-wpd/exec/icswppro.dll&QF0=DocNo&QI0=$1&TN=Popline&AC=QBE_QUERY&MR=30\%DL=1&&RL=1&&RF=LongRecordDisplay&DF=LongRecordDisplay [P]
   # If this matches, don't do any rewriting
   RewriteRule ^/docs/[0-9]{4}.*                -       [L]
   RewriteRule ^/.*$ http://db.jhuccp.org/ics-wpd/popweb/basic.html [R,L]

Here's an example from the current rewrite log of a 404 generation:

10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e1570/initial] (2) init rewrite engine with requested uri /docs/0784/045796.html
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e1570/initial] (1) pass through /docs/0784/045796.html
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e2d30/initial/redir#1] (2) init rewrite engine with requested uri /404.shtml
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e2d30/initial/redir#1] (1) pass through /404.shtml

Here's an earlier excerpt from the rewrite log, before I filtered out
the 'HTTP_NOT_FOUND' information:

10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e3760/initial] (2) init rewrite engine with requested uri /docs/0211/772369.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e3760/initial] (1) pass through /docs/0211/772369.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) init rewrite engine with requested uri /error/HTTP_NOT_FOUND.html.var
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) rewrite /error/HTTP_NOT_FOUND.html.var -> http://db.jhuccp.org/ics-wpd/popweb/basic.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) explicitly forcing redirect with http://db.jhuccp.org/ics-wpd/popweb/basic.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (1) escaping http://db.jhuccp.org/ics-wpd/popweb/basic.html for redirect
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (1) redirect to http://db.jhuccp.org/ics-wpd/popweb/basic.html [REDIRECT/302]

My question is not so much how to transform the submitted URL into the
one without the directory and '.html'. Instead, I don't understand how
to detect the 404 condition and then invoke the rewrite rule.
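
One way to approximate "detect the 404" may be to test for the missing
file before Apache generates the error, rather than reacting to the
error afterwards. A minimal, untested sketch, assuming the rules sit in
server or virtual-host context (so the on-disk path is built from
DOCUMENT_ROOT plus the request URI) and that it is placed ahead of the
existing pass-through rule for ^/docs/[0-9]{4}/[0-9]{6}\.html. The
R=301 flag is an illustrative choice, so that browsers and Google
re-request the short URL:

```apache
# Sketch only: if no file exists at the old-style path, redirect
# /docs/NNNN/NNNNNN.html to /docs/NNNNNN so the request is resubmitted.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule ^/docs/[0-9]{4}/([0-9]{6})\.html$ /docs/$1 [R=301,L]
```

Old-style URLs whose file still exists would not match the condition
and would fall through to the existing rules unchanged.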

Thanks in advance for all your help and suggestions.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland  21202
410-659-6139 
