Posted to user@nutch.apache.org by Tomi N/A <he...@gmail.com> on 2007/04/01 16:38:04 UTC

Re: Crawling + Indexing staging vs. production and URL conflict

2007/3/31, Sami Siren <ss...@gmail.com>:

> You could also let your reverse proxy do the rewriting using something
> like http://apache.webthing.com/mod_proxy_html/. I have been using
> something like that for rewriting massive amount of html in realtime for
> AA purposes to hammer web applications to different url space.

Does it put the server under noticeable additional load?

t.n.a.

Re: Crawling + Indexing staging vs. production and URL conflict

Posted by Sami Siren <ss...@gmail.com>.
Tomi N/A wrote:
> 2007/3/31, Sami Siren <ss...@gmail.com>:
> 
>> You could also let your reverse proxy do the rewriting using something
>> like http://apache.webthing.com/mod_proxy_html/. I have been using
>> something like that for rewriting massive amount of html in realtime for
>> AA purposes to hammer web applications to different url space.
> 
> Does it put the server under noticeable additional load?

We ran the reverse proxy (with AA) on separate machines, and the load on
those machines was minimal. Network latency added more overhead (in
terms of page download times) than rewriting a couple of absolute URLs
did. I should note, however, that we did not use that particular
rewriter but a very similar "home brew" solution.
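
To give a rough idea of what "rewriting a couple of absolute URLs" means,
here is a minimal Python sketch. It is not the home-brew rewriter itself
and not how mod_proxy_html works internally. The two host names and the
rewrite_links function are made up for illustration, and a regular
expression is a simplification that a real HTML-aware filter would
replace with proper parsing.

```python
import re

# Hypothetical hosts, assumed for illustration only (not from this thread).
STAGING = "http://staging.example.com"
PRODUCTION = "http://www.example.com"

# Match the staging prefix only where it starts an href/src/action value,
# and only when it is followed by a path, query, fragment or closing quote,
# so that look-alike hosts such as staging.example.com.other.org are skipped.
_ATTR = re.compile(
    r'(\b(?:href|src|action)\s*=\s*["\'])'
    + re.escape(STAGING)
    + r'(?=[/"\'?#])',
    re.IGNORECASE,
)


def rewrite_links(html: str) -> str:
    """Point absolute staging URLs in link attributes at production."""
    return _ATTR.sub(lambda m: m.group(1) + PRODUCTION, html)


page = '<a href="http://staging.example.com/docs/">docs</a>'
print(rewrite_links(page))
```

The work per page is a single pass over the markup, which is consistent
with the rewriting cost being small next to network latency.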

--
 Sami Siren