Posted to user@nutch.apache.org by Jon Shoberg <jo...@shoberg.net> on 2005/10/03 12:36:00 UTC

problem, Limiting dynamic pages with static URLs

Some sites use relative links and the fetcher is getting confused.  See 
the example below:

http://www.domain.xyz/index.php/research/academics/research/libraries/

The fetcher simply keeps following the few relative links in the returned
content, and the URI keeps growing.  It is basically the same problem as
session IDs, but not something that can be cleanly regexed out.
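
To make the symptom concrete, here is a minimal sketch of one possible
heuristic: flag a URL when any path segment occurs more than a set number
of times.  The function name and the max_repeats parameter are hypothetical
and not part of Nutch, and legitimate URLs can repeat a segment too, so the
threshold is an assumption that would need tuning.

```python
from collections import Counter
from urllib.parse import urlsplit


def has_repeating_segment(url: str, max_repeats: int = 1) -> bool:
    """Return True if any path segment occurs more than max_repeats times.

    Hypothetical helper for illustration only; not a Nutch API.
    """
    path = urlsplit(url).path
    # Drop the empty strings produced by leading, trailing, or doubled slashes.
    segments = [s for s in path.split("/") if s]
    if not segments:
        return False
    counts = Counter(segments)
    return max(counts.values()) > max_repeats


looping = "http://www.domain.xyz/index.php/research/academics/research/libraries/"
# "research" occurs twice in the path, so the default threshold flags it.
print(has_repeating_segment(looping))
```

Raising max_repeats makes the check more tolerant of sites that reuse a
directory name legitimately, at the cost of fetching a few more levels of
a loop before it is cut off.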

Anyone see this before? Thoughts?

-j

Re: problem, Limiting dynamic pages with static URLs

Posted by Doug Cutting <cu...@nutch.org>.
Please see:

http://www.mail-archive.com/nutch-dev@incubator.apache.org/msg00634.html

Doug

Jon Shoberg wrote:
> Some sites use relative links and the fetcher is getting confused.  See 
> the example below:
> 
> http://www.domain.xyz/index.php/research/academics/research/libraries/
> 
> The fetcher simply keeps following the few relative links in the returned
> content, and the URI keeps growing.  It is basically the same problem as
> session IDs, but not something that can be cleanly regexed out.
> 
> Anyone see this before? Thoughts?
> 
> -j