Posted to user@nutch.apache.org by Jon Shoberg <jo...@shoberg.net> on 2005/10/03 12:36:00 UTC
problem, Limiting dynamic pages with static URLs
Some sites use relative links and the fetcher is getting confused. See
the example below:
http://www.domain.xyz/index.php/research/academics/research/libraries/
The fetcher simply keeps following the few relative links in the
returned content, and the URI keeps growing. It is basically the same
problem as session IDs, but not something to cleanly regex out.
Anyone see this before? Thoughts?
-j
Re: problem, Limiting dynamic pages with static URLs
Posted by Doug Cutting <cu...@nutch.org>.
Please see:
http://www.mail-archive.com/nutch-dev@incubator.apache.org/msg00634.html
Doug
Jon Shoberg wrote:
> Some sites use relative links and the fetcher is getting confused. See
> the example below:
>
> http://www.domain.xyz/index.php/research/academics/research/libraries/
>
> The fetcher simply keeps following the few relative links in the
> returned content, and the URI keeps growing. It is basically the same
> problem as session IDs, but not something to cleanly regex out.
>
> Anyone see this before? Thoughts?
>
> -j
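
For illustration, one way to catch the ever-growing URLs described in
the original question is to count repeated path segments and skip any
URL that goes over a threshold. The sketch below is a standalone Python
heuristic, not Nutch code and not necessarily what the linked message
proposes. The function name has_repeating_segment and the max_repeats
parameter are hypothetical, and the threshold is an assumption that
would need tuning per crawl.

```python
from collections import Counter
from urllib.parse import urlsplit


def has_repeating_segment(url, max_repeats=2):
    """Return True if any path segment occurs more than max_repeats times.

    Hypothetical helper: a crawler could call this before queueing a URL
    and drop the URL when it returns True.
    """
    # Split the path on "/" and ignore the empty pieces produced by
    # leading, trailing, or doubled slashes.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    counts = Counter(segments)
    return any(n > max_repeats for n in counts.values())


if __name__ == "__main__":
    looping = ("http://www.domain.xyz/index.php/research/academics/"
               "research/libraries/")
    # "research" appears twice, so a strict threshold of 1 flags it.
    print(has_repeating_segment(looping, max_repeats=1))
```

A strict threshold such as max_repeats=1 also rejects legitimate URLs
that happen to reuse a directory name, so a looser value trades a few
extra fetches for fewer false positives.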