Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2005/09/23 13:12:43 UTC

page crawl limit?

Hi,

My crawl is stuck on the same page (I'm crawling a Lotus Domino server), and 
I wondered if there's anything I can configure to prevent this from happening. 
So far it has fetched that page over 35,000 times. Here's just a short extract 
from the log:

050923 120412 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=7%2C4%252C5%25252C4%2525252C11
050923 120414 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=10%2C8%252C8%25252C14%2525252C6
050923 120415 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=13%2C6%252C12%25252C8%2525252C14
050923 120416 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=1%2C4%252C12%25252C4%2525252C13
050923 120417 fetching http://planet.abc.com/general/aptrix/apteba.nsf/Content/Commercial+Chat+News+Headline?OpenDocument&ExpandSection=2%2C9%252C10%25252C2
050923 120418 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=12%2C11%252C13%25252C14%2525252C8
050923 120419 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=3%2C6%252C4%25252C9%2525252C7
050923 120420 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=7%2C14%252C2%25252C9%2525252C11
050923 120421 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=4%2C4%252C1%25252C1%2525252C11
050923 120422 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=8%2C8%252C12%25252C12%2525252C11
050923 120423 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=3%2C9%252C13%25252C6%2525252C12



Re: page crawl limit?

Posted by EM <em...@cpuedge.com>.
Either disable fetching of dynamic pages (URLs that carry a query string), or 
exclude that page with a rule in your regex URL filter.
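
To illustrate the second option, here is a rough Python sketch of how an 
ordered list of "+pattern" / "-pattern" rules could screen out the looping 
URLs. It is not Nutch code: the RULES list and the accept() helper are 
hypothetical names, the first-match-wins behaviour is my assumption about how 
such a filter is usually evaluated, and the second test URL (without 
ExpandSection) is made up for contrast.

```python
import re

# Hypothetical rule list in "+pattern / -pattern" style.
# Assumption: rules are tried in order and the first matching rule decides.
RULES = [
    ("-", re.compile(r"[?&]ExpandSection=")),  # drop Domino section-expansion links
    ("+", re.compile(r".")),                   # accept everything else
]


def accept(url: str) -> bool:
    """Return True if the first rule whose pattern occurs in url is a '+' rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: reject


urls = [
    # Taken from the log extract in the original post.
    "http://planet.abc.com/general/aptrix/aptrix.nsf/Content/"
    "Adpt+-+Online+counselling?OpenDocument"
    "&ExpandSection=7%2C4%252C5%25252C4%2525252C11",
    # Made-up variant of the same page without the ExpandSection parameter.
    "http://planet.abc.com/general/aptrix/aptrix.nsf/Content/"
    "Adpt+-+Online+counselling?OpenDocument",
]

for u in urls:
    print(accept(u), u)
```

With these rules the first URL is rejected and the second is accepted, so the 
page itself would still be fetched once while its endless ExpandSection 
variants are skipped. A broader rule that rejects any URL containing "?" 
would correspond to the first option, at the cost of dropping every 
?OpenDocument link on the Domino server as well.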
