Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2005/09/23 13:12:43 UTC
page crawl limit?
Hi,
My crawl is stuck on the same page (I'm crawling a Lotus Domino Server), and
I wondered if there's anything I can configure to prevent this happening? So
far it's crawled it over 35000 times. Here's just a short extract from the
log:
050923 120412 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=7%2C4%252C5%25252C4%2525252C11
050923 120414 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=10%2C8%252C8%25252C14%2525252C6
050923 120415 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=13%2C6%252C12%25252C8%2525252C14
050923 120416 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=1%2C4%252C12%25252C4%2525252C13
050923 120417 fetching http://planet.abc.com/general/aptrix/apteba.nsf/Content/Commercial+Chat+News+Headline?OpenDocument&ExpandSection=2%2C9%252C10%25252C2
050923 120418 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=12%2C11%252C13%25252C14%2525252C8
050923 120419 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=3%2C6%252C4%25252C9%2525252C7
050923 120420 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=7%2C14%252C2%25252C9%2525252C11
050923 120421 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=4%2C4%252C1%25252C1%2525252C11
050923 120422 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=8%2C8%252C12%25252C12%2525252C11
050923 120423 fetching http://planet.abc.com/general/aptrix/aptrix.nsf/Content/Adpt+-+Online+counselling?OpenDocument&ExpandSection=3%2C9%252C13%25252C6%2525252C12
Re: page crawl limit?
Posted by EM <em...@cpuedge.com>.
Disable dynamic pages or exclude that page in your regex filter.
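
To make the second suggestion concrete, here is a rough Python sketch of how an exclusion rule for the looping URLs might behave. It is not Nutch code: the rule list, the accept() helper, and the first-match-wins / reject-by-default behaviour are assumptions made for illustration, loosely modelled on the "+"/"-" prefixed lines of a regex URL filter file. The pattern targets the ExpandSection parameter seen in the log above, since every new link differs only in that value.

```python
import re

# Hypothetical rule list (an assumption, not Nutch's real implementation):
# '-' rejects a URL, '+' accepts it, and the first matching rule decides.
RULES = [
    ("-", re.compile(r"[?&]ExpandSection=")),  # drop the ever-changing Domino section links
    ("+", re.compile(r".")),                   # accept everything else
]

def accept(url, rules=RULES):
    """Return True if the URL should be fetched, False if it is filtered out."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # No rule matched: assume the URL is rejected.
    return False

# One URL taken from the log, and the same page without the ExpandSection parameter.
looping = ("http://planet.abc.com/general/aptrix/aptrix.nsf/Content/"
           "Adpt+-+Online+counselling?OpenDocument"
           "&ExpandSection=7%2C4%252C5%25252C4%2525252C11")
plain = ("http://planet.abc.com/general/aptrix/aptrix.nsf/Content/"
         "Adpt+-+Online+counselling?OpenDocument")

print(accept(looping))  # False
print(accept(plain))    # True
```

If something like this works for your site, the equivalent line in the regex filter file would presumably be the same pattern prefixed with "-", placed above the rule that accepts your host. Check the comments in your own filter file for the exact syntax before relying on it.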