Posted to user@nutch.apache.org by mina <ta...@gmail.com> on 2011/10/24 14:09:57 UTC

recrawl sites in nutch 1.3

Hi all. I have a script that re-crawls a site, but it fetches the URLs
only 3 times and then does not pick up their updates. I want it to fetch
and crawl this site every day. Which property should I set in
nutch-site.xml? Please help.

--
View this message in context: http://lucene.472066.n3.nabble.com/recrawl-sites-in-nutch-1-3-tp3447896p3447896.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: recrawl sites in nutch 1.3

Posted by Markus Jelsma <ma...@openindex.io>.
Yes, or set a lower default fetch interval.


<property>
  <name>db.fetch.interval.default</name>
  <value>2592000</value>
  <description>The default number of seconds between re-fetches of a page
  (30 days).
  </description>
</property>
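
As a rough sketch of that suggestion, an override like the one below could go inside the <configuration> element of conf/nutch-site.xml. The value 86400 (one day in seconds) is an assumption chosen to match the daily recrawl asked about in the original post; it is not part of the reply above.

```xml
<!-- Sketch only: overrides the 30-day default shown above. -->
<!-- 86400 = 24 * 60 * 60 seconds, i.e. one day (assumed target interval). -->
<property>
  <name>db.fetch.interval.default</name>
  <value>86400</value>
  <description>The default number of seconds between re-fetches of a page
  (1 day).
  </description>
</property>
```

The recrawl script still has to run at least once a day for a one-day interval to take effect, and URLs already stored in the crawldb may keep the interval they were recorded with.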


On Monday 24 October 2011 16:13:27 lewis john mcgibbney wrote:
> Please have a look at the archives and google for 'adaptive fetch
> schedule', there is plenty of info on this topic.
> 
> Thanks

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: recrawl sites in nutch 1.3

Posted by lewis john mcgibbney <le...@gmail.com>.
Please have a look at the archives and google for 'adaptive fetch schedule',
there is plenty of info on this topic.

Thanks
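
For readers searching on 'adaptive fetch schedule', a minimal sketch of what enabling it in conf/nutch-site.xml might look like follows. The class and property names are the ones listed in nutch-default.xml as far as I know; the interval values are assumptions for a site that should be revisited roughly daily, so check them against your own nutch-default.xml before relying on them.

```xml
<!-- Sketch only: switch from the default schedule to the adaptive one. -->
<property>
  <name>db.fetch.schedule.class</name>
  <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
</property>

<!-- Assumed bounds: never re-fetch more often than once a day (86400 s)
     and never wait longer than a week (604800 s). -->
<property>
  <name>db.fetch.schedule.adaptive.min_interval</name>
  <value>86400</value>
</property>
<property>
  <name>db.fetch.schedule.adaptive.max_interval</name>
  <value>604800</value>
</property>
```

With the adaptive schedule, the interval for a page is shortened when the page is found to have changed and lengthened when it has not, within the bounds set here.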


-- 
*Lewis*