Posted to user@nutch.apache.org by Konstantin Ott <ot...@netropol.de> on 2005/03/16 12:55:11 UTC
How do I recrawl
Hi,
Having crawled through a list of URLs from the urls file once, I would like to
crawl again. I don't want to inject again, and I hope Nutch can recognise
already indexed content by its digest, so that Nutch only indexes
modified and new files.
I notice that with every crawl there are more segment-dirs.
So what is the recommended way to recrawl the same url-file?
thanks
Konstantin Ott
Re: How do I recrawl
Posted by Stefan Groschupf <sg...@media-style.com>.
Try the whole-web crawling section of the tutorial:
http://incubator.apache.org/nutch/tutorial.html#Whole-web+Crawling
On 16.03.2005 at 12:55, Konstantin Ott wrote:
> Hi,
> Having crawled through a list of URLs from the urls file once, I would
> like to crawl again. I don't want to inject again, and I hope Nutch can
> recognise already indexed content by its digest, so that Nutch only
> indexes modified and new files.
> I notice that with every crawl there are more segment-dirs.
> So what is the recommended way to recrawl the same url-file?
>
> thanks
> Konstantin Ott
-----------information technology-------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net