Posted to user@nutch.apache.org by "D.Saravanaraj" <sa...@gmail.com> on 2006/03/06 19:58:54 UTC

help needed - adaptive refetch

Hi,

After applying the adaptive refetch patch to Nutch mapred, I ran the crawl
command once, since I had to initialize the crawldb.
For the next run, I commented out the following lines in
org.apache.nutch.crawl.Crawl.java:

if (fs.exists(dir)) {
    throw new RuntimeException(dir + " already exists.");
}

and

new Injector(job).inject(crawlDb, rootUrlDir);

But I find that the files are fetched even though they were not modified.
How can I keep the existing crawldb and reuse it for further crawls in the
mapred version?
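
For illustration only, here is a minimal self-contained Java sketch of the
"initialize on the first run, reuse afterwards" logic, as an alternative to
commenting lines out between runs. It does not use any Nutch classes:
CrawlDbSetup, needsInjection and prepare are hypothetical names, and a local
temporary directory stands in for the crawldb.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the setup step of a crawl; not Nutch code.
public class CrawlDbSetup {

    // True when no crawldb directory exists yet, i.e. seed URLs still need injecting.
    public static boolean needsInjection(Path crawlDb) {
        return !Files.isDirectory(crawlDb);
    }

    // Creates the crawldb directory on the first run only.
    // Returns true if this call initialized it, false if an existing one is reused.
    public static boolean prepare(Path crawlDb) {
        if (!needsInjection(crawlDb)) {
            return false; // reuse the existing crawldb and skip injection
        }
        try {
            Files.createDirectories(crawlDb);
            // In a real crawl, the seed URLs would be injected at this point.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: a unique temp directory plays the role of the crawldb.
        Path crawlDb = Paths.get(System.getProperty("java.io.tmpdir"),
                "crawldb-demo-" + System.nanoTime());
        System.out.println("first run initializes:  " + prepare(crawlDb));
        System.out.println("second run initializes: " + prepare(crawlDb));
    }
}
```

With a guard like this, the same code path can serve both the first crawl
and later ones, so nothing has to be edited between runs.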


thanks
D.Saravanaraj

Re: help needed - adaptive refetch

Posted by Andrzej Bialecki <ab...@getopt.org>.

D.Saravanaraj wrote:
> Hi,
>
> After applying the adaptive refetch patch to Nutch mapred, I ran the crawl
> command once, since I had to initialize the crawldb.
> For the next run, I commented out the following lines in
> org.apache.nutch.crawl.Crawl.java:
>
> if (fs.exists(dir)) {
>     throw new RuntimeException(dir + " already exists.");
> }
>
> and
>
> new Injector(job).inject(crawlDb, rootUrlDir);
>
> But I find that the files are fetched even though they were not modified.
> How can I keep the existing crawldb and reuse it for further crawls in the
> mapred version?
>

Are you using the default settings? Are you sure the files are really
fetched in full, or are just their headers fetched? I would need more
information...
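
One way to tell the two cases apart outside of Nutch is to send a
conditional GET by hand and look at the status code. The sketch below is
not Nutch code; it assumes the server honours the If-Modified-Since header,
and ConditionalGetCheck, isNotModified and conditionalGet are hypothetical
names. A 304 response carries headers only, while a 200 response means the
full body was sent again.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for checking how a server answers a conditional GET.
public class ConditionalGetCheck {

    // A 304 status means the server replied with headers only (no body).
    public static boolean isNotModified(int status) {
        return status == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    // Sends a GET with an If-Modified-Since header and returns the HTTP status.
    public static int conditionalGet(String url, long lastFetchMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setIfModifiedSince(lastFetchMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ConditionalGetCheck <http-url>");
            return;
        }
        // Assumption: pretend the page was last fetched one day ago.
        long oneDayAgo = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
        int status = conditionalGet(args[0], oneDayAgo);
        System.out.println(status + (isNotModified(status)
                ? " (not modified, headers only)"
                : " (server sent a regular response)"));
    }
}
```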

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com