Posted to user@nutch.apache.org by "D.Saravanaraj" <sa...@gmail.com> on 2006/03/06 19:58:54 UTC
help needed - adaptive refetch
Hi,
After applying the adaptive refetch patch to Nutch mapred, I ran the crawl
command the first time, because I had to initialize the crawldb.
For the next run, I commented out the following lines in
org.apache.nutch.crawl.Crawl.java:
if (fs.exists(dir)) {
  throw new RuntimeException(dir + " already exists.");
}
and
new Injector(job).inject(crawlDb, rootUrlDir);
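Instead of commenting both lines out by hand, one could make injection conditional on whether the crawldb already exists. This is a hypothetical sketch only: it uses java.nio.file in place of Nutch's own FileSystem, the class and method names (CrawlDbGuard, shouldInject) are illustrative and not part of Nutch, and it does not explain why unmodified pages are being refetched.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch: decides whether seed URLs
// should be injected, based only on whether the crawldb directory exists.
public class CrawlDbGuard {

    /** Returns true on the first run (no crawldb yet), false when an existing crawldb can be reused. */
    public static boolean shouldInject(Path crawlDb) {
        return !Files.exists(crawlDb);
    }

    public static void main(String[] args) {
        // Assumption: the crawldb lives at <crawlDir>/crawldb, mirroring the usual layout.
        Path crawlDb = Paths.get(args.length > 0 ? args[0] : "crawl", "crawldb");
        if (shouldInject(crawlDb)) {
            System.out.println("No crawldb at " + crawlDb + ": inject seed URLs first.");
        } else {
            System.out.println("Reusing existing crawldb at " + crawlDb + ": skip injection.");
        }
    }
}
```

In Crawl.java the same idea would wrap the Injector call in a check on crawlDb, rather than deleting it, so the first run still initializes the database.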
But I find that the files are fetched even though they were not modified.
How can I reuse the same crawldb for further crawls in the mapred
version?
thanks
D.Saravanaraj
Re: help needed - adaptive refetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Are you using the default settings? Are you sure the files are really
fetched in full, and not just their headers? I would need more
information...
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com