Posted to user@nutch.apache.org by Nicolás Lichtmaier <ni...@reloco.com.ar> on 2007/02/02 19:03:25 UTC

Re: How to limit nutch to fetch, refetch and index just the injected URLs?

>> I'd like to limit nutch to fetch, refetch and index just the injected 
>> URLs. Will setting db.max.outlinks.per.page to 0 enable me to do 
>> that? If not... how could I achieve what I'm looking for?
> You need to run "updatedb" with the "-noAdditions" switch.

That doesn't work. And in the code, in org.apache.nutch.crawl.CrawlDb's 
main method there's absolutely no handling of any parameter.
How could I achieve this?

Re: How to limit nutch to fetch, refetch and index just the injected URLs?

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.

>> I've "backported" revision 450799 to the 0.8.x branch for supporting
>> "-noAdditions". Perhaps you could consider committing it there... (I
>> haven't tested it yet though).
>>     
> Can you please create a JIRA issue for this and attach the patch there.
>   

Done. It's NUTCH-438 (https://issues.apache.org/jira/browse/NUTCH-438).

Re: How to limit nutch to fetch, refetch and index just the injected URLs?

Posted by Sami Siren <ss...@gmail.com>.

Nicolás Lichtmaier wrote:

> I've "backported" revision 450799 to the 0.8.x branch for supporting
> "-noAdditions". Perhaps you could consider committing it there... (I
> haven't tested it yet though).

Can you please create a JIRA issue for this and attach the patch there.

--
 Sami Siren

Re: How to limit nutch to fetch, refetch and index just the injected URLs?

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.

> Perhaps you should start by reporting which version you are using 
> ... The version in trunk/ certainly supports this argument. The 
> version in 0.8.1 does not support it, but it's easy to add.

I've "backported" revision 450799 to the 0.8.x branch for supporting 
"-noAdditions". Perhaps you could consider committing it there... (I 
haven't tested it yet though).

Re: How to limit nutch to fetch, refetch and index just the injected URLs?

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.

>>> You need to run "updatedb" with the "-noAdditions" switch.
>> That doesn't work. And in the code, in 
>> org.apache.nutch.crawl.CrawlDb's main method there's absolutely no 
>> handling of any parameter.
>> How could I achieve this?
> Perhaps you should start by reporting which version you are using 
> ... The version in trunk/ certainly supports this argument. The 
> version in 0.8.1 does not support it, but it's easy to add.

Uhm... well... I was using the latest released version. Should I use 
trunk? Is it ok for production use?

Re: How to limit nutch to fetch, refetch and index just the injected URLs?

Posted by Andrzej Bialecki <ab...@getopt.org>.

Nicolás Lichtmaier wrote:
>
>>> I'd like to limit nutch to fetch, refetch and index just the 
>>> injected URLs. Will setting db.max.outlinks.per.page to 0 enable me 
>>> to do that? If not... how could I achieve what I'm looking for?
>> You need to run "updatedb" with the "-noAdditions" switch.
>
> That doesn't work. And in the code, in 
> org.apache.nutch.crawl.CrawlDb's main method there's absolutely no 
> handling of any parameter.
> How could I achieve this?

Perhaps you should start by reporting which version you are using ... 
The version in trunk/ certainly supports this argument. The version in 
0.8.1 does not support it, but it's easy to add.
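
For reference, a minimal sketch of what such an invocation might look like, assuming a build where updatedb accepts the switch (trunk, per the above). The crawldb and segment paths are hypothetical placeholders, and the helper name build_updatedb_cmd is made up for illustration. The script only prints the command line rather than running it, so it does not need a Nutch installation:

```shell
#!/bin/sh
# Hypothetical paths; adjust them to your own crawl directory layout.
CRAWLDB="crawl/crawldb"
SEGMENT="crawl/segments/SEGMENT_NAME"

# Build the updatedb command line (illustrative helper, not part of Nutch).
# -noAdditions asks updatedb to update only URLs already present in the
# crawldb, so links discovered during the fetch are not added.
build_updatedb_cmd() {
    printf '%s\n' "bin/nutch updatedb $1 $2 -noAdditions"
}

# Print the command instead of executing it.
build_updatedb_cmd "$CRAWLDB" "$SEGMENT"
```

With a supporting version installed, the printed line would be run from the Nutch directory after each fetch, in place of the plain updatedb step.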

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com