Posted to user@nutch.apache.org by Nicolás Lichtmaier <ni...@reloco.com.ar> on 2007/02/02 19:03:25 UTC
Re: How to limit nutch to fetch, refetch and index just the injected
URLs?
>> I'd like to limit nutch to fetch, refetch and index just the injected
>> URLs. Will setting db.max.outlinks.per.page to 0 enable me to do
>> that? If not... how could I achieve what I'm looking for?
> You need to run "updatedb" with "-noAdditions" switch.
That doesn't work. And in the code, in org.apache.nutch.crawl.CrawlDb's
main method there's absolutely no handling of any parameter.
How could I achieve this?
Re: How to limit nutch to fetch, refetch and index just the injected
URLs?
Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
>> I've "backported" revision 450799 to the 0.8.x branch for supporting
>> "-noAdditions". Perhaps you could consider committing it there... (I
>> haven't tested it yet, though).
>>
> Can you please create a JIRA issue for this and attach the patch there.
>
Done. It's NUTCH-438 (https://issues.apache.org/jira/browse/NUTCH-438).
Re: How to limit nutch to fetch, refetch and index just the injected
URLs?
Posted by Sami Siren <ss...@gmail.com>.
Nicolás Lichtmaier wrote:
> I've "backported" revision 450799 to the 0.8.x branch for supporting
> "-noAdditions". Perhaps you could consider committing it there... (I
> haven't tested it yet, though).
Can you please create a JIRA issue for this and attach the patch there.
--
Sami Siren
Re: How to limit nutch to fetch, refetch and index just the injected
URLs?
Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
> Perhaps you should start by reporting which version you are using
> ... The version in trunk/ certainly supports this argument. The
> version in 0.8.1 does not support it, but it's easy to add.
I've "backported" revision 450799 to the 0.8.x branch for supporting
"-noAdditions". Perhaps you could consider committing it there... (I
haven't tested it yet, though).
Re: How to limit nutch to fetch, refetch and index just the injected
URLs?
Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
>>> You need to run "updatedb" with "-noAdditions" switch.
>> That doesn't work. And in the code, in
>> org.apache.nutch.crawl.CrawlDb's main method there's absolutely no
>> handling of any parameter.
>> How could I achieve this?
> Perhaps you should start by reporting which version you are using
> ... The version in trunk/ certainly supports this argument. The
> version in 0.8.1 does not support it, but it's easy to add.
Uhm... well... I was using the latest released version. Should I use
trunk? Is it ok for production use?
Re: How to limit nutch to fetch, refetch and index just the injected
URLs?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Nicolás Lichtmaier wrote:
>
>>> I'd like to limit nutch to fetch, refetch and index just the
>>> injected URLs. Will setting db.max.outlinks.per.page to 0 enable me
>>> to do that? If not... how could I achieve what I'm looking for?
>> You need to run "updatedb" with "-noAdditions" switch.
>
> That doesn't work. And in the code, in
> org.apache.nutch.crawl.CrawlDb's main method there's absolutely no
> handling of any parameter.
> How could I achieve this?
Perhaps you should start by reporting which version you are using ...
The version in trunk/ certainly supports this argument. The version in
0.8.1 does not support it, but it's easy to add.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
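
For readers following the thread, the effect of the "-noAdditions" switch on "updatedb" can be pictured with a small toy model. This is only a sketch of the idea, not Nutch's actual implementation: the function name update_db, the status strings, and the example URLs are all hypothetical. It assumes the switch means "refresh the status of URLs already in the database, but do not add newly discovered outlinks", which is the behaviour the original question asks for.

```python
from typing import Dict, Iterable


def update_db(crawldb: Dict[str, str],
              fetched: Dict[str, str],
              outlinks: Iterable[str],
              no_additions: bool = False) -> Dict[str, str]:
    """Return a new URL -> status map after one update round (toy model)."""
    updated = dict(crawldb)
    # Fetch results refresh the status of URLs that are already known.
    for url, status in fetched.items():
        if url in updated:
            updated[url] = status
    # Newly discovered outlinks are added only when additions are allowed.
    if not no_additions:
        for url in outlinks:
            updated.setdefault(url, "unfetched")
    return updated


# Hypothetical data: one injected URL and two outlinks found on its page.
injected = {"http://example.com/": "unfetched"}
fetched = {"http://example.com/": "fetched"}
outlinks = ["http://example.com/about", "http://other.example/"]

print(update_db(injected, fetched, outlinks, no_additions=True))
print(update_db(injected, fetched, outlinks, no_additions=False))
```

With no_additions=True the map keeps only the injected URL, so later fetch rounds stay limited to that set; with the default, the two outlinks join the database as unfetched entries.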