Posted to user@nutch.apache.org by Gabriele Kahlout <ga...@mysimpatico.com> on 2011/06/19 06:27:11 UTC
Re: Can I custom crawl using Nutch?
Gabriele Kahlout wrote:
>
> On Wed, May 4, 2011 at 6:22 PM, Kelvin <ksxh@yahoo.com.sg> wrote:
>
>> Hi Gabriele,
>>
>> Thank you for your help. I am sorry, I am a newbie to Nutch. If I crawl
>> the whole of Wikipedia, will the whole of Wikipedia be stored in the
>> crawldb of my server?
>>
>
> I think so (I'm also a newbie).
>
Wikipedia's content will get stored in the segments. Once the segments have
been indexed (and all the db updates have been done) you should delete them.
Only information relating to the fetch/parse status of each link gets saved
to crawldb. The link structure is maintained in linkdb.
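
As a rough illustration of the cleanup step, here is a minimal Python sketch that deletes the segments while leaving crawldb and linkdb alone. It assumes a crawl directory with crawldb, linkdb and segments subdirectories sitting side by side; that layout, and the delete_segments helper itself, are assumptions for illustration and not part of Nutch. Only run something like this after indexing and the db updates have finished.

```python
import shutil
from pathlib import Path


def delete_segments(crawl_dir):
    """Remove every segment under <crawl_dir>/segments.

    crawldb and linkdb are never touched. The directory layout
    (crawldb, linkdb and segments as siblings) is an assumption.
    Returns the sorted names of the segment directories removed.
    """
    segments_dir = Path(crawl_dir) / "segments"
    removed = []
    if not segments_dir.is_dir():
        # Nothing to clean up if there is no segments directory.
        return removed
    for segment in sorted(segments_dir.iterdir()):
        if segment.is_dir():
            shutil.rmtree(segment)  # fetched and parsed content lives here
            removed.append(segment.name)
    return removed
```

Calling delete_segments("crawl") would then empty crawl/segments and return the names of the segments it removed, while the fetch/parse status in crawl/crawldb and the link structure in crawl/linkdb stay in place.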
--
View this message in context: http://lucene.472066.n3.nabble.com/Can-I-custom-crawl-using-Nutch-tp2899270p3081808.html