Posted to user@nutch.apache.org by Gabriele Kahlout <ga...@mysimpatico.com> on 2011/06/19 06:27:11 UTC

Re: Can I custom crawl using Nutch?

Gabriele Kahlout wrote:
> 
> On Wed, May 4, 2011 at 6:22 PM, Kelvin <ksxh@yahoo.com.sg> wrote:
> 
>> Hi Gabriele,
>>
>> Thank you for your help. I am sorry, I am a newbie to Nutch. If I crawl
>> the whole of Wikipedia, will the whole of Wikipedia be stored in the
>> crawldb of my server?
>>
> 
> I think so (I'm also a newbie).
> 
The Wikipedia content itself will be stored in the segments. Once you have
indexed them (and run all the db update steps) you should delete them.
Only information relating to the fetch/parse status of each link gets saved
to the crawldb. The link structure is maintained in the linkdb.
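
If you want to see for yourself where the disk space goes, here is a rough
sketch in Python that adds up the size of each of the three stores. It
assumes the conventional layout of a crawl directory with segments, crawldb
and linkdb subdirectories; the default path "crawl" and the function names
are my own and are not part of Nutch.

```python
import os
import sys


def dir_size(path):
    """Total size in bytes of all regular files under path (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def crawl_usage(crawl_dir):
    """Map each store name to its on-disk size in bytes.

    Assumes crawl_dir contains segments/, crawldb/ and linkdb/
    (an assumed layout; adjust the names to match your setup).
    """
    return {name: dir_size(os.path.join(crawl_dir, name))
            for name in ("segments", "crawldb", "linkdb")}


if __name__ == "__main__":
    # Hypothetical default location; pass your own crawl directory instead.
    crawl_dir = sys.argv[1] if len(sys.argv) > 1 else "crawl"
    for name, size in crawl_usage(crawl_dir).items():
        print(f"{name:10s} {size / 1e6:10.1f} MB")
```

On a large crawl I would expect segments to dominate, since that is where
the fetched pages live, but check your own numbers before deleting anything.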


--
View this message in context: http://lucene.472066.n3.nabble.com/Can-I-custom-crawl-using-Nutch-tp2899270p3081808.html