Posted to user@nutch.apache.org by "Vanderdray, Jake" <JV...@aarp.org> on 2005/09/07 20:44:15 UTC

Recrawling

	I want to apologize in advance for this very basic question, but
my searches aren't turning up the answer so far.  I've successfully run
a crawl and I can search the results.  I'd like to update my index by
re-crawling my site, but when I try to use the same command I used the
first time I get an error saying that the index already exists.

	What is the correct method for re-crawling a site?  I'd be happy
to add the answer back into the nutch site or wiki if I can.

Thanks,
Jake.

Re: Recrawling

Posted by Jack Tang <hi...@gmail.com>.
Hi Jake

It's a basic question, but a pretty hard issue.
Right now, we re-crawl a website by running the "crawl" command again and
putting the new index into a temp dir. I think the core issue is how to swap
the index on the fly. Some index files may still be referenced by NutchBean.
Should we shut it down first?

Will MapReduce solve the problem? I mean, can we switch the data node dynamically?

Regards
/Jack
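
The "crawl into a temp dir, then swap" idea above can be sketched at the filesystem level. This is only a minimal sketch, not something Nutch provides: the class name IndexSwap, the method swap, and the three directory arguments are hypothetical, and it assumes all three paths are on the same filesystem so that a rename is possible. It does not answer the open question of whether a running NutchBean has to be shut down or restarted to pick up the new index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Nutch: swaps a freshly built crawl
// directory into place by renaming directories.
public class IndexSwap {

    /**
     * Moves liveDir to backupDir (if liveDir exists), then moves freshDir
     * to liveDir. Assumes all paths are on the same filesystem; otherwise
     * ATOMIC_MOVE throws AtomicMoveNotSupportedException.
     */
    public static void swap(Path liveDir, Path freshDir, Path backupDir) throws IOException {
        if (!Files.isDirectory(freshDir)) {
            throw new IOException("fresh crawl directory not found: " + freshDir);
        }
        if (Files.exists(backupDir)) {
            throw new IOException("backup directory already exists: " + backupDir);
        }
        // Step 1: move the current directory out of the way, keeping it as a backup.
        if (Files.exists(liveDir)) {
            Files.move(liveDir, backupDir, StandardCopyOption.ATOMIC_MOVE);
        }
        // Step 2: move the fresh directory into place. Between the two steps
        // there is a short window in which liveDir does not exist.
        Files.move(freshDir, liveDir, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: IndexSwap <liveDir> <freshDir> <backupDir>");
            return;
        }
        swap(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

Each rename is atomic on its own, but the pair is not, so a searcher that opens the index between the two steps would find nothing there. Stopping the search application around the swap is the conservative choice.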



-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars

Re: Recrawling

Posted by gekkokid <me...@gekkokid.org.uk>.
Would this delete documents?

Lucene.IndexReader.delete(new Term("path",SomeParticularObject.getPath()))

If this is the command to delete documents, then you could create a crawler
application that crawls your site and returns a list of documents/URLs, and
then run the above command in a loop to delete those documents. Just an idea;
I'm not that knowledgeable yet on Nutch/Lucene. Hope it helps.


----- Original Message ----- 
From: "Sébastien LE CALLONNEC" <sl...@yahoo.ie>
To: <nu...@lucene.apache.org>
Sent: Wednesday, September 07, 2005 10:13 PM
Subject: RE: Recrawling


> Hi Jake,
>
>
> I presume you're using the "crawl" command: it means you have to delete
> the already existing index to crawl again...
>
> Regards,
> Sebastien



RE: Recrawling

Posted by Sébastien LE CALLONNEC <sl...@yahoo.ie>.
Hi Jake, 


I presume you're using the "crawl" command: it means you have to delete
the already existing index to crawl again...

Regards,
Sebastien
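
As a minimal sketch of the "delete, then crawl again" step described above: the helper below removes a directory tree so that the same "crawl" command can be run again. The class name CrawlDirCleaner and the directory name passed on the command line are hypothetical; the thread does not say where the crawl output lives, so the path is an assumption you would replace with your own output directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch: deletes an old crawl output
// directory before the crawl is run again.
public class CrawlDirCleaner {

    /**
     * Deletes dir and everything below it.
     * Returns false if dir did not exist, true if it was deleted.
     */
    public static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories,
            // so every directory is empty by the time it is deleted.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CrawlDirCleaner <crawlOutputDir>");
            return;
        }
        // The argument is an assumed location of the previous crawl output.
        boolean deleted = deleteRecursively(Paths.get(args[0]));
        System.out.println(deleted ? "deleted " + args[0] : "nothing to delete");
    }
}
```

Deleting the old output means searches have nothing to run against until the new crawl finishes, so this suits a small site where a short outage is acceptable.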

