Posted to user@nutch.apache.org by Ajmal Rahman <aj...@tcs.com> on 2017/05/03 08:04:24 UTC

Nutch and SOLR - Updating DB and indexes

Hi Team,

I have Nutch and Solr set up and integrated with my website, and the search is working well.

Now I need to update the search index because the content of my website has changed. I have a few questions about this:

1. Do I need to delete the contents of the crawl folder (apache-nutch-1.10/crawl/), that is, segments, linkdb and crawldb? What would happen if I ran the crawl without deleting them? Is there any significance in deleting them?

2. I believe running only the crawl script (bin/crawl ..) would be enough. In that case, what is the significance of deleting the existing index from the Solr Admin page (by submitting <delete><query>*:*</query></delete> under Documents)? Is it actually required?

3. I was also told to do a Reload and Optimize from the Solr Admin page (Core Admin) before running the crawl job. Is that really required? What is the significance?
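
To make question 2 concrete, the delete step could also be issued over HTTP rather than through the Admin page. The following is only a rough sketch: the host, port, and core name "nutch" are placeholder assumptions, and the request is built but deliberately not sent, since sending it would remove every document from the index.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical values: adjust the host, port, and core name to your setup.
SOLR_BASE_URL = "http://localhost:8983/solr"
CORE_NAME = "nutch"

# The same delete-all query that would be pasted into the Admin page.
DELETE_ALL_XML = "<delete><query>*:*</query></delete>"


def build_delete_all_request(base_url=SOLR_BASE_URL, core=CORE_NAME):
    """Build (but do not send) a POST that deletes every document in the core."""
    url = f"{base_url}/{quote(core)}/update?commit=true"
    return urllib.request.Request(
        url,
        data=DELETE_ALL_XML.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_delete_all_request()
    # Sending is left commented out on purpose because it wipes the index.
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
    print(request.get_method(), request.full_url)
```

Whether this step is needed before a re-crawl is exactly what I am asking; the sketch only shows what the operation amounts to.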

Regards,

Mohammed Ajmal Rahman
Tata Consultancy Services
Mailto: ajmal.rahman@tcs.com
Website: http://www.tcs.com