Posted to user@nutch.apache.org by "Ramanathapuram, Rajesh" <Ra...@turner.com> on 2011/08/23 17:36:28 UTC

Nutch crawl updates ignore case URL

Summary: Nutch crawl updates to Solr ignore case differences in the URL used as the index key.

Let me explain: the site Nutch crawls runs on an Apache server, and its URLs are case-sensitive. I use the URL as the key when updating the Solr index, and the key is not updated when the same URL comes back with different casing. As a result, the links are broken when the URL is accessed from the web app.
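
To make the symptom concrete, here is a minimal Python sketch of what appears to be happening. It is only a toy model, not Nutch or Solr code: the class CaseFoldingIndex and the example URLs are made up for illustration, and the assumption that the key is matched case-insensitively somewhere in the update path is unconfirmed.

```python
# Hypothetical model of the symptom; this is NOT Nutch or Solr code.
class CaseFoldingIndex:
    """Toy index that matches URL keys case-insensitively (assumed behaviour)."""

    def __init__(self):
        # Maps the lower-cased URL to a (stored URL, content) pair.
        self._docs = {}

    def update(self, url, content):
        folded = url.lower()
        if folded in self._docs:
            # An existing entry matched: the content is refreshed,
            # but the originally stored URL casing is kept.
            stored_url, _ = self._docs[folded]
            self._docs[folded] = (stored_url, content)
        else:
            self._docs[folded] = (url, content)

    def stored_url(self, url):
        return self._docs[url.lower()][0]


index = CaseFoldingIndex()
index.update("http://example.com/News/Story.html", "first crawl")
index.update("http://example.com/news/story.html", "second crawl")

# The index still holds the old casing, which a case-sensitive
# server would no longer serve.
print(index.stored_url("http://example.com/news/story.html"))
```

In this model the second crawl never replaces the stored URL, so the web app keeps linking to the old casing.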

I know I can drop and recreate the index to work around this issue, but doing so would disrupt the website's web metrics data collection.

I am new to Nutch and not sure whether any configuration change would let me update the URL key in the Solr index.

I am open to any other suggestions.

Thanks
Rajesh Ramana