Posted to user@nutch.apache.org by Jesse Hires <jh...@gmail.com> on 2010/02/18 03:57:42 UTC

Help troubleshooting search problems

I just decided to start everything over with the latest version of Nutch
from the trunk. So far I am able to crawl and index OK, but I am having
trouble getting results back from a search.

I get the typical "0 results found" response that shows up when the
searchers/indexes cannot be found, but I don't know where to look for traces
of why this is happening. Here is what I have checked so far:

- The path I start the searchers with points to the local directory that
  contains the index and segments subdirs.
- The searchers on the datanodes appear to be running fine.
- All of the searchers are running on the correct port.
- The config files from the search config are copied to
  webapps/ROOT/WEB-INF/classes.
- search-servers.txt looks to be correct.
- None of the logs show any errors or exceptions.
- The Tomcat log just shows that I searched and got zero results.

Any ideas where to look to find what's wrong?
Is there a command I can issue if I connect to one of the searchers via
telnet?
Does searching produce a log that has info somewhere if it cannot connect to
any of the searcher nodes?
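
One way to rule out plain connectivity problems, sketched here in Python rather than with telnet: read the host/port pairs from search-servers.txt and attempt a TCP connection to each. This assumes the file holds one whitespace-separated "host port" pair per line; the skipping of "#" comment lines and the function names (parse_search_servers, is_listening, check_servers) are hypothetical and not part of Nutch. A successful connect only shows that something is listening on that port. It does not confirm that the searcher behind it is serving the expected index.

```python
import socket


def parse_search_servers(text):
    """Parse 'host port' lines into (host, port) tuples.

    Assumption: one whitespace-separated pair per line. Blank lines and
    lines starting with '#' are skipped (comment handling is a guess).
    """
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            raise ValueError(f"unexpected line in server list: {raw!r}")
        servers.append((parts[0], int(parts[1])))
    return servers


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_servers(text, timeout=3.0):
    """Map each (host, port) in the server list to its reachability."""
    return {
        (host, port): is_listening(host, port, timeout)
        for host, port in parse_search_servers(text)
    }


if __name__ == "__main__":
    # Path is relative to wherever the file lives on the front-end machine.
    with open("search-servers.txt") as handle:
        for (host, port), ok in check_servers(handle.read()).items():
            print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Running this from the machine that hosts the web application tests the same network path the front end would use, which can differ from what the datanodes see locally.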



Jesse

int GetRandomNumber()
{
   return 4; // Chosen by fair roll of dice
                // Guaranteed to be random
} // xkcd.com

How to add sitemap attribute to crawldb while fetching

Posted by Pravin Karne <pr...@persistent.co.in>.
Hi,
sitemap.xml contains per-URL information such as the update frequency (changefreq) and the last-modified date (lastmod).

While fetching the URLs, can we update the CrawlDatum with these values?

That way a long-running crawl would have updated information every time, with no need to re-crawl for updated links.

By default the re-fetch interval is 30 days (as I understand it).
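
To make the idea concrete, here is a minimal Python sketch of reading the lastmod and changefreq values out of a sitemap and turning them into a per-URL re-fetch interval. It is not Nutch code and does not touch the crawldb. The function name parse_sitemap, the CHANGEFREQ_SECONDS table, and the mapping of each changefreq value to a number of seconds are all assumptions made for illustration. The 30-day fallback mirrors the default mentioned above, which is itself only my understanding.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

DAY = 24 * 60 * 60

# Hypothetical mapping from changefreq to a re-fetch interval in seconds.
# "always" and "never" are deliberately left out and fall back to the default.
CHANGEFREQ_SECONDS = {
    "hourly": 60 * 60,
    "daily": DAY,
    "weekly": 7 * DAY,
    "monthly": 30 * DAY,
    "yearly": 365 * DAY,
}

# Fallback interval: 30 days, as assumed in the message above.
DEFAULT_INTERVAL = 30 * DAY


def parse_sitemap(xml_text):
    """Return {url: {"lastmod": str or None, "interval": seconds}}."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # skip entries without a location
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        changefreq = url.findtext("sm:changefreq", default="", namespaces=NS)
        entries[loc] = {
            "lastmod": lastmod.strip() if lastmod else None,
            "interval": CHANGEFREQ_SECONDS.get(
                changefreq.strip().lower(), DEFAULT_INTERVAL
            ),
        }
    return entries
```

A fetcher could then compare lastmod with the time of its previous fetch and skip URLs that have not changed, or store the interval alongside each URL instead of the global default.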


Waiting for your response.

Thanks
-Pravin


