Posted to user@nutch.apache.org by openxu <op...@gmail.com> on 2007/04/23 08:06:03 UTC

Why Nutch returns 0 results?

Hi all!
I am a newbie to Nutch.
After installing Tomcat and Nutch, I crawled several websites. Nutch did not
report any errors.
However, when I search for some words, Nutch returns 0 results. I tried several
words, and none of them returned any results.
Could you give me some hints?
Thanks in advance!
-- 
View this message in context: http://www.nabble.com/Why-Nutch-returns-0-results--tf3629370.html#a10134346
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Why Nutch returns 0 results?

Posted by openxu <op...@gmail.com>.


The Nutch version is nutch-0.9, Tomcat is apache-tomcat-5.5.23, and the Linux
distribution is Fedora Core 5.


Re: Why Nutch returns 0 results?

Posted by openxu <op...@gmail.com>.
Dennis Kubes wrote:
> The most common cause of this is that no agent name is set in the
> configuration, so no pages are fetched. Another possibility is that the
> searcher.dir configuration directive is not set correctly.
>
> Dennis Kubes

Thanks, Dennis Kubes.
During crawling, Nutch seems to crawl successfully, and a lot of data is added
to the crawl directory.
Here is my /webapps/root/web-inf/classes/nutch-site.xml, which sets the search
directory:
////////////// /webapps/root/web-inf/classes/nutch-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>searcher.dir</name>
<value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
</property>
</configuration>
//////////////////////////////////////////////////////////
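One thing that may be worth checking here is whether the directory named in
searcher.dir actually contains the crawl output. The sketch below is only an
illustration and not part of Nutch. It assumes the Nutch 0.9 layout, in which
the search webapp expects a segments subdirectory plus an index or indexes
subdirectory under searcher.dir. The function name missing_crawl_parts is made
up for this example.

```python
import os


def missing_crawl_parts(searcher_dir):
    """Return the pieces that appear to be missing under searcher_dir.

    Assumption (not taken from the thread): the search webapp needs a
    'segments' directory and either an 'index' or an 'indexes' directory.
    """
    # If the configured directory itself does not exist, report that first.
    if not os.path.isdir(searcher_dir):
        return [searcher_dir]

    missing = []
    if not os.path.isdir(os.path.join(searcher_dir, "segments")):
        missing.append("segments")

    # Either a merged 'index' or per-segment 'indexes' is accepted here.
    has_index = any(
        os.path.isdir(os.path.join(searcher_dir, name))
        for name in ("index", "indexes")
    )
    if not has_index:
        missing.append("index or indexes")
    return missing
```

Calling missing_crawl_parts("/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl")
on the server would list whatever is missing. An empty list only means the
layout looks plausible, not that the index contains any documents.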
Below is my /nutch-0.9/nutch-0.9/conf/nutch-site.xml
////////// /nutch-0.9/nutch-0.9/conf/nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>http.agent.name</name>
  <value>nutch</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.
  NOTE: You should also check other related properties:
	http.robots.agents
	http.agent.description
	http.agent.url
	http.agent.email
	http.agent.version
  and set their values appropriately.
  </description>
</property>
<property>
  <name>http.agent.description</name>
  <value>hello</value>
  <description>Further description of our bot- this text is used in
  the User-Agent header.  It appears in parenthesis after the agent name.
  </description>
</property>

<property>
  <name>http.agent.url</name>
  <value>http://hello.com</value>
  <description>A URL to advertise in the User-Agent header.  This will 
   appear in parenthesis after the agent name. Custom dictates that this
   should be a URL of a page explaining the purpose and behavior of this
   crawler.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value>science@gmail.com</value>
  <description>An email address to advertise in the HTTP 'From' request
   header and User-Agent header. A good practice is to mangle this
   address (e.g. 'info at example dot com') to avoid spamming.
  </description>
</property>
</configuration>
//////////////////////////////////////////////////////////
The search still returns 0 results.


Re: Why Nutch returns 0 results?

Posted by Dennis Kubes <nu...@dragonflymc.com>.
The most common cause of this is that no agent name is set in the
configuration, so no pages are fetched. Another possibility is that the
searcher.dir configuration directive is not set correctly.

Dennis Kubes
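
As a rough way to check both settings, the sketch below reads the property
elements from a nutch-site.xml and reports which of the two names mentioned
above are absent or empty. It is an illustration only: the helper names
read_properties and missing_settings are hypothetical, and the sample text
simply mirrors a file that sets searcher.dir alone. In practice the agent name
matters in the configuration used for crawling and searcher.dir in the
configuration used by the search webapp, so each file would be checked for its
own setting.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the config text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Treat a missing <value> element as an empty string.
            props[name.strip()] = (value or "").strip()
    return props


def missing_settings(props, required=("http.agent.name", "searcher.dir")):
    """Return the required names that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


# Sample input modelled on a webapp config that only sets searcher.dir.
SAMPLE = """<?xml version="1.0"?>
<configuration>
<property>
<name>searcher.dir</name>
<value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
</property>
</configuration>"""

print(missing_settings(read_properties(SAMPLE)))
```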


Re: Why Nutch returns 0 results?

Posted by karthik085 <ka...@gmail.com>.
Look at the stats of the crawl db. They will show you what was actually
fetched.

Sometimes in Nutch I type a word that is in a document and it does not
show up. Try searching by URL.
Say, for www.sitea.com, search for sitea and see if you get any results.
Or you can do a domain search, like 'site:sitea.com'.
I don't have a solution to this problem, but at least you will know which pages
were fetched.
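
For the stats, a minimal sketch of running the crawl db reader from a script is
shown below. It assumes the usual "bin/nutch readdb <crawldb> -stats" command
and that the crawl db sits in a crawldb subdirectory of the crawl directory.
The function names and the paths passed in are placeholders for illustration,
not something taken from this thread.

```python
import os
import subprocess


def readdb_stats_command(nutch_home, crawl_dir):
    """Build the argument list for 'bin/nutch readdb <crawldb> -stats'."""
    return [
        os.path.join(nutch_home, "bin", "nutch"),
        "readdb",
        os.path.join(crawl_dir, "crawldb"),
        "-stats",
    ]


def run_readdb_stats(nutch_home, crawl_dir):
    """Run the stats command and return its combined output.

    Returns None when the nutch script is not found at the assumed location.
    """
    cmd = readdb_stats_command(nutch_home, crawl_dir)
    if not os.path.isfile(cmd[0]):
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr
```

If the reported number of fetched pages is zero, the problem is in the crawl
rather than in the search setup.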


