Posted to user@nutch.apache.org by openxu <op...@gmail.com> on 2007/04/23 08:06:03 UTC
Why Nutch returns 0 results?
Hi all!
I am a newbie to Nutch.
After installing Tomcat and Nutch, I crawled several websites. Nutch did not
report any errors.
However, when I search for some words, Nutch returns 0 results. I tried several
words, and none of them returned any results.
Could you give me some hints?
Thanks in advance!
--
View this message in context: http://www.nabble.com/Why-Nutch-returns-0-results--tf3629370.html#a10134346
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Why Nutch returns 0 results?
Posted by openxu <op...@gmail.com>.
The Nutch version is nutch-0.9. Tomcat is apache-tomcat-5.5.23. Linux is
Fedora Core 5.
Re: Why Nutch returns 0 results?
Posted by openxu <op...@gmail.com>.
Dennis Kubes wrote:
> The most common reason for this is not setting an agent name in the
> configuration and therefore no results are fetched. Another possibility
> is not setting the searcher.dir configuration directive correctly.
> Dennis Kubes
----------------------------------------------------------
Thanks, Dennis Kubes.
During crawling, Nutch appears to crawl successfully, and a lot of data is
added to the crawl directory.
Here is my /webapps/root/web-inf/classes/nutch-site.xml, which sets the search
directory:
////////////// /webapps/root/web-inf/classes/nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
  </property>
</configuration>
//////////////////////////////////////////////////////////
Below is my /nutch-0.9/nutch-0.9/conf/nutch-site.xml
////////// /nutch-0.9/nutch-0.9/conf/nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch</value>
    <description>HTTP 'User-Agent' request header. MUST NOT be empty -
    please set this to a single word uniquely related to your organization.
    NOTE: You should also check other related properties:
      http.robots.agents
      http.agent.description
      http.agent.url
      http.agent.email
      http.agent.version
    and set their values appropriately.
    </description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>hello</value>
    <description>Further description of our bot- this text is used in
    the User-Agent header. It appears in parenthesis after the agent name.
    </description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>http://hello.com</value>
    <description>A URL to advertise in the User-Agent header. This will
    appear in parenthesis after the agent name. Custom dictates that this
    should be a URL of a page explaining the purpose and behavior of this
    crawler.
    </description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>science@gmail.com</value>
    <description>An email address to advertise in the HTTP 'From' request
    header and User-Agent header. A good practice is to mangle this
    address (e.g. 'info at example dot com') to avoid spamming.
    </description>
  </property>
</configuration>
//////////////////////////////////////////////////////////
Nutch still returns 0 results.
Re: Why Nutch returns 0 results?
Posted by Dennis Kubes <nu...@dragonflymc.com>.
The most common reason for this is not setting an agent name in the
configuration and therefore no results are fetched. Another possibility
is not setting the searcher.dir configuration directive correctly.
Dennis Kubes
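
To make the searcher.dir check concrete, here is a minimal sketch in Python. It
reads a nutch-site.xml file and reports obvious problems with the directory that
searcher.dir points to. The function names are hypothetical, and the expected
layout (an index/ or indexes/ subdirectory plus segments/) is an assumption
about what a Nutch 0.9 crawl directory contains, not something confirmed in
this thread.

```python
import os
import xml.etree.ElementTree as ET


def read_properties(site_xml_path):
    """Return {name: value} for every <property> in a Nutch-style config file."""
    root = ET.parse(site_xml_path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_searcher_dir(site_xml_path):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    searcher_dir = read_properties(site_xml_path).get("searcher.dir")
    if not searcher_dir:
        return ["searcher.dir is not set in " + site_xml_path]
    if not os.path.isdir(searcher_dir):
        return ["searcher.dir does not exist: " + searcher_dir]

    problems = []
    entries = set(os.listdir(searcher_dir))
    # Assumption: the search webapp needs a merged index/ or per-segment indexes/.
    if not ({"index", "indexes"} & entries):
        problems.append("no index/ or indexes/ under " + searcher_dir)
    # Assumption: segments/ holds the fetched content used for summaries.
    if "segments" not in entries:
        problems.append("no segments/ under " + searcher_dir)
    return problems
```

Run check_searcher_dir against the nutch-site.xml inside the Tomcat webapp,
since that is the copy the search page reads. The same read_properties helper
can be pointed at conf/nutch-site.xml to confirm that http.agent.name is not
empty.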
Re: Why Nutch returns 0 results?
Posted by karthik085 <ka...@gmail.com>.
Look at the stats of the crawl db. They will show you what was actually
fetched.
Sometimes in Nutch I type a word (or words) that is in a document and it does
not show up. Try searching by URL.
Say, for www.sitea.com, search for 'sitea' and see if you get any results.
Or you can do a domain search too, like 'site:sitea.com'.
I don't have a solution to this problem, but at least you will know what pages
were fetched.
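
For the crawl db stats, Nutch ships a readdb command. The sketch below, in
Python, only builds and runs `bin/nutch readdb <crawldb> -stats`. The function
names and the example paths are hypothetical, and it assumes the crawl
directory contains a crawldb subdirectory, as a crawl made with the one-step
crawl command normally does.

```python
import os
import subprocess


def readdb_stats_command(nutch_home, crawl_dir):
    """Build the argument list for `bin/nutch readdb <crawldb> -stats`."""
    return [
        os.path.join(nutch_home, "bin", "nutch"),
        "readdb",
        os.path.join(crawl_dir, "crawldb"),
        "-stats",
    ]


def print_crawldb_stats(nutch_home, crawl_dir):
    """Run readdb -stats; return its exit code, or None if bin/nutch is missing."""
    cmd = readdb_stats_command(nutch_home, crawl_dir)
    if not os.path.isfile(cmd[0]):
        return None
    # Output goes straight to the terminal; nothing is parsed here.
    return subprocess.call(cmd)
```

If the reported number of fetched URLs is zero or very small, the problem is in
the crawl itself (for example the URL filter or the crawl depth) rather than in
the search webapp.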