Posted to user@nutch.apache.org by inet-fan <mi...@mail2.co.il> on 2008/06/23 00:44:39 UTC

No search results - Nutch 0.9 on FreeBSD

Hello!

I don't get search results after crawling and need help, please.

I installed the Java Development Kit 1.5, tomcat-5.5, apache-ant-1.7 and
nutch-0.9 in a FreeBSD jail, in the directories
/usr/local/diablo-jdk1.5.0, /usr/local/tomcat5.5 and /usr/local/nutch.

I ran:
setenv JAVA_HOME /usr/local/diablo-jdk1.5.0

sh bin/nutch inject crawl/crawldb urls
sh bin/nutch generate crawl/crawldb crawl/segments
sh bin/nutch fetch crawl/segments/20080622124704
sh bin/nutch updatedb crawl/crawldb crawl/segments/20080622124704

sh bin/nutch invertlinks crawl/linkdb -dir crawl/segments/20080622124704
sh bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/20080622124704
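
The segment name in the commands above is a timestamp created by the generate step, so it changes on every run. A minimal sketch, assuming POSIX sh and that segment directories sort chronologically by name; latest_segment is a hypothetical helper, not part of Nutch:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the newest segment
# directory under the directory given as $1.
# Assumption: segment names are timestamps such as 20080622124704, so
# the sorted glob order is also chronological order.
latest_segment() {
    latest=""
    for d in "$1"/*; do
        # Keep only directories; the last one in sorted order wins.
        [ -d "$d" ] && latest="$d"
    done
    printf '%s\n' "$latest"
}

# Usage, run from /usr/local/nutch after the generate step:
#   SEGMENT=$(latest_segment crawl/segments)
#   sh bin/nutch fetch "$SEGMENT"
#   sh bin/nutch updatedb crawl/crawldb "$SEGMENT"
```

If crawl/segments is empty or missing, the helper prints an empty line, so it is worth checking that SEGMENT is non-empty before running fetch.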

Then I ran:
sh bin/nutch org.apache.nutch.searcher.NutchBean wiki
Total hits: 0

I start the Tomcat server from the rc.conf file of the jail.
The nutch.war file is deployed as /usr/local/tomcat5.5/webapps/ROOT.war

What should be the value for searcher.dir?
What else did I do wrong?

The config file
/jail/jvm/usr/local/tomcat5.5/webapps/search/WEB-INF/classes/nutch-site.xml
looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>http.agent.name</name>
  <value>example.com</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

  http.robots.agents
  http.agent.description
  http.agent.url
  http.agent.email
  http.agent.version

  and set their values appropriately.

  </description>
</property>

<property>
  <name>http.agent.description</name>
  <value>search agent</value>
  <description>Further description of our bot. This text is used in
  the User-Agent header.  It appears in parentheses after the agent name.
  </description>
</property>

<property>
  <name>http.agent.url</name>
  <value>http://search.example.com</value>
  <description>A URL to advertise in the User-Agent header.  This will 
   appear in parentheses after the agent name. Custom dictates that this
   should be a URL of a page explaining the purpose and behavior of this
   crawler.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value>info@example.com</value>
  <description>An email address to advertise in the HTTP 'From' request
   header and User-Agent header. A good practice is to mangle this
   address (e.g. 'info at example dot com') to avoid spamming.
  </description>
</property>

  <property>
    <name>searcher.dir</name>
    <value>/usr/local/nutch/crawl</value>
  </property>
</configuration>
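
On the searcher.dir question: as far as I know, the Nutch 0.9 searcher expects this value to be the directory that holds the crawl output, that is, segments and linkdb plus either a merged index or the indexes directory written by the index step. A rough sketch for checking that layout, assuming POSIX sh; check_crawl_dir is a hypothetical helper and the list of sub-directories is my assumption, not something taken from the Nutch documentation:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): report which sub-directories
# the searcher is assumed to need are missing under the directory $1.
check_crawl_dir() {
    dir=$1
    missing=""
    # Assumption: the searcher reads segments and linkdb from searcher.dir.
    for sub in linkdb segments; do
        [ -d "$dir/$sub" ] || missing="$missing $sub"
    done
    # Assumption: either a merged "index" or the "indexes" directory
    # produced by "bin/nutch index" is enough.
    if [ ! -d "$dir/index" ] && [ ! -d "$dir/indexes" ]; then
        missing="$missing index|indexes"
    fi
    if [ -n "$missing" ]; then
        echo "missing in $dir:$missing"
        return 1
    fi
    echo "ok: $dir"
}

# Usage, with the value from the config above:
#   check_crawl_dir /usr/local/nutch/crawl
```

The check should be run as the user that Tomcat runs as, inside the jail, because a directory that exists but is not readable by that user would pass here and may still give zero hits.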

thx!
-- 
View this message in context: http://www.nabble.com/No-search-results---Nutch-0.9-on-FreeBSD-tp18060064p18060064.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: No search results - Nutch 0.9 on FreeBSD

Posted by inet-fan <mi...@mail2.co.il>.

After I restarted Tomcat with /usr/local/tomcat5.5/bin/catalina.sh start, it works.

thx!



Re: No search results - Nutch 0.9 on FreeBSD

Posted by inet-fan <mi...@mail2.co.il>.
Hello!

After I created a new nutch-site.xml file, searching works with:
bin/nutch org.apache.nutch.searcher.NutchBean txt

but not on the website. What is wrong?

Hits 0-0 (out of about 0 total matching pages):

The searcher.dir property now looks like this:
<property>
  <name>searcher.dir</name>
  <value>/usr/local/nutch/crawl</value>
</property>
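
When the command line finds hits and the website does not, one thing that can be compared is the searcher.dir value in each copy of nutch-site.xml, since the command line and the web application may read different files. A small sketch, assuming POSIX sh and awk and that the name and value elements sit on separate lines as in the snippet above; searcher_dir_of is a hypothetical helper, not a Nutch command:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the <value> that follows
# <name>searcher.dir</name> in the config file given as $1.
# Assumption: <name> and <value> are on separate lines.
searcher_dir_of() {
    awk '
        /<name>searcher\.dir<\/name>/ { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")     # drop everything up to the opening tag
            sub(/<\/value>.*/, "")   # drop the closing tag and what follows
            print
            exit
        }
    ' "$1"
}

# Usage (the conf/ location is an assumption about the Nutch install; the
# web application path is the one quoted earlier in this thread):
#   searcher_dir_of /usr/local/nutch/conf/nutch-site.xml
#   searcher_dir_of /jail/jvm/usr/local/tomcat5.5/webapps/search/WEB-INF/classes/nutch-site.xml
```

If the two values match, the configuration is probably not the difference, and Tomcat may simply need a restart to pick up the file.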


Thanks!
miki