Posted to user@nutch.apache.org by Zaihan <za...@unrealasia.net> on 2009/07/13 14:57:59 UTC

Integrating Nutch frontend with Backend.

Hi All,

I've got the Nutch frontend and backend working.

But on the Nutch site I see something is missing: where do I go to
link up the frontend with the backend data (such as the crawl data)?

<excerpt from Nutch 0.8 tutorial>
Searching

Simplest way to verify the integrity of your crawl is to launch NutchBean
from command line:

bin/nutch org.apache.nutch.searcher.NutchBean apache

After you have verified that the above command returns results you can
proceed to setting up the web interface.

To search you need to put the nutch war file into your servlet container.
(If instead of downloading a Nutch release you checked the sources out of
SVN, then you'll first need to build the war file, with the command ant
war.)

Assuming you've unpacked Tomcat as ~/local/tomcat, then the Nutch war file
may be installed with the commands:

rm -rf ~/local/tomcat/webapps/ROOT*
cp nutch*.war ~/local/tomcat/webapps/ROOT.war

<question>
"The webapp finds its indexes in ./crawl, relative to where you
start Tomcat, so use a command like:"

Where do I put the "./crawl" data? In the root directory? In
~/local/tomcat/webapps/ROOT?
</question>

~/local/tomcat/bin/catalina.sh start

Then visit http://localhost:8080/ and have fun!


Re: Integrating Nutch frontend with Backend.

Posted by Alex McLintock <al...@gmail.com>.
Hello Zaihan,

So you have your servlet container running and serving the web
application, but it doesn't know where your crawled data is.

Find the nutch-site.xml file, at a path something like

/var/lib/tomcat6/webapps/ROOT/WEB-INF/classes/nutch-site.xml

And make sure it contains something like

<configuration>
  <property>
    <name>searcher.dir</name>
    <value>/local/data/nutch_runs/mytestrun/crawl/</value>
  </property>
</configuration>


Almost all the instructions I have seen assume that the servlet can
deduce this parameter because Tomcat is started from the test run/crawl
directory. That, of course, is difficult when your servlet container is
started by init scripts at boot time.
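
If it helps, here is a minimal sketch for sanity-checking that setting before restarting Tomcat. The function names searcher_dir and check_searcher_dir are made up for illustration, and the sed expression assumes the <name> and <value> elements sit on adjacent lines, as in the snippet above; it is not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Nutch or Tomcat.

# Print the <value> on the line right after <name>searcher.dir</name>.
# Assumes name and value are on separate, adjacent lines.
searcher_dir() {
    sed -n '/<name>searcher\.dir<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$1"
}

# Report whether the configured searcher.dir exists on disk.
# Returns 0 if it is set and is a directory, 1 otherwise.
check_searcher_dir() {
    dir=$(searcher_dir "$1")
    if [ -z "$dir" ]; then
        echo "searcher.dir is not set in $1"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "searcher.dir points to a missing directory: $dir"
        return 1
    fi
    echo "searcher.dir = $dir"
}
```

You would call it as check_searcher_dir /var/lib/tomcat6/webapps/ROOT/WEB-INF/classes/nutch-site.xml (adjust the path to your install). If you would rather not edit the config, the tutorial's own approach is to cd into the directory that contains crawl/ and run catalina.sh start from there, since ./crawl is resolved relative to where Tomcat is started.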

Good luck

Alex


