Posted to user@nutch.apache.org by Chip Calhoun <cc...@aip.org> on 2011/07/13 18:50:25 UTC

Deploying the web application in Nutch 1.2

I'm a newbie trying to set up a Nutch 1.2 web app, because it seems a bit better suited to my smallish site than the Nutch 1.3 / Solr connection.  I'm going through the tutorial at http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine , and I've hit the following instruction:

Deploy the Nutch web application as the ROOT context

I'm not sure what I'm meant to do here.  I get the idea that I'm supposed to replace the current contents of $CATALINA_HOME/webapps/ROOT/ with something from my Nutch directory, but I don't know what from my Nutch directory I'm supposed to move.   Can someone please explain what I need to move?

Thanks,
Chip

RE: Deploying the web application in Nutch 1.2

Posted by Chip Calhoun <cc...@aip.org>.
Success!  I'm posting this not because I need further help, but in case someone with a similar issue finds this in the list archives.

First, I now know that if I make no changes to nutch-site.xml, Nutch expects my crawl directory to be C:\Apache\Tomcat-5.5\crawl.

Second, for some reason, adding the searcher.dir property to nutch-site.xml causes a "SEVERE: Error listenerStart" error.  The obvious solution for me is to stop editing nutch-site.xml and live with /crawl/ being in my main Tomcat folder.  Whatever is causing this "listenerStart" error on my own machine may well not come up on the production server, so I'm not going to spend any more time on it.
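
For anyone retracing this from the archives: the first observation matches how the default appears to be defined. To the best of recollection, nutch-default.xml in Nutch 1.x sets searcher.dir to the relative path crawl, which would resolve against the directory Tomcat was started from (C:\Apache\Tomcat-5.5 in this case). The entry below is a sketch written from memory, not copied from the 1.2 distribution, so check it against your own nutch-default.xml:

```xml
<!-- Assumed default entry in nutch-default.xml (from memory, unverified for 1.2) -->
<property>
  <name>searcher.dir</name>
  <!-- A relative path: resolved against the JVM working directory -->
  <value>crawl</value>
</property>
```

If that is right, overriding the value with an absolute path in the webapp's nutch-site.xml is what moves the crawl directory elsewhere.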


-----Original Message-----
From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Friday, July 15, 2011 3:32 PM
To: user@nutch.apache.org
Subject: Re: Deploying the web application in Nutch 1.2

As a resource it would be wise to have a look at the list archives for an exact answer to this. Take a look at your catalina.out logs for more verbose info on where the error is.

It has been a while since I have configured this now, sorry I can't be of more help in giving a definite answer.




--
*Lewis*

Re: Deploying the web application in Nutch 1.2

Posted by lewis john mcgibbney <le...@gmail.com>.
As a resource it would be wise to have a look at the list archives for an
exact answer to this. Take a look at your catalina.out logs for more verbose
info on where the error is.

It has been a while since I have configured this now, sorry I can't be of
more help in giving a definite answer.

On Fri, Jul 15, 2011 at 8:27 PM, Chip Calhoun <cc...@aip.org> wrote:

> I'm definitely changing the file in my webapp.  I can tell I'm doing that
> much right because it makes a noticeable change to the function of my web
> app; unfortunately, the change is that it seems to break everything.
>
> I've tried playing with the actual value for this, but with no success.  In
> the tutorial's example, <value>/somewhere/crawl<value>, what is that
> relative to?  Where would that hypothetical /somewhere/ directory be,
> relative to $CATALINA_HOME/webapps/?  It feels like this is my problem,
> because I can't think of anything else it could be.



-- 
*Lewis*

RE: Deploying the web application in Nutch 1.2

Posted by Chip Calhoun <cc...@aip.org>.
I'm definitely changing the file in my webapp.  I can tell I'm doing that much right because it makes a noticeable change to the function of my web app; unfortunately, the change is that it seems to break everything.

I've tried playing with the actual value for this, but with no success.  In the tutorial's example, <value>/somewhere/crawl<value>, what is that relative to?  Where would that hypothetical /somewhere/ directory be, relative to $CATALINA_HOME/webapps/?  It feels like this is my problem, because I can't think of anything else it could be.

-----Original Message-----
From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Friday, July 15, 2011 3:19 PM
To: user@nutch.apache.org
Subject: Re: Deploying the web application in Nutch 1.2

Are you adding this to nutch-site.xml within your webapp, or just in your root Nutch installation? It needs to be in your webapp's copy of nutch-site.xml. In my experience this was a small point of confusion at first.




--
*Lewis*

Re: Deploying the web application in Nutch 1.2

Posted by lewis john mcgibbney <le...@gmail.com>.
Are you adding this to nutch-site.xml within your webapp, or just in your
root Nutch installation? It needs to be in your webapp's copy of
nutch-site.xml. In my experience this was a small point of confusion at
first.

On Fri, Jul 15, 2011 at 7:03 PM, Chip Calhoun <cc...@aip.org> wrote:

> You've gotten me very close to a breakthrough.  I've started over, and I've
> found that if I don't make any edits to nutch-site.xml, I get a working
> Nutch web app; I have no index and all of my searches fail, but I have
> Nutch.  When I add my crawl location to nutch-site.xml and restart Tomcat,
> that's when I start getting the 404 with the "The requested resource () is
> not available" message.
> Clearly I'm doing something wrong when I edit nutch-site.xml.  I'm going to
> paste the entire contents of my nutch-site.xml.  Where am I screwing this
> up?
>
> Thanks for your help on this.
>
> <?xml version="1.0"?>
> <configuration>
> <property>
> <name>http.agent.name</name>
> <value>nutch-solr-integration</value>
> </property>
> <property>
> <name>generate.max.per.host</name>
> <value>100</value>
> </property>
> <property>
> <name>plugin.includes</name>
>
> <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
> <property>
> <name>searcher.dir</name>
> <value>C:/Apache/apache-nutch-1.2/crawl<value>
> </property>
> </configuration>



-- 
*Lewis*

RE: Deploying the web application in Nutch 1.2

Posted by Chip Calhoun <cc...@aip.org>.
You've gotten me very close to a breakthrough.  I've started over, and I've found that if I don't make any edits to nutch-site.xml, I get a working Nutch web app; I have no index and all of my searches fail, but I have Nutch.  When I add my crawl location to nutch-site.xml and restart Tomcat, that's when I start getting the 404 with the "The requested resource () is not available" message.
Clearly I'm doing something wrong when I edit nutch-site.xml.  I'm going to paste the entire contents of my nutch-site.xml.  Where am I screwing this up?

Thanks for your help on this.

<?xml version="1.0"?>
<configuration>
<property>
<name>http.agent.name</name>
<value>nutch-solr-integration</value>
</property>
<property>
<name>generate.max.per.host</name>
<value>100</value>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
<property>
<name>searcher.dir</name>
<value>C:/Apache/apache-nutch-1.2/crawl<value>
</property>
</configuration>
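
One thing that stands out in the file as posted: the searcher.dir value ends with a second <value> instead of </value>, so the file is not well-formed XML. The thread never confirms that this is what triggers the 404 or the later listenerStart error, so treat the following as an untested sketch. It is the same set of properties with that one tag closed; the indentation and comments are additions.

```xml
<?xml version="1.0"?>
<!-- Sketch of the webapp copy (WEB-INF/classes/nutch-site.xml); values as posted -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>nutch-solr-integration</value>
  </property>
  <property>
    <name>generate.max.per.host</name>
    <value>100</value>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>searcher.dir</name>
    <!-- The only change: the element now ends with a closing tag -->
    <value>C:/Apache/apache-nutch-1.2/crawl</value>
  </property>
</configuration>
```

If Tomcat still refuses to start the context after this change, the Tomcat logs should show the underlying exception.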


-----Original Message-----
From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Thursday, July 14, 2011 5:38 PM
To: user@nutch.apache.org
Subject: Re: Deploying the web application in Nutch 1.2

On Thu, Jul 14, 2011 at 8:01 PM, Chip Calhoun <cc...@aip.org> wrote:

> Thanks Lewis.
>
> I'm still having trouble.  I've moved the war file to 
> $CATALINA_HOME/webapps/nutch/ and unpacked it.  I don't seem to have 
> a "catalina.sh" file, so I've skipped that step.


From memory, the catalina.sh file is used to start your Tomcat server instance... this has nothing to do with Nutch. Regardless of what kind of WAR files you have in your Tomcat webapps directory, starting your Tomcat server from the command line should be the same...

> And I've added the following to
> C:\Apache\Tomcat-5.5\webapps\nutch\WEB-INF\classes\nutch-site.xml :
>

As far as I can remember, nutch-site.xml is already there; however, you need to specify the various property values after the web app has been deployed for the first time. After restarting Tomcat, all of your property settings will take effect.


>
> <property>
> <name>searcher.dir</name>
> <value>C:\Apache\apache-nutch-1.2\crawl<value> <!-- There must be a 
> crawl/index directory to run off !--> </property>
>

Looks fine; however, please remove the <!-- ... --> comment, as it is not required.

>
> However, when I go to http://localhost:8080/nutch/ I always get a 404 with
> the message, "The requested resource () is not available."  What am I
> missing?
>

As I said, the name of the WAR file needs to be identical to the webapp name
you specify in the Tomcat URL. Can you confirm this? There should really be no
problem starting up the Nutch web app if you follow the tutorial carefully.


> Thanks,
> Chip



-- 
*Lewis*

Re: Deploying the web application in Nutch 1.2

Posted by lewis john mcgibbney <le...@gmail.com>.
On Thu, Jul 14, 2011 at 8:01 PM, Chip Calhoun <cc...@aip.org> wrote:

> Thanks Lewis.
>
> I'm still having trouble.  I've moved the war file to
> $CATALINA_HOME/webapps/nutch/ and unpacked it.  I don't seem to have a
> "catalina.sh" file, so I've skipped that step.


From memory, the catalina.sh file is used to start your Tomcat server
instance... this has nothing to do with Nutch. Regardless of what kind of
WAR files you have in your Tomcat webapps directory, starting your Tomcat
server from the command line should be the same...

> And I've added the following to
> C:\Apache\Tomcat-5.5\webapps\nutch\WEB-INF\classes\nutch-site.xml :
>

As far as I can remember, nutch-site.xml is already there; however, you need
to specify the various property values after the web app has been deployed
for the first time. After restarting Tomcat, all of your property settings
will take effect.


>
> <property>
> <name>searcher.dir</name>
> <value>C:\Apache\apache-nutch-1.2\crawl<value> <!-- There must be a
> crawl/index directory to run off !-->
> </property>
>

Looks fine; however, please remove the <!-- ... --> comment, as it is not required.

>
> However, when I go to http://localhost:8080/nutch/ I always get a 404 with
> the message, "The requested resource () is not available."  What am I
> missing?
>

As I said, the name of the WAR file needs to be identical to the webapp name
you specify in the Tomcat URL. Can you confirm this? There should really be no
problem starting up the Nutch web app if you follow the tutorial carefully.


> Thanks,
> Chip



-- 
*Lewis*
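
The name-to-URL mapping described in this reply can also be set explicitly. In Tomcat 5.5, a context descriptor placed under $CATALINA_HOME/conf/Catalina/localhost/ takes its context path from the file name, so nutch.xml would serve http://localhost:8080/nutch/ and ROOT.xml would serve the root context that the tutorial asks for. A minimal sketch, assuming the default engine and host names (Catalina and localhost) and assuming the Nutch WAR sits at the path shown, which is a guess at the layout and not something stated in the thread:

```xml
<!-- Hypothetical file: $CATALINA_HOME/conf/Catalina/localhost/ROOT.xml -->
<!-- docBase is an assumed location for the Nutch WAR; point it at the real file -->
<Context docBase="C:/Apache/apache-nutch-1.2/nutch-1.2.war" />
```

No descriptor is needed if the WAR is simply copied into webapps under the name that matches the URL, since Tomcat derives the context path from the WAR file name (ROOT.war for the root context, nutch.war for /nutch/).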

RE: Deploying the web application in Nutch 1.2

Posted by Chip Calhoun <cc...@aip.org>.
Thanks Lewis.

I'm still having trouble.  I've moved the war file to $CATALINA_HOME/webapps/nutch/ and unpacked it.  I don't seem to have a "catalina.sh" file, so I've skipped that step.  And I've added the following to C:\Apache\Tomcat-5.5\webapps\nutch\WEB-INF\classes\nutch-site.xml :

<property>
<name>searcher.dir</name>
<value>C:\Apache\apache-nutch-1.2\crawl<value> <!-- There must be a crawl/index directory to run off !-->
</property>

However, when I go to http://localhost:8080/nutch/ I always get a 404 with the message, "The requested resource () is not available."  What am I missing?

Thanks,
Chip

-----Original Message-----
From: lewis john mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Thursday, July 14, 2011 5:40 AM
To: user@nutch.apache.org
Subject: Re: Deploying the web application in Nutch 1.2

Hi Chip,

Please see this tutorial for 1.2 administration [1]; many people have been using it recently, and as far as I'm aware it is working perfectly.

Please post back if you have any trouble.

[1] http://wiki.apache.org/nutch/NutchTutorial






--
*Lewis*

Re: Deploying the web application in Nutch 1.2

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Chip,

Please see this tutorial for 1.2 administration [1]; many people have been
using it recently, and as far as I'm aware it is working perfectly.

Please post back if you have any trouble.

[1] http://wiki.apache.org/nutch/NutchTutorial



On Wed, Jul 13, 2011 at 5:50 PM, Chip Calhoun <cc...@aip.org> wrote:

> I'm a newbie trying to set up a Nutch 1.2 web app, because it seems a bit
> better suited to my smallish site than the Nutch 1.3 / Solr connection.  I'm
> going through the tutorial at
> http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine , and I've hit
> the following instruction:
>
> Deploy the Nutch web application as the ROOT context
>
> I'm not sure what I'm meant to do here.  I get the idea that I'm supposed
> to replace the current contents of $CATALINA_HOME/webapps/ROOT/ with
> something from my Nutch directory, but I don't know what from my Nutch
> directory I'm supposed to move.   Can someone please explain what I need to
> move?
>
> Thanks,
> Chip
>



-- 
*Lewis*