Posted to user@nutch.apache.org by Richard Salz <rs...@us.ibm.com> on 2007/08/10 18:44:50 UTC
Best way to index local files intended for http access
I want to run nutch on a set of local files that will be available through
HTTP running on the same machine.
I'd rather avoid the overhead of fetching the files to index them, and
then keeping a local cached copy.
What's the best way to do this? Failing that, pointers into the source
code appreciated. :)
/r$
--
STSM, Senior Security Architect
DataPower SOA Appliances
http://www.ibm.com/software/integration/datapower/
Re: Best way to index local files intended for http access
Posted by Fabian López <fa...@syameses.com>.
Hi everyone,
I have been using Nutch for a short time. I followed all the steps for
setting it up, but when I execute:
rm -rf ~/local/tomcat/webapps/ROOT*
cp nutch*.war ~/local/tomcat/webapps/ROOT.war
and then start Tomcat again, I get this error from the web server:
HTTP STATUS 500 - NO CONTEXT CONFIGURED TO PROCESS THIS REQUEST
Tomcat worked fine before that. ROOT.war has been unpacked into its own
folder under the webapps folder. I am using Ubuntu. I have other folders
under webapps, such as /examples, and Tomcat serves those perfectly. Why
doesn't the ROOT folder work? What do I have to change to make it work? I
was told to change the Context tag in server.xml, but when I did that,
Tomcat would not start.
Thanks for your help
Fabian.
Re: Best way to index local files intended for http access
Posted by Richard Salz <rs...@us.ibm.com>.
> <name>fetcher.store.content</name>
Thanks. Between that and the quick hack of URL rewriting in search.jsp,
I'm all set ...
/r$
Re: Best way to index local files intended for http access
Posted by qi wu <ch...@gmail.com>.
Hi Richard,
I just took a quick look at the protocol-file plug-in. It returns a "ProtocolOutput" object that holds the file content, and by default this content ends up stored in the "content" directory of the segment.
You can try setting the property below to false (the snippet shows its default value, true):
<property>
<name>fetcher.store.content</name>
<value>true</value>
<description>If true, fetcher will store content.</description>
</property>
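For reference, a minimal sketch of the override described above. It assumes the usual Nutch layout, in which conf/nutch-site.xml overrides the defaults from nutch-default.xml; the file location is an assumption and is not stated in this thread.

```xml
<!-- conf/nutch-site.xml (assumed location): values here override nutch-default.xml -->
<configuration>
  <property>
    <name>fetcher.store.content</name>
    <!-- false: the fetcher does not keep the raw content in the segment -->
    <value>false</value>
    <description>If true, fetcher will store content.</description>
  </property>
</configuration>
```

With content storage turned off, features that read the stored content, such as a cached-copy view, would presumably have nothing to show. This also assumes that parsing happens during the fetch, since a separate parse step run afterwards would need the stored content.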
Thanks
-Qi
----- Original Message -----
From: "Richard Salz" <rs...@us.ibm.com>
To: <nu...@lucene.apache.org>
Cc: <nu...@lucene.apache.org>
Sent: Saturday, August 11, 2007 11:25 PM
Subject: Re: Best way to index local files intended for http access
>> try to take a look at the source code of plugin "protocol-file"....
>
> And do what, put in a Location header with the URL I want?
>
> For now, I did the hack of editing search.jsp
>
> /r$
Re: Best way to index local files intended for http access
Posted by Richard Salz <rs...@us.ibm.com>.
> try to take a look at the source code of plugin "protocol-file"....
And do what, put in a Location header with the URL I want?
For now, I did the hack of editing search.jsp
/r$
Re: Best way to index local files intended for http access
Posted by qi wu <ch...@gmail.com>.
wow.... I just quit IBM a few months ago..
try to take a look at the source code of plugin "protocol-file"....
----- Original Message -----
From: "Richard Salz" <rs...@us.ibm.com>
To: <nu...@lucene.apache.org>
Sent: Saturday, August 11, 2007 12:44 AM
Subject: Best way to index local files intended for http access
>I want to run nutch on a set of local files that will be available through
> HTTP running on the same machine.
> I'd rather avoid the overhead of fetching the files to index them, and
> then keeping a local cached copy.
>
> What's the best way to do this? Failing that, pointers into the source
> code appreciated. :)
>
> /r$