Posted to user@nutch.apache.org by Richard Salz <rs...@us.ibm.com> on 2007/08/10 18:44:50 UTC

Best way to index local files intended for http access

I want to run Nutch on a set of local files that will also be served over
HTTP by a web server on the same machine.
I'd rather avoid the overhead of fetching the files over HTTP just to index
them, and then keeping a local cached copy.

What's the best way to do this? Failing that, pointers into the source
code would be appreciated. :)

        /r$

--
STSM, Senior Security Architect
DataPower SOA Appliances
http://www.ibm.com/software/integration/datapower/


Re: Best way to index local files intended for http access

Posted by Fabian López <fa...@syameses.com>.
Hi everyone,
I have only recently started using Nutch. I have followed all the setup
steps, but when I execute:

rm -rf ~/local/tomcat/webapps/ROOT*
cp nutch*.war ~/local/tomcat/webapps/ROOT.war

and restart Tomcat, I get this error from the web server:


HTTP Status 500 - No Context configured to process this request

Tomcat worked fine before. ROOT.war has been unpacked into its own folder
inside webapps. I am using Ubuntu. I have other folders in webapps, such as
/examples, and Tomcat serves them perfectly. Why doesn't the ROOT folder
work? What do I have to change to make it work? I have been told to change
the Context tag in server.xml, but when I did, Tomcat failed to start.
Thanks for your help.

Fabian.

Re: Best way to index local files intended for http access

Posted by Richard Salz <rs...@us.ibm.com>.
>   <name>fetcher.store.content</name>

Thanks. Between that and a quick URL-rewrite hack in search.jsp, I'm all
set.

        /r$

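The search.jsp change itself isn't posted. A minimal sketch of such a rewrite, assuming the files live under a hypothetical /var/www/html/ and are served at http://localhost/ (both constants are assumptions, not Nutch settings), maps the file: URL that protocol-file indexed to the http: URL a browser should follow. The same logic could be inlined wherever search.jsp prints each hit's URL:

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Nutch): maps a file: URL indexed by
 * protocol-file to the http: URL under which the same file is served.
 */
public class LocalUrlRewriter {
    // Assumption: local directory published by the web server.
    static final String FILE_ROOT = "/var/www/html/";
    // Assumption: public base URL for FILE_ROOT.
    static final String HTTP_BASE = "http://localhost/";

    /** Returns the http: URL for a file: URL under FILE_ROOT; otherwise returns the input unchanged. */
    static String toHttpUrl(String url) {
        if (!url.startsWith("file:")) {
            return url; // not a local file, leave it alone
        }
        String path;
        try {
            // getRawPath() keeps %-escapes; file:/x and file:///x both yield /x
            path = URI.create(url).getRawPath();
        } catch (IllegalArgumentException e) {
            return url; // not a valid URI (e.g. unescaped spaces), leave it alone
        }
        if (path == null || !path.startsWith(FILE_ROOT)) {
            return url; // outside the published tree, nothing to map
        }
        return HTTP_BASE + path.substring(FILE_ROOT.length());
    }

    public static void main(String[] args) {
        System.out.println(toHttpUrl("file:/var/www/html/docs/index.html")); // http://localhost/docs/index.html
        System.out.println(toHttpUrl("file:///var/www/html/a.txt"));         // http://localhost/a.txt
        System.out.println(toHttpUrl("http://example.com/x"));               // unchanged
    }
}
```

Returning anything it cannot map unchanged means http: hits that share the same index keep working as before.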


Re: Best way to index local files intended for http access

Posted by qi wu <ch...@gmail.com>.
Hi Richard,
I just took a quick look at the protocol-file plugin. It returns a "ProtocolOutput" object that holds the file content, and by default that content eventually ends up in the segment's "content" directory.
You can try setting this property to false in conf/nutch-site.xml; its default (in nutch-default.xml) is true:
<property>
  <name>fetcher.store.content</name>
  <value>true</value>
  <description>If true, fetcher will store content.</description>
</property>
Thanks
-Qi

----- Original Message ----- 
From: "Richard Salz" <rs...@us.ibm.com>
To: <nu...@lucene.apache.org>
Cc: <nu...@lucene.apache.org>
Sent: Saturday, August 11, 2007 11:25 PM
Subject: Re: Best way to index local files intended for http access


Re: Best way to index local files intended for http access

Posted by Richard Salz <rs...@us.ibm.com>.
> try to take a look at the source code  of plugin "protocol-file"....

And do what, put in a Location header with the URL I want?

For now, I've hacked search.jsp instead.

        /r$



Re: Best way to index local files intended for http access

Posted by qi wu <ch...@gmail.com>.
Wow, I left IBM just a few months ago.
Try taking a look at the source code of the "protocol-file" plugin.
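Using protocol-file means crawling the documents as file: URLs instead of fetching them over HTTP. That route assumes protocol-file is listed in the plugin.includes property and that your URL filter does not reject file: URLs. A minimal sketch of building a seed list for it, assuming a hypothetical document root of /var/www/html, walks the tree and prints one file: URL per regular file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Nutch): lists a file: URL for every regular file under a directory. */
public class FileSeedList {
    /** Walks root recursively and returns sorted file: URLs for its regular files. */
    static List<String> fileUrls(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) { // closes the directory stream when done
            return paths.filter(Files::isRegularFile)
                        .map(p -> p.toAbsolutePath().toUri().toString()) // e.g. file:///var/www/html/a.html
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /var/www/html is the document root unless another is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "/var/www/html");
        for (String url : fileUrls(root)) {
            System.out.println(url);
        }
    }
}
```

Redirect the output into the seed-list file you hand to the crawl. Listing every file explicitly is a design choice: it does not depend on link-following reaching each file within the crawl depth.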

----- Original Message ----- 
From: "Richard Salz" <rs...@us.ibm.com>
To: <nu...@lucene.apache.org>
Sent: Saturday, August 11, 2007 12:44 AM
Subject: Best way to index local files intended for http access

