Posted to user@nutch.apache.org by praveen pathiyil <pa...@gmail.com> on 2005/08/23 01:10:29 UTC

Re: [Nutch-general] Index local file.

Hi Benny,

Check out this mail thread

http://www.mail-archive.com/nutch-user@lucene.apache.org/msg00340.html
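
Independently of that thread, here is a rough sketch of the general idea, in case it helps. As far as I know, local files are addressed with file: URLs (for example file:///data/pages/a.html) instead of http: ones, and Nutch needs its file protocol plugin enabled and its URL filter relaxed so that file: URLs are not skipped. Treat those configuration details as assumptions to check against your Nutch version. The small Python script below only builds the list of URLs to inject; the function names and the seed file name are made up for illustration.

```python
from pathlib import Path


def build_seed_urls(root):
    """Return sorted file: URLs for every .html/.htm file under root."""
    root_path = Path(root).resolve()  # as_uri() requires an absolute path
    urls = []
    for path in root_path.rglob("*"):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            urls.append(path.as_uri())  # e.g. file:///data/pages/a.html
    return sorted(urls)


def write_seed_file(root, seed_file):
    """Write one URL per line to seed_file; return how many were written."""
    urls = build_seed_urls(root)
    Path(seed_file).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

Calling write_seed_file("/path/to/pages", "seeds.txt") should give you a text file with one file: URL per line, which you could then hand to the inject step the same way you would a list of http: URLs.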

HTH,
Praveen.

On 8/22/05, Benny <be...@gmail.com> wrote:
> Hi,
> 
> Can someone give me some hints on how to index local files?
> 
> I have a lot of plain HTML files (more than 50K pages, around 2-3k
> per page). I'd rather not put them on a web server and index them by
> URL. I'd like Nutch to index them from the local hard disk. Is that
> possible? If so, what kind of URL do I need to inject into the db? For
> example, with a web server we would use
> 
> http://domain/file.html
> 
> What is the format for a file on the local disk? I assume it is no
> longer "http", so what should the protocol be? These files are still
> in plain HTML format.
> 
> 
> Benny