Posted to dev@nutch.apache.org by Francesco Cipriani <f....@mclink.net> on 2005/06/15 22:39:33 UTC

Re: Nutch indexes & page retrieving

On Wed, Jun 15, 2005 at 08:18:47PM +0200, Stefan Groschupf wrote:

["./index" dirs]
> Yes, it's a lucene index.

I'm sorry, this is explained in the tutorial itself, but I had forgotten
it.

> Maybe this document can help you.
> http://wiki.media-style.com/display/nutchDocu/Home

In http://wiki.media-style.com/display/nutchDocu/Nutch+architecture
there's a box saying something about a "developer chapter", but is there
a developer chapter? I cannot find it.

I've read the document you wrote (thank you, by the way), but I still
cannot understand whether Nutch, given a URL, is able to return the
content of the page crawled from that URL.

Bye.
-- 
Francesco

Re: Nutch indexes & page retrieving

Posted by Stefan Groschupf <sg...@media-style.com>.
> but is there
> a developer chapter? I cannot find it.
>
Work in progress. :)
> I've read the document you wrote (thank you, by the way), but I still
> cannot understand whether Nutch, given a URL, is able to return the
> content of the page crawled from that URL.
Have a look at how the site index filter and the index-more plugin
work, and you will see how to implement the functionality you are
interested in.
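
To make the hint a bit more concrete, here is a minimal sketch of the
general idea only: a filter adds extra stored fields to each document at
indexing time, and a lookup keyed on the exact URL then returns the
stored content. It does not use the real Nutch or Lucene API. Every name
in it (Doc, DocFilter, UrlFilter, ContentFilter, UrlContentIndex) is
hypothetical, and an in-memory map stands in for the index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory "index"; NOT the Nutch or Lucene API. */
class UrlContentIndex {
    // Filters run in order on every document, in the spirit of a plugin chain.
    private final List<DocFilter> filters =
            Arrays.asList(new UrlFilter(), new ContentFilter());
    // Exact-match lookup table: URL -> document.
    private final Map<String, Doc> byUrl = new HashMap<>();

    /** Builds a document through the filter chain and files it under its URL. */
    void add(String url, String content) {
        Doc doc = new Doc();
        for (DocFilter f : filters) {
            doc = f.filter(doc, url, content);
        }
        byUrl.put(doc.fields.get("url"), doc);
    }

    /** Returns the stored content for a URL, or empty if it was never added. */
    Optional<String> contentFor(String url) {
        Doc doc = byUrl.get(url);
        return doc == null
                ? Optional.empty()
                : Optional.ofNullable(doc.fields.get("content"));
    }

    public static void main(String[] args) {
        UrlContentIndex index = new UrlContentIndex();
        index.add("http://example.org/", "<html>hello</html>");
        System.out.println(index.contentFor("http://example.org/").orElse("(not indexed)"));
        System.out.println(index.contentFor("http://example.org/other").orElse("(not indexed)"));
    }
}

/** Hypothetical stand-in for an index document: field name -> stored value. */
class Doc {
    final Map<String, String> fields = new HashMap<>();
}

/** Hypothetical filter hook: each filter may add fields to the document. */
interface DocFilter {
    Doc filter(Doc doc, String url, String content);
}

/** Stores the URL untokenized so that it can be matched exactly later. */
class UrlFilter implements DocFilter {
    public Doc filter(Doc doc, String url, String content) {
        doc.fields.put("url", url);
        return doc;
    }
}

/** Stores the raw page content alongside the URL. */
class ContentFilter implements DocFilter {
    public Doc filter(Doc doc, String url, String content) {
        doc.fields.put("content", content);
        return doc;
    }
}
```

In a real setup the two filters would presumably be written against
whatever indexing filter interface the installed Nutch version provides,
and the lookup would be an exact-term query on the URL field rather than
a map access.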

HTH
Stefan