Posted to user@nutch.apache.org by Tesfaye Guta <te...@gmail.com> on 2010/06/09 07:32:35 UTC

Your Help on Nutch!

Hello all,
I am able to configure Nutch and use it on my PC.
I am working on a thesis on a local search engine.
As I understand it, Nutch automatically indexes the
documents it has crawled.
I want to do some preprocessing on the crawled documents before they get
indexed. Can you help me
with how to go about this?

Thank you in advance; I hope to hear from you soon.

Re: Your Help on Nutch!

Posted by Hemanth Yamijala <yh...@gmail.com>.
Tesfaye,

> I am able to configure Nutch and use it on my PC.
> I am working on a thesis on a local search engine.
> As I understand it, Nutch automatically indexes the
> documents it has crawled.
> I want to do some preprocessing on the crawled documents before they get
> indexed. Can you help me
> with how to go about this?

You can probably write plugins to achieve this. Please
take a look at http://wiki.apache.org/nutch/PluginCentral. For example, if
you want to parse special tags in the documents you've crawled and
index on them, you can do so using an approach like the one
documented here: http://wiki.apache.org/nutch/HowToMakeCustomSearch

Can you see if this gives you an idea of how to move forward?
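
As a rough illustration of the kind of preprocessing step such a plugin
could perform before indexing, here is a minimal, self-contained Java sketch.
It does not use the Nutch API: the DocumentPreprocessor interface, the
NormalizeContent class, and the "url"/"content" field names are all
hypothetical stand-ins. A real plugin would instead implement one of the
extension points described on the PluginCentral page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class PreprocessSketch {

    /** Hypothetical stand-in for a plugin hook; NOT part of the Nutch API. */
    interface DocumentPreprocessor {
        Map<String, String> process(Map<String, String> fields);
    }

    /** Example step: collapse whitespace and lowercase the "content" field. */
    static class NormalizeContent implements DocumentPreprocessor {
        @Override
        public Map<String, String> process(Map<String, String> fields) {
            // Copy the input so the caller's map is left untouched.
            Map<String, String> out = new LinkedHashMap<>(fields);
            String content = out.get("content");
            if (content != null) {
                String cleaned = content.trim()
                        .replaceAll("\\s+", " ")
                        .toLowerCase(Locale.ROOT);
                out.put("content", cleaned);
            }
            return out;
        }
    }

    /** Builds a toy document and runs the preprocessor on it before indexing. */
    public static Map<String, String> preprocess(String url, String rawContent) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", url);
        doc.put("content", rawContent);
        return new NormalizeContent().process(doc);
    }

    public static void main(String[] args) {
        Map<String, String> doc =
                preprocess("http://example.org/", "  Hello   Nutch\n World ");
        System.out.println(doc.get("content"));
    }
}
```

The point of the sketch is only the shape of the step: the crawled document
arrives as a set of fields, the preprocessing code rewrites or adds fields,
and the result is what gets handed on to the indexer.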

Thanks
Hemanth