Posted to user@nutch.apache.org by Pedro Bezunartea López <pe...@bezunartea.net> on 2010/02/22 02:40:11 UTC

Re: Content storage, results highlighting [SOLVED]

My mistake! I had already done the right step, which is actually number 3:
modify the BasicIndexingFilter to store the fields you need, then compile the
plugins with "ant compile-plugins"... but I didn't copy the new plugin jar
to the proper location; I assumed ant would do it... stupid mistake!

For future reference, copy the jar from
.../build/plugins/index-basic/index-basic.jar to .../plugins/index-basic/
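
The copy step can be sketched as a small shell helper. This is only a sketch under assumptions: install_plugin_jar is a hypothetical name, not part of Nutch, and its first argument stands in for the elided ".../" prefix, assuming the build and plugins directories share the same root.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): copy a freshly built plugin jar
# from the build tree into the runtime plugins directory.
# $1 = root directory standing in for the elided ".../" prefix (assumption:
#      build/ and plugins/ live under the same root)
# $2 = plugin name, e.g. index-basic
install_plugin_jar() {
    nutch_root=$1
    plugin=$2
    src="$nutch_root/build/plugins/$plugin/$plugin.jar"
    dest="$nutch_root/plugins/$plugin/"

    # Fail loudly if the jar has not been built yet.
    if [ ! -f "$src" ]; then
        echo "missing $src (run 'ant compile-plugins' first)" >&2
        return 1
    fi

    # Create the target directory if needed, then copy the jar into it.
    mkdir -p "$dest" && cp "$src" "$dest"
}
```

After "ant compile-plugins" finishes, it would be called as: install_plugin_jar /path/to/nutch index-basic (the path is a placeholder). Checking for the jar first avoids silently crawling again with the old plugin.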

Next hurdle... query highlight! :)

BTW, I don't know if the first 2 steps are necessary at all. Cheers,

Pedro.

2010/2/21 Pedro Bezunartea López <pe...@bezunartea.net>

>
> Hi,
>
> I've developed a web application with Lucene that searches web pages using a
> Nutch-generated index. I'd like to highlight the query terms when
> showing the results, and I understand that the content of the pages needs to
> be stored, as well as indexed.
>
> This is what I've tried so far:
> 1.- In the file conf/nutch-site.xml, I changed the value of
> "file.content.ignored" to false.
> 2.- In the file conf/schema.xml I modified the line:
>  <field name="content" type="text" stored="false" indexed="true"/>
> to
>  <field name="content" type="text" stored="true" indexed="true"/>
> 3.- In the source file
> src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java,
> I changed line 116 to:
>  LuceneWriter.addFieldOptions("content", LuceneWriter.STORE.YES,
>         LuceneWriter.INDEX.TOKENIZED, conf);
>
> I tried running the command "bin/nutch crawl urls -dir crawl -depth 10
> -topN 5000" after the first two steps, but the crawl didn't store the
> contents. I then tried the third step, recompiled Nutch, and ran the crawl
> command again, to no avail.
>
> What am I missing? Any hints, please?
>
> TIA,
>
> Pedro.
>