Posted to dev@nutch.apache.org by Ale <at...@yahoo.com> on 2012/01/20 00:36:53 UTC

make nutch plugin to get termfreqvectors


Hi,
I'm quite new to working with Nutch plugins. I'm trying to save the term frequency vectors of the documents.
I'm using Nutch 1.4.

I've seen that, in the plugin class, I had to use the method addFieldOptions, like:
--
public void addIndexBackendOptions(Configuration conf) {

  // add lucene options

  // host is un-stored, indexed and tokenized
  LuceneWriter.addFieldOptions("host", LuceneWriter.STORE.NO,
      LuceneWriter.INDEX.TOKENIZED, conf);

  // site is un-stored, indexed and un-tokenized
  LuceneWriter.addFieldOptions("site", LuceneWriter.STORE.NO,
      LuceneWriter.INDEX.UNTOKENIZED, conf);

  // url is both stored and indexed, so it's both searchable and returned
  LuceneWriter.addFieldOptions("url", LuceneWriter.STORE.YES,
      LuceneWriter.INDEX.TOKENIZED, conf);

  // content is indexed, so that it's searchable, but not stored in index
  LuceneWriter.addFieldOptions("content", LuceneWriter.STORE.NO,
      LuceneWriter.INDEX.TOKENIZED, conf);

  // anchors are indexed, so they're searchable, but not stored in index
  LuceneWriter.addFieldOptions("anchor", LuceneWriter.STORE.NO,
      LuceneWriter.INDEX.TOKENIZED, conf);

  // title is indexed and stored so that it can be displayed
  LuceneWriter.addFieldOptions("title", LuceneWriter.STORE.YES,
      LuceneWriter.INDEX.TOKENIZED, conf);
}

----
The problem is that, as far as I have seen, LuceneWriter no longer exists in 1.4 (Lucene 3.5).
Which is the correct way to do it?

Thank you very much in advance!

--


Re: make nutch plugin to get termfreqvectors

Posted by Lewis John Mcgibbney <le...@gmail.com>.
To get started, have a look at the 1.4 Javadoc [1] and focus on the
o.a.n.indexer and o.a.n.indexer.solr packages, as these provide
everything you should need to understand what can be done within
Nutch 1.4 and how.

[1] http://nutch.apache.org/apidocs-1.4/index.html

On Thu, Jan 19, 2012 at 11:36 PM, Ale <at...@yahoo.com> wrote:

> The problem is, as far as I have seen, that LuceneWriter no longer exists
> in 1.4 (Lucene 3.5)

Correct, we dropped the legacy Lucene material when the project was
revamped. Have a look at the SolrWriter class.
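
As a rough illustration of where the old field options end up: since 1.4
hands indexing over to Solr, the stored/indexed/tokenized choices are
expressed as attributes on <field> entries in the Solr schema.xml rather
than in plugin code, and term vectors are switched on there through the
termVectors, termPositions and termOffsets attributes. The sketch below
only prints such entries for the fields from the original snippet. The
class SolrFieldSketch and its fieldXml method are hypothetical helpers,
not part of Nutch or Solr, and the type names "text" and "string" are
assumptions that have to match the types defined in your own schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch or Solr: renders <field> entries
// for a Solr schema.xml from the options the old LuceneWriter calls expressed.
class SolrFieldSketch {

    // Builds one schema.xml <field> line.
    // Assumption: the schema defines a tokenized type named "text" and an
    // untokenized type named "string"; adjust to the types in your schema.
    static String fieldXml(String name, boolean stored, boolean tokenized,
                           boolean termVectors) {
        StringBuilder sb = new StringBuilder();
        sb.append("<field name=\"").append(name).append('"');
        sb.append(" type=\"").append(tokenized ? "text" : "string").append('"');
        sb.append(" stored=\"").append(stored).append('"');
        sb.append(" indexed=\"true\"");
        if (termVectors) {
            // These three attributes are standard Solr field attributes.
            sb.append(" termVectors=\"true\" termPositions=\"true\" termOffsets=\"true\"");
        }
        sb.append("/>");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = new ArrayList<>();
        fields.add(fieldXml("host", false, true, false));
        fields.add(fieldXml("site", false, false, false));
        fields.add(fieldXml("url", true, true, false));
        // Term vectors requested only for content in this sketch.
        fields.add(fieldXml("content", false, true, true));
        fields.add(fieldXml("anchor", false, true, false));
        fields.add(fieldXml("title", true, true, false));
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

Enabling term vectors on "content" only is an arbitrary choice for the
example. Reading them back at query time would typically go through
Solr's TermVectorComponent rather than through Nutch itself.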


-- 
*Lewis*

Re: make nutch plugin to get termfreqvectors

Posted by kaveh minooie <ka...@plutoz.com>.
Hi

I am having a similar problem, in that I have to update a bunch of plugins
that were written for Nutch 1.1. It would be great if we could get some
hints.

Thanks,

-- 
Kaveh Minooie

www.plutoz.com