Posted to dev@nutch.apache.org by Ale <at...@yahoo.com> on 2012/01/20 00:36:53 UTC
make nutch plugin to get termfreqvectors
Hi,
I'm quite new to working with Nutch plugins. I'm trying to save the termfreqvectors of the documents.
I'm using Nutch 1.4.
I've seen that I have to use, in the plugin class, the method addFieldOptions, like:
--
public void addIndexBackendOptions(Configuration conf) {
// add lucene options //
// host is un-stored, indexed and tokenized
LuceneWriter.addFieldOptions("host", LuceneWriter.STORE.NO,
LuceneWriter.INDEX.TOKENIZED, conf);
// site is un-stored, indexed and un-tokenized
LuceneWriter.addFieldOptions("site", LuceneWriter.STORE.NO,
LuceneWriter.INDEX.UNTOKENIZED, conf);
// url is both stored and indexed, so it's both searchable and returned
LuceneWriter.addFieldOptions("url", LuceneWriter.STORE.YES,
LuceneWriter.INDEX.TOKENIZED, conf);
// content is indexed, so that it's searchable, but not stored in index
LuceneWriter.addFieldOptions("content", LuceneWriter.STORE.NO,
LuceneWriter.INDEX.TOKENIZED, conf);
// anchors are indexed, so they're searchable, but not stored in index
LuceneWriter.addFieldOptions("anchor", LuceneWriter.STORE.NO,
LuceneWriter.INDEX.TOKENIZED, conf);
// title is indexed and stored so that it can be displayed
LuceneWriter.addFieldOptions("title", LuceneWriter.STORE.YES,
LuceneWriter.INDEX.TOKENIZED, conf);
}
----
The problem is, as far as I can see, that LuceneWriter no longer exists in 1.4 (Lucene 3.5).
Which is the correct way to do it?
Thank you very much in advance!
--
Re: make nutch plugin to get termfreqvectors
Posted by Lewis John Mcgibbney <le...@gmail.com>.
To get started, have a look at the 1.4 Javadoc [1] and focus on the
o.a.n.indexer and o.a.n.indexer.solr packages, as these provide
everything you should need to understand what can be done within
Nutch 1.4 and how.
[1] http://nutch.apache.org/apidocs-1.4/index.html
On Thu, Jan 19, 2012 at 11:36 PM, Ale <at...@yahoo.com> wrote:
> The problem is, as far as I have seen, that LuceneWriter no longer exists
> in 1.4 (Lucene 3.5)

Correct, we dropped the legacy Lucene material when the project was
revamped. Have a look at the SolrWriter class.
--
*Lewis*
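
Editorial sketch, not part of the original reply: since Nutch 1.4 hands documents to Solr through SolrWriter, per-field store/index choices like the ones LuceneWriter.addFieldOptions used to express would most likely live in the Solr schema rather than in plugin code. Assuming the index uses a schema.xml with a "content" field of type "text" (the field name matches the code quoted above; the type name is an assumption), term vectors could be requested with Solr's standard termVectors, termPositions and termOffsets field attributes:

```xml
<!-- Assumed fragment of a Solr schema.xml; the field type name "text" is a guess. -->
<!-- content stays indexed and un-stored, as in the old LuceneWriter options, -->
<!-- but additionally keeps term vectors with positions and offsets. -->
<field name="content" type="text" stored="false" indexed="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Documents indexed before the change would need to be re-indexed for the term vectors to be present.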
Re: make nutch plugin to get termfreqvectors
Posted by kaveh minooie <ka...@plutoz.com>.
Hi
I am having a similar problem, in that I have to update a bunch of plugins
that were written for Nutch 1.1. It would be great if we could get some
hints.
Thanks,
--
Kaveh Minooie
www.plutoz.com