Posted to user@nutch.apache.org by Matt Pearson <mp...@lizearle.com> on 2009/01/05 14:55:18 UTC
store 'content' field in the index
Hi Everyone,
Does anyone know of a way I can configure the Nutch crawler to store the
contents of a document in the generated index?
Thanks
Matt
Matt Pearson
AW: store 'content' field in the index
Posted by Koch Martina <Ko...@huberverlag.de>.
Hi Matt,
Line 87 of BasicIndexingFilter.java (package org.apache.nutch.indexer.basic) is what you're looking for.
Kind regards,
Martina
-----Original Message-----
From: Matt Pearson [mailto:mpearson@lizearle.com]
Sent: 06 January 2009 10:30
To: nutch-user@lucene.apache.org
Subject: RE: store 'content' field in the index
Hmmm, the only occurrences of Field.Store I can find are Field.Store.YES.
The 'content' field does not appear to be set explicitly in the way that
'boost' and 'digest' are.
I guess this question should really be asked on nutch-dev...
Thanks
Matt
RE: store 'content' field in the index
Posted by Matt Pearson <mp...@lizearle.com>.
Hmmm, the only occurrences of Field.Store I can find are Field.Store.YES.
The 'content' field does not appear to be set explicitly in the way that
'boost' and 'digest' are.
I guess this question should really be asked on nutch-dev...
Thanks
Matt
Re: store 'content' field in the index
Posted by "Ian.huang" <yi...@hotmail.com>.
I think you need to change BasicIndexingFilter, where the 'content' field is created:
    Field content = new Field("content", parse.getText(),
                              Field.Store.NO, Field.Index.TOKENIZED);
Change Field.Store.NO to Field.Store.YES.
Ian
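To make the difference between Field.Store.NO and Field.Store.YES concrete, here is a small toy model. It is not Lucene or Nutch code: the class StoredFieldSketch, the TinyIndex type and its methods are all made up for illustration. It only shows the general idea that a field can be searchable (indexed) whether or not its original text is kept (stored), and that only a stored field can be read back from a hit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only, NOT the Lucene/Nutch API. All names here are hypothetical.
class StoredFieldSketch {

    /** Holds the field values that were explicitly kept for one document. */
    static final class Doc {
        final Map<String, String> stored = new HashMap<>();
    }

    static final class TinyIndex {
        private final List<Doc> docs = new ArrayList<>();
        // "field:term" -> ids of the documents whose field contains that term
        private final Map<String, List<Integer>> postings = new HashMap<>();

        /** Creates an empty document and returns its id. */
        int newDoc() {
            docs.add(new Doc());
            return docs.size() - 1;
        }

        /**
         * Always tokenizes and indexes the text; keeps the original text
         * only when store is true (the analogue of Field.Store.YES).
         */
        void addField(int docId, String name, String text, boolean store) {
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> ids =
                        postings.computeIfAbsent(name + ":" + term, k -> new ArrayList<>());
                if (!ids.contains(docId)) {
                    ids.add(docId);
                }
            }
            if (store) {
                docs.get(docId).stored.put(name, text);
            }
        }

        /** Returns the ids of documents whose field contains the term. */
        List<Integer> search(String name, String term) {
            String key = name + ":" + term.toLowerCase(Locale.ROOT);
            return postings.getOrDefault(key, Collections.emptyList());
        }

        /** Returns the stored text of a field, or null if it was not stored. */
        String get(int docId, String name) {
            return docs.get(docId).stored.get(name);
        }
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();

        int a = index.newDoc();
        index.addField(a, "content", "Nutch crawls the web", false);  // like Store.NO

        int b = index.newDoc();
        index.addField(b, "content", "Nutch stores this text", true); // like Store.YES

        System.out.println(index.search("content", "nutch")); // both documents match
        System.out.println(index.get(a, "content"));          // null: indexed but not stored
        System.out.println(index.get(b, "content"));          // the original text
    }
}
```

Both documents are found by the search, but only the second one can return its text. In the same way, storing 'content' should let you read the parsed text back from a search hit, at the cost of a larger index.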