Posted to user@nutch.apache.org by Matt Pearson <mp...@lizearle.com> on 2009/01/05 14:55:18 UTC

store 'content' field in the index

Hi Everyone,

Does anyone know of a way I can configure the Nutch crawler to store the
contents of a document in the generated index?

Thanks

Matt

Matt Pearson

AW: store 'content' field in the index

Posted by Koch Martina <Ko...@huberverlag.de>.
Hi Matt,

line 87 in BasicIndexingFilter.java (package org.apache.nutch.indexer.basic) is what you're looking for...

Kind regards,
Martina

-----Original Message-----
From: Matt Pearson [mailto:mpearson@lizearle.com] 
Sent: 06 January 2009 10:30
To: nutch-user@lucene.apache.org
Subject: RE: store 'content' field in the index

Hmmm, the only occurrences of Field.Store I can find are Field.Store.YES.
The 'content' Field does not appear to be explicitly set in the way that
'boost' and 'digest' are.

I guess this question should really be asked on nutch-dev...

Thanks

Matt


RE: store 'content' field in the index

Posted by Matt Pearson <mp...@lizearle.com>.
Hmmm, the only occurrences of Field.Store I can find are Field.Store.YES.
The 'content' Field does not appear to be explicitly set in the way that
'boost' and 'digest' are.

I guess this question should really be asked on nutch-dev...

Thanks

Matt

-----Original Message-----
From: Ian.huang [mailto:yiwong2001@hotmail.com] 
Sent: 05 January 2009 13:58
To: nutch-user@lucene.apache.org
Subject: Re: store 'content' field in the index

I think you need to change BasicIndexingFilter:

	Field content = new Field("content", parse.getText(), Field.Store.NO, Field.Index.TOKENIZED);

Change Field.Store.NO to Field.Store.YES.

Ian


Re: store 'content' field in the index

Posted by "Ian.huang" <yi...@hotmail.com>.
I think you need to change BasicIndexingFilter:

	Field content = new Field("content", parse.getText(), Field.Store.NO, Field.Index.TOKENIZED);

Change Field.Store.NO to Field.Store.YES.

Ian
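
For anyone unsure what the Field.Store flag changes, here is a minimal sketch of the stored versus indexed-only distinction. It is a toy model in plain Java, not the Lucene or Nutch API: the names StoredFieldDemo, ToyIndex, Store, add, search and get are all made up for illustration. The assumption it encodes is that a field's text is always tokenized for searching, while the original text is kept for retrieval only when the store flag is YES.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model only: NOT the Lucene or Nutch API. All names here are hypothetical. */
public class StoredFieldDemo {

    /** Hypothetical stand-in for a store flag such as Field.Store. */
    public enum Store { YES, NO }

    /** Tiny in-memory index: postings for searching, plus optional stored text. */
    public static class ToyIndex {
        private final Map<String, List<Integer>> postings = new HashMap<>();
        private final List<Map<String, String>> storedFields = new ArrayList<>();

        /** Adds a one-field document. The text is always tokenized; it is kept verbatim only if store == YES. */
        public int add(String field, String text, Store store) {
            int docId = storedFields.size();
            Map<String, String> stored = new HashMap<>();
            if (store == Store.YES) {
                stored.put(field, text);
            }
            storedFields.add(stored);
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                List<Integer> ids = postings.computeIfAbsent(field + ":" + token, k -> new ArrayList<>());
                // Record each document at most once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
            return docId;
        }

        /** Returns the ids of documents whose field contains the term. */
        public List<Integer> search(String field, String term) {
            return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), new ArrayList<>());
        }

        /** Returns the stored text of a field, or null if it was indexed without being stored. */
        public String get(int docId, String field) {
            return storedFields.get(docId).get(field);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int notStored = index.add("content", "Nutch crawls the web", Store.NO);
        int stored = index.add("content", "Nutch stores content", Store.YES);

        // Both documents are searchable, but only one can return its text.
        System.out.println("hits for 'nutch': " + index.search("content", "nutch"));
        System.out.println("doc " + notStored + " content: " + index.get(notStored, "content"));
        System.out.println("doc " + stored + " content: " + index.get(stored, "content"));
    }
}
```

In this sketch both documents match a search for "nutch", but only the one added with Store.YES can hand back its original text. Storing the text presumably makes the index larger, since the toy model keeps a second copy of it.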
