Posted to dev@lucene.apache.org by jafarim <ja...@gmail.com> on 2007/03/17 13:36:32 UTC

Storing whole document in the index

Hello,
I have been using Lucene for a while and, as most people seemingly do, I
used to store only a few important fields of a document in the index. But
recently I thought: why not store the whole document's bytes as an untokenized
field in the index, in order to ease the retrieval process? For example,
serialize the PDF file into a byte[] and then save the bytes as a field in
the index (some gzip and base64 encoding may be needed as glue logic). Then
I can delete the original file from the system. Is there any reason against
this idea? Can Lucene bear such a large volume of streamed input data?
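
The gzip and base64 "glue logic" mentioned above might look roughly like the sketch below. It uses only the Java standard library (java.util.Base64 requires Java 8 or later), and StoredDocumentCodec is a hypothetical helper name, not a Lucene class. The resulting string is what would be placed in a stored, untokenized field; the indexing call itself is left out because it depends on the Lucene version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of Lucene): turns raw document bytes into a
// compressed, text-safe string and back again.
final class StoredDocumentCodec {
    private StoredDocumentCodec() {}

    // gzip the raw bytes, then base64-encode the compressed result.
    static String encode(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse of encode: base64-decode, then gunzip.
    static byte[] decode(String stored) {
        byte[] compressed = Base64.getDecoder().decode(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of a real PDF file.
        byte[] original = "example document body".getBytes(StandardCharsets.UTF_8);
        String fieldValue = encode(original);
        byte[] restored = decode(fieldValue);
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Note that base64 inflates the compressed bytes by about a third, so this only pays off when the field has to be stored as text.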

Re: Storing whole document in the index

Posted by Grant Ingersoll <gs...@apache.org>.
Please ask these types of questions on the user mailing list; you will
get much better responses.  The dev list is for developers of Lucene.

To answer your question: yes, you can do this.  You may also find the
FieldSelector API additions and Lazy Field Loading helpful
performance-wise.
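
The idea behind lazy field loading can be illustrated with a small sketch that is independent of Lucene: the large stored value is read only when it is first requested, and then cached. LazyStoredValue and its loader are hypothetical names used purely for illustration; they are not the FieldSelector API, whose exact classes and signatures depend on the Lucene version.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Conceptual illustration only (not Lucene code): defers an expensive read
// until the value is actually needed, then reuses the result.
final class LazyStoredValue {
    private final Supplier<byte[]> loader; // stands in for a read from the index
    private byte[] cached;
    private boolean loaded;

    LazyStoredValue(Supplier<byte[]> loader) {
        this.loader = loader;
    }

    // Loads the value on first access; later calls return the cached bytes.
    synchronized byte[] get() {
        if (!loaded) {
            cached = loader.get();
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        LazyStoredValue body = new LazyStoredValue(() -> {
            reads.incrementAndGet();   // count how often the "disk" is hit
            return new byte[1024];     // stand-in for a large stored document
        });
        // A result list that never touches the body pays no read cost.
        System.out.println("reads before access: " + reads.get());
        body.get();
        body.get();
        System.out.println("reads after two accesses: " + reads.get());
    }
}
```

With whole documents stored in the index, this matters because a search-result listing typically needs only small fields such as a title, while the large body field is needed for just the one hit the user opens.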

-Grant

On Mar 17, 2007, at 8:36 AM, jafarim wrote:

> Hello,
> I have been using Lucene for a while and, as most people seemingly do, I
> used to store only a few important fields of a document in the index. But
> recently I thought: why not store the whole document's bytes as an
> untokenized field in the index, in order to ease the retrieval process?
> For example, serialize the PDF file into a byte[] and then save the bytes
> as a field in the index (some gzip and base64 encoding may be needed as
> glue logic). Then I can delete the original file from the system. Is there
> any reason against this idea? Can Lucene bear such a large volume of
> streamed input data?

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ