Posted to dev@lucene.apache.org by Anton Leuski <le...@ict.usc.edu> on 2005/10/08 22:31:15 UTC
Adding information to an index
Greetings,
I'm looking to store some additional information in a Lucene index
and I'm looking for advice on how to implement the functionality.
Specifically, I'm planning to store 1) collection frequency count for
each term, 2) actual document length for each document (yes, I looked
at the norm factor, I'm still considering how to adapt it...) 3)
collection size (total number of terms) for each field 4) vocabulary
size (number of unique terms) for each field. All this info can be
computed on the fly, but I would prefer to generate it at
indexing time and store it somewhere.
I think I figured out how to handle #1) -- I found a post by Doug
Cutting about it which pointed me in the right direction. What to do
about the rest of the info? I'd like the implementation to
automatically update the counts as documents are added to and deleted
from the index.
Thank you.
-- Anton
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Adding information to an index
Posted by Chris Hostetter <ho...@fucit.org>.
: I'm looking to store some additional information in a Lucene index
: and I'm looking for advice on how to implement the functionality.
: Specifically, I'm planning to store 1) collection frequency count for
: each term, 2) actual document length for each document (yes, I looked
: at the norm factor, I'm still considering how to adapt it...) 3)
: collection size (total number of terms) for each field 4) vocabulary
: size (number of unique terms) for each field. All this info can be
: computed on the fly, but I would prefer to generate it at
: indexing time and store it somewhere.
Unless I'm misunderstanding your terminology, it seems like all of this
information is either already stored in the index or easy to add using
the existing API:
#1 - Searchable.docFreq(Term):int
#2 - add as a new field per document.
#3 & #4 ...
...these are a little trickier. You can easily get both by iterating over
IndexReader.terms(), but if you specifically want to store the data in the
index, I would first add all of your documents, then use the TermEnum
to compute the information and put it all as stored fields in a single
"metadata" document with no indexed fields (or at least: none in common
with your regular data).
Now you've precomputed everything you want to know, and it's easily
available at query time.
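
A rough sketch of that single pass, in plain Java with no Lucene
dependency. Everything named here (Posting, FieldStats,
IndexStatsSketch) is a hypothetical stand-in for illustration, not a
Lucene class: a Posting plays the role of one (term, document,
frequency) entry you would read while walking the TermEnum from
IndexReader.terms() and, for each term, the TermDocs from
IndexReader.termDocs(Term). The sketch assumes each (term, document)
pair shows up at most once, as it would in an inverted index. One
thing worth noting: docFreq is the number of documents containing a
term, so if "collection frequency" means total occurrences, it is the
sum of the per-document frequencies, which the sketch tracks separately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one posting: a term occurs freq times in one document. */
final class Posting {
    final String field;
    final String text;
    final int docId;
    final int freq;

    Posting(String field, String text, int docId, int freq) {
        this.field = field;
        this.text = text;
        this.docId = docId;
        this.freq = freq;
    }
}

/** Per-field totals (items #3 and #4 in the thread). */
final class FieldStats {
    long collectionSize;  // total number of term occurrences in the field
    int vocabularySize;   // number of distinct terms in the field
}

final class IndexStatsSketch {
    // Term keys are "field:text".
    final Map<String, Long> collectionFreq = new HashMap<>();   // #1: total occurrences of the term
    final Map<String, Integer> docFreq = new HashMap<>();       // number of documents containing the term
    final Map<Integer, Long> docLength = new HashMap<>();       // #2: occurrences per document, all fields
    final Map<String, FieldStats> fieldStats = new HashMap<>(); // #3 and #4

    /** One pass over all postings; assumes at most one posting per (term, document). */
    static IndexStatsSketch compute(List<Posting> postings) {
        IndexStatsSketch s = new IndexStatsSketch();
        for (Posting p : postings) {
            String key = p.field + ":" + p.text;
            FieldStats fs = s.fieldStats.get(p.field);
            if (fs == null) {
                fs = new FieldStats();
                s.fieldStats.put(p.field, fs);
            }
            if (!s.collectionFreq.containsKey(key)) {
                fs.vocabularySize++; // first time this term is seen in this field
            }
            fs.collectionSize += p.freq;
            s.collectionFreq.merge(key, (long) p.freq, Long::sum);
            s.docFreq.merge(key, 1, Integer::sum);
            s.docLength.merge(p.docId, (long) p.freq, Long::sum);
        }
        return s;
    }

    public static void main(String[] args) {
        List<Posting> postings = new ArrayList<>();
        postings.add(new Posting("body", "lucene", 0, 2));
        postings.add(new Posting("body", "lucene", 1, 1));
        postings.add(new Posting("body", "index", 0, 3));
        postings.add(new Posting("title", "lucene", 1, 1));

        IndexStatsSketch s = compute(postings);
        System.out.println("cf(body:lucene) = " + s.collectionFreq.get("body:lucene"));
        System.out.println("df(body:lucene) = " + s.docFreq.get("body:lucene"));
        System.out.println("body collection size = " + s.fieldStats.get("body").collectionSize);
        System.out.println("body vocabulary size = " + s.fieldStats.get("body").vocabularySize);
        System.out.println("length(doc 0) = " + s.docLength.get(0));
    }
}
```

The resulting numbers are what would go into the stored fields of the
"metadata" document. Since this is a batch computation, it would have
to be rerun (or the totals adjusted) after documents are added or
deleted.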
-Hoss