Posted to solr-user@lucene.apache.org by Artem Lokotosh <ar...@gmail.com> on 2011/12/23 18:26:45 UTC
Storing only unique terms in index
Hi, all
I have a catchall "text" field, and use it for searching. This field
stores non-unique terms. For example, this field stores the
following terms:
test test search
Is it possible to store non-unique terms in the following way:
"term"|"number of terms", i.e. test|2 search?
I guess it should reduce the size of the index.
And if yes - is it possible to use this number of terms when
calculating the relevance?
--
Best regards,
Artem Lokotosh mailto:arconen@gmail.com
Re: Storing only unique terms in index
Posted by Chris Hostetter <ho...@fucit.org>.
: I have a catchall "text" field, and use it for searching. This field
: stores non-unique terms. For example, this field stores the
: following terms:
: test test search
: Is it possible to store non-unique terms in the following way:
: "term"|"number of terms", i.e. test|2 search?
: I guess it should reduce the size of the index.
:
: And if yes - is it possible to use this number of terms when
: calculating the relevance?
what you are describing is exactly how an inverted index like Lucene/Solr
works -- the original raw text can optionally be "stored" for retrieval,
but the index that is *searched* contains each term a single time, along
with pointers referring to which documents, and where in those documents,
the term exists. the number of times a term exists in a document is the
term frequency (or "tf") and is one of the two primary components used in
the basic scoring formula (TF/IDF).
https://lucene.apache.org/java/3_5_0/fileformats.html
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
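
to make that concrete, here is a minimal Python sketch of the idea. it is
an illustration only, not Lucene's actual file format or scoring code: the
names build_index, tf and idf are made up for this example, tokenizing is
a plain whitespace split, and the idf shown is the textbook log(N/df)
form, which is an assumption and may differ from what a given Lucene
Similarity computes.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: [positions]}.

    Each unique term is a key exactly once; repeated occurrences
    only add positions to that term's posting for the document.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def tf(index, term, doc_id):
    """Term frequency: how many times `term` occurs in `doc_id`."""
    return len(index.get(term, {}).get(doc_id, []))


def idf(index, term, num_docs):
    """Textbook inverse document frequency (assumed form: log(N / df))."""
    df = len(index.get(term, {}))
    return math.log(num_docs / df) if df else 0.0


# Hypothetical two-document corpus; document 1 is the example from the question.
docs = {1: "test test search", 2: "search engine"}
index = build_index(docs)
```

with this corpus, "test" appears as a single key whose posting for
document 1 holds two positions, so tf(index, "test", 1) is 2 -- the same
information as the "test|2" notation in the question, plus where in the
document each occurrence sits.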
-Hoss