Posted to dev@lucy.apache.org by Peter Karman <pe...@peknet.com> on 2011/09/11 03:18:02 UTC
Re: [lucy-dev] which fields contained which terms
Marvin Humphrey wrote on 8/30/11 4:59 PM:
>
> To support highlighting, at index-time we create an inverted representation
> for each field that has been marked as "highlightable", then serialize all the
> inverted fields together in one blob (called, for no particularly good reason,
> a "DocVector"). Effectively this is a miniature inverted-index containing a
> single document. The class which does the work is
> Lucy::Index::HighlightWriter, and the relevant segment files are named
> seg_NNN/highlight.ix and seg_NNN/highlight.dat.
>
my brief tests show that setting highlightable => 1 for all fields increases the
size of the index by about 65%. Is that about right, in your experience?
--
Peter Karman . http://peknet.com/ . peter@peknet.com
Re: [lucy-dev] which fields contained which terms
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Sep 10, 2011 at 08:18:02PM -0500, Peter Karman wrote:
> my brief tests show that setting highlightable => 1 for all fields increases the
> size of the index by about 65%. Is that about right, in your experience?
Yes, that's not surprising. Those miniature inverted indexes contain a lot of
data. Each has its own term dictionary. Both term frequency and positional
data are included, and the per-token positional data is augmented with start
and end offsets measured in code points.
Marvin Humphrey
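
To illustrate why these per-document structures add so much to index size, here is a minimal Python sketch of a single-document inverted representation. It is not Lucy's code or its on-disk format: the function name build_doc_vector, the regex tokenizer, and the in-memory layout are all assumptions made for illustration. It only shows the kinds of data described above: a term dictionary per field, with a position plus start and end offsets (in code points, since Python strings are indexed by code point) for every token.

```python
import re
from collections import defaultdict


def build_doc_vector(fields):
    """Invert each field of ONE document.

    Returns {field_name: {term: [(position, start, end), ...]}}.
    Hypothetical layout for illustration; not Lucy's DocVector format.
    """
    doc_vector = {}
    for name, text in fields.items():
        postings = defaultdict(list)
        # Naive tokenizer (an assumption): runs of word characters, lowercased.
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group(0).lower()
            # Offsets are code-point indices into the field text.
            postings[term].append((position, match.start(), match.end()))
        # Each field gets its own term dictionary.
        doc_vector[name] = dict(postings)
    return doc_vector


doc = {"title": "Lucy index", "body": "the index stores the terms"}
vector = build_doc_vector(doc)

# Term frequency falls out as the number of recorded occurrences.
freq_the = len(vector["body"]["the"])
```

Every token contributes three integers on top of the term text itself, and each field repeats its own dictionary, so in this sketch the inverted form can easily approach or exceed the size of the original field text.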