Posted to dev@lucy.apache.org by Peter Karman <pe...@peknet.com> on 2011/09/11 03:18:02 UTC

Re: [lucy-dev] which fields contained which terms

Marvin Humphrey wrote on 8/30/11 4:59 PM:

> 
> To support highlighting, at index-time we create an inverted representation
> for each field that has been marked as "highlightable", then serialize all the
> inverted fields together in one blob (called, for no particularly good reason,
> a "DocVector").  Effectively this is a miniature inverted-index containing a
> single document.  The class which does the work is
> Lucy::Index::HighlightWriter, and the relevant segment files are named
> seg_NNN/highlight.ix and seg_NNN/highlight.dat.
> 

my brief tests show that setting highlightable => 1 for all fields increases the
size of the index by about 65%. Is that about right, in your experience?

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [lucy-dev] which fields contained which terms

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Sep 10, 2011 at 08:18:02PM -0500, Peter Karman wrote:
> my brief tests show that setting highlightable => 1 for all fields increases the
> size of the index by about 65%. Is that about right, in your experience?

Yes, that's not surprising.  Those miniature inverted indexes contain a lot of
data.  Each has its own term dictionary.  Both term frequency and positional
data are included, and the per-token positional data is augmented with start
and end offsets measured in code points.  
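
To make the description above concrete, here is a minimal conceptual sketch
in Python of a one-document inverted representation. It is not Lucy's
implementation or its serialized file format. The regex tokenizer, the
function names (invert_field, build_doc_vector), and the dictionary layout
are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer for illustration only; Lucy uses its own analyzers.
TOKEN_RE = re.compile(r"\w+")


def invert_field(text):
    """Build a one-document inverted representation of a single field.

    Each term maps to its frequency, its token positions, and the start/end
    offsets of every occurrence. Python str indices count code points, so
    the offsets here are code-point offsets.
    """
    postings = defaultdict(
        lambda: {"freq": 0, "positions": [], "starts": [], "ends": []}
    )
    for position, match in enumerate(TOKEN_RE.finditer(text)):
        entry = postings[match.group().lower()]
        entry["freq"] += 1
        entry["positions"].append(position)
        entry["starts"].append(match.start())
        entry["ends"].append(match.end())
    return dict(postings)


def build_doc_vector(doc, highlightable_fields):
    """Invert each highlightable field separately.

    Every field gets its own term dictionary, so a term that occurs in two
    fields is stored twice.
    """
    return {
        field: invert_field(doc[field])
        for field in highlightable_fields
        if field in doc
    }


doc = {
    "title": "Lucy in the sky",
    "body": "The sky is blue; the caf\u00e9 is open.",
}
doc_vector = build_doc_vector(doc, ["title", "body"])
```

In this sketch, "sky" and "the" each appear in both the title and the body
dictionaries, and every occurrence carries a position plus two offsets. That
per-field, per-token bookkeeping is the kind of data that plausibly accounts
for the growth in index size discussed above.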

Marvin Humphrey