Posted to java-user@lucene.apache.org by "Rose, Stuart J" <st...@pnnl.gov> on 2013/10/07 21:31:05 UTC

optimal way to access many TermVectors

Is there an optimal way to access many document TermVectors (in the same chunk) consecutively when using the LZ4 termvector compression?

I'm curious to know whether all TermVectors in a single compressed chunk are decompressed and cached when one TermVector in the same chunk is accessed?

Also wondering if there is a mapping of TermVector order to docID order? Or is it always one to one? If docIds are dynamic, then presumably they are not necessarily in the same order as their documents' corresponding term vectors...

Thanks,
Stuart


Re: optimal way to access many TermVectors

Posted by Adrien Grand <jp...@gmail.com>.
Hi,

On Mon, Oct 7, 2013 at 9:31 PM, Rose, Stuart J <st...@pnnl.gov> wrote:
> Is there an optimal way to access many document TermVectors (in the same chunk) consecutively when using the LZ4 termvector compression?
>
> I'm curious to know whether all TermVectors in a single compressed chunk are decompressed and cached when one TermVector in the same chunk is accessed?

Today the main use cases for term vectors are more-like-this and
highlighting, so term vectors are generally accessed in no particular
order. This is why we don't cache the uncompressed chunk (it would
never get reused), so the chunk has to be decompressed every time you
retrieve a document or its term vectors.
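
To make that cost concrete, here is a toy model in Java using only the
standard library. It is not Lucene code: the class ToyChunkedStore, its
methods, and the fixed number of documents per chunk are all invented
for illustration, and java.util.zip's Deflater stands in for LZ4. It
only shows that without a cache, two reads from the same chunk cost two
decompressions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy model only, NOT Lucene code: documents packed into compressed chunks, no cache. */
final class ToyChunkedStore {
    private final List<byte[]> chunks = new ArrayList<>();      // compressed chunks, in doc ID order
    private final List<Integer> rawLengths = new ArrayList<>(); // uncompressed size of each chunk
    private final int docsPerChunk;                             // hypothetical fixed chunk size
    private int decompressions = 0;

    ToyChunkedStore(List<String> docs, int docsPerChunk) {
        this.docsPerChunk = docsPerChunk;
        for (int start = 0; start < docs.size(); start += docsPerChunk) {
            int end = Math.min(start + docsPerChunk, docs.size());
            // Toy encoding: join with '\n', so documents must not contain newlines.
            byte[] raw = String.join("\n", docs.subList(start, end)).getBytes(StandardCharsets.UTF_8);
            chunks.add(compress(raw));
            rawLengths.add(raw.length);
        }
    }

    // Deflate is a stand-in for LZ4, which is not in the Java standard library.
    private static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses the whole chunk on every call; nothing is kept between calls.
    private byte[] decompress(int chunkIndex) {
        decompressions++;
        byte[] raw = new byte[rawLengths.get(chunkIndex)];
        Inflater inflater = new Inflater();
        inflater.setInput(chunks.get(chunkIndex));
        try {
            int off = 0;
            while (off < raw.length) {
                int n = inflater.inflate(raw, off, raw.length - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) break;
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    String get(int docId) {
        byte[] raw = decompress(docId / docsPerChunk);
        String[] docs = new String(raw, StandardCharsets.UTF_8).split("\n", -1);
        return docs[docId % docsPerChunk];
    }

    int decompressions() {
        return decompressions;
    }

    public static void main(String[] args) {
        ToyChunkedStore store =
            new ToyChunkedStore(Arrays.asList("a:1 b:2", "b:1 c:4", "c:2 d:1", "d:3 e:1"), 2);
        store.get(0);
        store.get(1); // same chunk as doc 0, but the chunk is decompressed again
        System.out.println(store.decompressions());
    }
}
```

If consecutive access within a chunk matters for an application, any
reuse of the decompressed data would have to be handled on the
application side in a model like this one.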

> Also wondering if there is a mapping of TermVector order to docID order? Or is it always one to one? If docIds are dynamic, then presumably they are not necessarily in the same order as their documents' corresponding term vectors...

Term vectors are stored in doc ID order, meaning that for a given
segment, term vectors for document N are followed by term vectors for
document N+1.
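
Because of this ordering, sorting the doc IDs you want to fetch makes
the reads follow the stored order. The Java sketch below is a
hypothetical illustration and not Lucene's index format: it assumes you
somehow know the first doc ID of each chunk (the array
chunkStartDocIds, an invented name), finds the chunk for a doc ID by
binary search, and groups sorted requests by chunk. Doc IDs are assumed
to be non-negative and local to one segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, NOT Lucene code: map doc IDs to chunks laid out in doc ID order. */
final class DocIdOrderSketch {

    /** Index of the chunk holding docId; chunkStartDocIds is ascending and starts at 0. */
    static int chunkOf(int[] chunkStartDocIds, int docId) {
        int pos = Arrays.binarySearch(chunkStartDocIds, docId);
        // Exact hit: docId is the first document of chunk pos.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // containing chunk is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Sorts the requested doc IDs and groups them by the chunk that holds them. */
    static Map<Integer, List<Integer>> groupByChunk(int[] chunkStartDocIds, int[] requestedDocIds) {
        int[] sorted = requestedDocIds.clone();
        Arrays.sort(sorted);
        Map<Integer, List<Integer>> byChunk = new LinkedHashMap<>();
        for (int docId : sorted) {
            byChunk.computeIfAbsent(chunkOf(chunkStartDocIds, docId), k -> new ArrayList<>())
                   .add(docId);
        }
        return byChunk;
    }

    public static void main(String[] args) {
        int[] starts = {0, 4, 9}; // made-up chunk boundaries
        System.out.println(groupByChunk(starts, new int[] {10, 2, 5, 3, 4}));
    }
}
```

With the made-up boundaries above, docs 2 and 3 fall in the first
chunk, 4 and 5 in the second, and 10 in the third.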

-- 
Adrien
