Posted to dev@lucene.apache.org by Mohit Sidana <ms...@gmail.com> on 2016/07/06 16:04:46 UTC

Lucene Block term Dictionary

Hello,

I am interested in learning more about how Lucene uses the block tree term
dictionary.

While researching this topic I found some useful information at the links
below.


1.
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
2.
http://blog.mikemccandless.com/2013/09/lucene-now-has-in-memory-terms.html
3. http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal


I understand that Lucene uses an FST to store term prefixes in memory and
looks up terms/postings on disk, but I am unable to visualize how an actual
search works in Lucene 6.0.

Can someone please suggest a guide I can follow to understand, step by
step, how a term search actually works with the block terms dictionary?

Thanks.

Re: Lucene Block term Dictionary

Posted by Michael McCandless <lu...@mikemccandless.com>.
The latest terms dictionary is "block tree", and unfortunately there are no
guides for it besides, of course, the source code
(BlockTreeTermsWriter/Reader).  See especially the comments in those
sources: they point to a paper describing the inspiration for this
implementation.

The high level view is that this terms dictionary breaks up the sorted
terms into variable sized blocks (25 to 48 terms in each block) at "good"
boundaries, where the term prefixes change, to maximize overall compression.
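As a rough illustration of splitting sorted terms at prefix boundaries,
here is a toy Python sketch. It is not the algorithm used by
BlockTreeTermsWriter: the function name split_into_blocks, the top-down
recursion, and the tiny max_size are assumptions made for readability, and
it ignores the minimum block size entirely.

```python
from itertools import groupby


def split_into_blocks(sorted_terms, max_size=3, depth=0):
    """Group sorted terms into (shared_prefix, terms) blocks.

    Toy rule (an assumption, not Lucene's): group terms by their first
    depth+1 characters; if a group is too large, split it again using a
    longer prefix.
    """
    blocks = []
    for prefix, group in groupby(sorted_terms, key=lambda t: t[:depth + 1]):
        group = list(group)
        # Stop splitting when the group fits in a block, or when no term
        # in it is long enough for a longer prefix to separate anything.
        cannot_split = all(len(t) <= depth + 1 for t in group)
        if len(group) <= max_size or cannot_split:
            blocks.append((prefix, group))
        else:
            blocks.extend(split_into_blocks(group, max_size, depth + 1))
    return blocks


terms = sorted(["apple", "apply", "apt", "bar", "bat", "baz", "bee", "cat"])
blocks = split_into_blocks(terms, max_size=3)
for prefix, members in blocks:
    print(prefix, members)
```

In this sketch the four "b" terms do not fit in one block, so they are
split where the prefix changes ("ba" versus "be"). Every term in a block
shares the block's prefix, which is what makes prefix compression inside
the block possible.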

The in-memory (JVM heap) FST terms index is used to find which on-disk
block may have a given term, so to look up a given term we walk the FST,
then seek to that block and scan it.
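The lookup steps can be sketched in a few lines of Python. This is only a
toy model under stated assumptions: a plain dict (INDEX) stands in for the
FST, an in-memory dict of lists (BLOCKS) stands in for the on-disk blocks,
and the names find_block and lookup are hypothetical. None of this is
Lucene's API.

```python
# Stand-in for the on-disk blocks: block id -> sorted terms in that block.
BLOCKS = {
    0: ["apple", "apply", "apt"],
    1: ["bar", "bat", "baz"],
    2: ["bee"],
    3: ["cat"],
}

# Stand-in for the in-memory FST terms index: block prefix -> block id.
INDEX = {"a": 0, "ba": 1, "be": 2, "c": 3}


def find_block(term):
    """Walk the term's prefixes and return the block for the longest
    indexed prefix, or None if no indexed prefix matches."""
    block_id = None
    for end in range(1, len(term) + 1):
        hit = INDEX.get(term[:end])
        if hit is not None:
            block_id = hit
    return block_id


def lookup(term):
    """Return True if the term exists: index walk, then block scan."""
    block_id = find_block(term)
    if block_id is None:
        return False  # no block can contain this term
    for candidate in BLOCKS[block_id]:  # "seek" to the block and scan it
        if candidate == term:
            return True
        if candidate > term:
            break  # terms are sorted, so the term cannot appear later
    return False


print(lookup("bat"), lookup("bag"), lookup("dog"))
```

Note that the index only says which block may have the term: "bag" maps to
the "ba" block, and only the scan shows that the term is absent, while
"dog" is rejected by the index alone without touching any block.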

Mike McCandless

http://blog.mikemccandless.com
