You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Ravikumar Govindarajan <ra...@gmail.com> on 2013/11/15 18:16:54 UTC

FST Builder pruning

I was trying to understand some logic in Builder class of FST.

The method freezeTail() looks quite hairy. I gather that there is an some
logic for pruning a node or compiling it.

What exactly is pruning a node? An example of it will be really really
helpful

--
Ravi

Re: FST Builder pruning

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think BlockTree should be better (less disk space, RAM and faster
lookups), but if you can make a benchmark comparing the two that would
help confirm/deny!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 18, 2013 at 9:21 PM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
> Many thanks Mike.
>
> In a given document, I usually have 15-20 fields out of which 6-7 fields
> are plain key-value fields.
>
> Typically these key-value fields don't involve prefixes, reflexes, fuzzies
> etc...It's always a full match. Non-existent values are also not possible
> during search.
>
> In such a case, will it be better for me to use the old TII/TIS formats or
> the BlockTree format will be much better for K-V access also?
>
> --
> Ravi
>
> On Monday, November 18, 2013, Michael McCandless wrote:
>
>> Yes, BlockTreeTermsWriter uses freezeTail to figure out where to draw
>> the lines for assigning terms to blocks, but to build the trie terms
>> index it builds a separate FST, by adding in each block's prefix (it
>> doesn't use the FST's builder pruning to create the trie).
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Nov 15, 2013 at 1:33 PM, Ravikumar Govindarajan
>> <ravikumar.govindarajan@gmail.com <javascript:;>> wrote:
>> > Yeah, now I kind of understood.
>> >
>> > Is this why BlockTreeTermsWriter plugs in it's freezeTail logic of
>> meeting
>> > min-nbr of terms per block and building a trie for locating sub-blocks?
>> >
>> > --
>> > Ravi
>> >
>> >
>> > On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless <
>> > lucene@mikemccandless.com <javascript:;>> wrote:
>> >
>> >> When you turn on pruning, FST Builder will just remove nodes that
>> >> don't have a high enough count of input terms traversing through them.
>> >>  E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
>> >> input terms coming through them, are preserved.
>> >>
>> >> You can use this to build a prefix trie instead of the full FST.
>> >>
>> >> Creating a custom tail freezer is very expert: it lets you implement
>> >> arbitrary logic on which nodes are pruned or not.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Fri, Nov 15, 2013 at 12:16 PM, Ravikumar Govindarajan
>> >> <ravikumar.govindarajan@gmail.com <javascript:;>> wrote:
>> >> > I was trying to understand some logic in Builder class of FST.
>> >> >
>> >> > The method freezeTail() looks quite hairy. I gather that there is an
>> some
>> >> > logic for pruning a node or compiling it.
>> >> >
>> >> > What exactly is pruning a node? An example of it will be really really
>> >> > helpful
>> >> >
>> >> > --
>> >> > Ravi
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org<javascript:;>
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org<javascript:;>
>> >>
>> >>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org<javascript:;>
>> For additional commands, e-mail: java-user-help@lucene.apache.org<javascript:;>
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FST Builder pruning

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Many thanks Mike.

In a given document, I usually have 15-20 fields out of which 6-7 fields
are plain key-value fields.

Typically these key-value fields don't involve prefixes, reflexes, fuzzies
etc...It's always a full match. Non-existent values are also not possible
during search.

In such a case, will it be better for me to use the old TII/TIS formats or
the BlockTree format will be much better for K-V access also?

--
Ravi

On Monday, November 18, 2013, Michael McCandless wrote:

> Yes, BlockTreeTermsWriter uses freezeTail to figure out where to draw
> the lines for assigning terms to blocks, but to build the trie terms
> index it builds a separate FST, by adding in each block's prefix (it
> doesn't use the FST's builder pruning to create the trie).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 15, 2013 at 1:33 PM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com <javascript:;>> wrote:
> > Yeah, now I kind of understood.
> >
> > Is this why BlockTreeTermsWriter plugs in it's freezeTail logic of
> meeting
> > min-nbr of terms per block and building a trie for locating sub-blocks?
> >
> > --
> > Ravi
> >
> >
> > On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless <
> > lucene@mikemccandless.com <javascript:;>> wrote:
> >
> >> When you turn on pruning, FST Builder will just remove nodes that
> >> don't have a high enough count of input terms traversing through them.
> >>  E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
> >> input terms coming through them, are preserved.
> >>
> >> You can use this to build a prefix trie instead of the full FST.
> >>
> >> Creating a custom tail freezer is very expert: it lets you implement
> >> arbitrary logic on which nodes are pruned or not.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Fri, Nov 15, 2013 at 12:16 PM, Ravikumar Govindarajan
> >> <ravikumar.govindarajan@gmail.com <javascript:;>> wrote:
> >> > I was trying to understand some logic in Builder class of FST.
> >> >
> >> > The method freezeTail() looks quite hairy. I gather that there is an
> some
> >> > logic for pruning a node or compiling it.
> >> >
> >> > What exactly is pruning a node? An example of it will be really really
> >> > helpful
> >> >
> >> > --
> >> > Ravi
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org<javascript:;>
> >> For additional commands, e-mail: java-user-help@lucene.apache.org<javascript:;>
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org<javascript:;>
> For additional commands, e-mail: java-user-help@lucene.apache.org<javascript:;>
>
>

Re: FST Builder pruning

Posted by Michael McCandless <lu...@mikemccandless.com>.
Yes, BlockTreeTermsWriter uses freezeTail to figure out where to draw
the lines for assigning terms to blocks, but to build the trie terms
index it builds a separate FST, by adding in each block's prefix (it
doesn't use the FST's builder pruning to create the trie).

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 15, 2013 at 1:33 PM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
> Yeah, now I kind of understood.
>
> Is this why BlockTreeTermsWriter plugs in it's freezeTail logic of meeting
> min-nbr of terms per block and building a trie for locating sub-blocks?
>
> --
> Ravi
>
>
> On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> When you turn on pruning, FST Builder will just remove nodes that
>> don't have a high enough count of input terms traversing through them.
>>  E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
>> input terms coming through them, are preserved.
>>
>> You can use this to build a prefix trie instead of the full FST.
>>
>> Creating a custom tail freezer is very expert: it lets you implement
>> arbitrary logic on which nodes are pruned or not.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Nov 15, 2013 at 12:16 PM, Ravikumar Govindarajan
>> <ra...@gmail.com> wrote:
>> > I was trying to understand some logic in Builder class of FST.
>> >
>> > The method freezeTail() looks quite hairy. I gather that there is an some
>> > logic for pruning a node or compiling it.
>> >
>> > What exactly is pruning a node? An example of it will be really really
>> > helpful
>> >
>> > --
>> > Ravi
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FST Builder pruning

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Yeah, now I kind of understood.

Is this why BlockTreeTermsWriter plugs in it's freezeTail logic of meeting
min-nbr of terms per block and building a trie for locating sub-blocks?

--
Ravi


On Fri, Nov 15, 2013 at 11:17 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> When you turn on pruning, FST Builder will just remove nodes that
> don't have a high enough count of input terms traversing through them.
>  E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
> input terms coming through them, are preserved.
>
> You can use this to build a prefix trie instead of the full FST.
>
> Creating a custom tail freezer is very expert: it lets you implement
> arbitrary logic on which nodes are pruned or not.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 15, 2013 at 12:16 PM, Ravikumar Govindarajan
> <ra...@gmail.com> wrote:
> > I was trying to understand some logic in Builder class of FST.
> >
> > The method freezeTail() looks quite hairy. I gather that there is an some
> > logic for pruning a node or compiling it.
> >
> > What exactly is pruning a node? An example of it will be really really
> > helpful
> >
> > --
> > Ravi
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: FST Builder pruning

Posted by Michael McCandless <lu...@mikemccandless.com>.
When you turn on pruning, FST Builder will just remove nodes that
don't have a high enough count of input terms traversing through them.
 E.g. if minSuffixCount1 is 100 then only FST nodes that see >= 100
input terms coming through them, are preserved.

You can use this to build a prefix trie instead of the full FST.

Creating a custom tail freezer is very expert: it lets you implement
arbitrary logic on which nodes are pruned or not.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 15, 2013 at 12:16 PM, Ravikumar Govindarajan
<ra...@gmail.com> wrote:
> I was trying to understand some logic in Builder class of FST.
>
> The method freezeTail() looks quite hairy. I gather that there is an some
> logic for pruning a node or compiling it.
>
> What exactly is pruning a node? An example of it will be really really
> helpful
>
> --
> Ravi

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org