You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by John Wang <jo...@gmail.com> on 2018/06/23 22:04:56 UTC

SuRf FST implementation

The SigMod paper describes a more compact FST implementation looks really
interesting:

 http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf

 (reference implementation: https://github.com/efficient/SuRF)

Was wondering if Lucene's FST implementation used by term dictionary can
take advantage of this.

Thanks

-John

Re: SuRf FST implementation

Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks for sharing, John; this looks interesting!  Would be nice to build a
codec using that reference impl (hmm, except it is C++)

FST in this context stands for "Fast Succinct Trie" not "Finite State
Transducer" and it seems to be a data structure similar to (but claimed
better than) bloom filters, in that it can efficiently evaluate set
membership with a tunable chance for being wrong (false positive), but it'd
be nice to see some numbers on a real Lucene index / use case.

Mike McCandless

http://blog.mikemccandless.com

On Sat, Jun 23, 2018 at 6:04 PM, John Wang <jo...@gmail.com> wrote:

> The SigMod paper describes a more compact FST implementation looks really
> interesting:
>
>  http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf
>
>  (reference implementation: https://github.com/efficient/SuRF)
>
> Was wondering if Lucene's FST implementation used by term dictionary can
> take advantage of this.
>
> Thanks
>
> -John
>