You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by John Wang <jo...@gmail.com> on 2018/06/23 22:04:56 UTC
SuRf FST implementation
The SigMod paper describes a more compact FST implementation looks really
interesting:
http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf
(reference implementation: https://github.com/efficient/SuRF)
Was wondering if Lucene's FST implementation used by term dictionary can
take advantage of this.
Thanks
-John
Re: SuRf FST implementation
Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks for sharing, John; this looks interesting! Would be nice to build a
codec using that reference impl (hmm, except it is C++)
FST in this context stands for "Fast Succinct Trie" not "Finite State
Transducer" and it seems to be a data structure similar to (but claimed
better than) bloom filters, in that it can efficiently evaluate set
membership with a tunable chance for being wrong (false positive), but it'd
be nice to see some numbers on a real Lucene index / use case.
Mike McCandless
http://blog.mikemccandless.com
On Sat, Jun 23, 2018 at 6:04 PM, John Wang <jo...@gmail.com> wrote:
> The SigMod paper describes a more compact FST implementation looks really
> interesting:
>
> http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf
>
> (reference implementation: https://github.com/efficient/SuRF)
>
> Was wondering if Lucene's FST implementation used by term dictionary can
> take advantage of this.
>
> Thanks
>
> -John
>