Posted to dev@lucene.apache.org by Michael Sokolov <ms...@gmail.com> on 2019/04/25 19:51:15 UTC

Improve performance of FST Arc traversal

I've been experimenting with a new FST encoding, and the performance
gains are exciting on FST-intensive benchmarks such as the suggesters
and PKLookup in luceneutil. In our production system we see some
gains in regular search performance as well, although these are modest
since FST lookup is not a major component of our costs there. The
gains vary a lot depending on the workload and the makeup of the terms
encoded in the FST, but for example I've seen as much as a 2x speedup
in Fuzzy Suggester performance. I'd like to open an issue to discuss
further, and share a PR, but here's a quick summary of what the
proposed change is:

FST is basically a graph composed of Arcs. Today, outgoing Arcs of a
given Arc can be encoded in two ways: either as a list of
variable-length-encoded Arcs, or as an array of fixed-length-encoded
Arcs. When seeking in the FST (eg looking up a term), one matches
successive characters against the Arc labels in the graph. If Arcs are
encoded in the list format, we scan each item in the list to find a
matching label, terminating early since they are ordered (by label).
When Arcs are in the array format, we use a binary search to find an
Arc with a matching label. The array format is used when there is a
relatively large number of Arcs, and more so nearer the start of the
FST graph.

The new "direct array" encoding stores outgoing Arcs in a
full-sized array big enough to cover the complete span of outgoing
labels, so we can address directly by label and avoid the binary
search. Generally such an array will have some gaps, so this
fundamentally offers a space/time tradeoff.
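
To make the lookup difference concrete, here is a rough standalone
Java sketch (simplified and hypothetical, not the actual FST classes
or the patch) of the three strategies, modeling a node's outgoing Arcs
as a sorted array of labels:

class ArcLookupSketch {

  // List encoding: scan the (label-sorted) arcs one by one, stopping
  // early once we pass the label we are looking for.
  static int linearScan(int[] labels, int label) {
    for (int i = 0; i < labels.length; i++) {
      if (labels[i] == label) {
        return i;
      }
      if (labels[i] > label) {
        return -1; // labels are sorted, so no later arc can match
      }
    }
    return -1;
  }

  // Fixed-length array encoding: binary search over the sorted labels.
  static int binarySearch(int[] labels, int label) {
    int lo = 0, hi = labels.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (labels[mid] < label) {
        lo = mid + 1;
      } else if (labels[mid] > label) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -1;
  }

  // Proposed "direct array" encoding: one slot for every possible label
  // in [minLabel, maxLabel]; absent labels leave gaps (marked here
  // with -1).
  static int directLookup(int[] slots, int minLabel, int label) {
    int idx = label - minLabel;
    if (idx < 0 || idx >= slots.length || slots[idx] == -1) {
      return -1; // out of range, or the label falls in a gap
    }
    return idx; // constant time: no scan, no binary search
  }
}

The direct form pays for the gaps between the smallest and largest
label, but answers each lookup with a single index computation, which
is where the space/time tradeoff comes from.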

Unless there is some strenuous objection (FST must not be touched!),
I'll open an issue soon and post a patch for comments.

-Mike



Re: Improve performance of FST Arc traversal

Posted by Michael Sokolov <ms...@gmail.com>.
Yes, I think so too. Apparently that cache can be a RAM hog if you have
zillions of fields?

I plan to post an issue with more details over the weekend, when I will
have the time to write up the results I saw.

On Fri, Apr 26, 2019, 10:31 AM David Smiley <da...@gmail.com>
wrote:

> Maybe what you propose would allow FST.cachedRootArcs to be eliminated?
> Basically your algorithm could decide that the root would always be direct
> access.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley

Re: Improve performance of FST Arc traversal

Posted by David Smiley <da...@gmail.com>.
Maybe what you propose would allow FST.cachedRootArcs to be eliminated?
Basically your algorithm could decide that the root would always be direct
access.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Improve performance of FST Arc traversal

Posted by Dawid Weiss <da...@gmail.com>.
> I'm curious how the hot cache you describe would be maintained and
> accessed.

The only gain I had success with was a static, precomputed cache,
calculated from a priori data (hot paths) or, in the extreme, by
converting the entire FST to a hash map. :) This allows very fast
lookups of any node-arc pair, but is not usable for enumerating
outgoing arcs (for example)... It's really something we did to
accelerate millions and millions of lookups over the FST (in an
otherwise not even Lucene-related data structure). I don't think it'll
be generally useful.
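
A very rough sketch of the shape of that cache (hypothetical names,
nothing from Lucene, and assuming byte-sized labels):

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the static "hot path" cache described
// above: a precomputed map from (node, label) to the target node,
// consulted before falling back to a normal arc scan. Assumes labels
// fit in a byte and node ids fit in 56 bits, so the pair packs into a
// single long key.
class HotPathCache {
  private final Map<Long, Long> transitions = new HashMap<>();

  void put(long node, int label, long target) {
    transitions.put((node << 8) | (label & 0xFFL), target);
  }

  // Returns the cached target node, or -1 if this transition was not
  // precomputed (the caller then falls back to the regular FST lookup).
  long get(long node, int label) {
    Long target = transitions.get((node << 8) | (label & 0xFFL));
    return target == null ? -1 : target;
  }
}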

> I know eg that the current implementation has a cache of the
> first 128 "root" arcs, which I guess are presumed to be
> highly-visited, but I think what you are describing is a more dynamic
> cache based on usage?

I don't think it's an assumption that they are highly visited... it's
just a limit so that we don't cache very extreme root node fan-outs.
The choice of this limit is quite arbitrary, I'd say...

> Were you thinking that one would maintain an LRU
> cache say? Or was this some offline analysis you did based on access
> patterns?

It was based on offline analysis of access patterns, so not generally useful.

Don't get me wrong -- try to experiment with that constant-expanded
array... This can be useful especially for nodes close to the root (if
they're dense)... So definitely worth looking into.

Dawid



Re: Improve performance of FST Arc traversal

Posted by Michael Sokolov <ms...@gmail.com>.
Hi Dawid,

The heuristic I used was to encode using the direct-array approach
when more than 1/4 of the array indices would be filled (i.e.
(max-label - min-label) / num-labels < 4), and otherwise to use the
existing packed array encoding. I only applied the direct encoding when we
previously would have used the fixed-size arc arrays *and* this
density heuristic was met. I experimented with other "load factors,"
and it's true that the sweet spot varies, but that seemed to lead to a
good compromise across a variety of test cases.
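
In code form the check is roughly this (method and parameter names are
made up for illustration, not taken from the patch):

// Density heuristic sketch: use direct addressing only when at least
// ~1/4 of the slots in a direct array spanning [minLabel, maxLabel]
// would actually hold an arc.
static boolean shouldUseDirectArray(int minLabel, int maxLabel, int numArcs) {
  int span = maxLabel - minLabel + 1;  // slots a direct array would need
  return span <= 4 * numArcs;          // same as (maxLabel - minLabel) / numArcs < 4
}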

I'm curious how the hot cache you describe would be maintained and
accessed. I know eg that the current implementation has a cache of the
first 128 "root" arcs, which I guess are presumed to be
highly-visited, but I think what you are describing is a more dynamic
cache based on usage? Were you thinking that one would maintain an LRU
cache say? Or was this some offline analysis you did based on access
patterns?




Re: Improve performance of FST Arc traversal

Posted by Dawid Weiss <da...@gmail.com>.
Hi Mike,

My experience tells me that in practice it's really difficult to tell
which nodes should be expanded (where the "cost" of a binary lookup
would significantly outweigh a direct offset jump). I had some luck
in speeding up (very intensive) lookups by creating a hash of [node,
arc label] => node for those paths which were frequently accessed...
perhaps such a "hot path" cache would be better (compared to static
expansion of all outgoing arcs)?

Dawid
