You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2017/12/13 00:55:00 UTC

[jira] [Commented] (LUCENE-8084) FSTs can be very space-inefficient on array-expanded nodes

    [ https://issues.apache.org/jira/browse/LUCENE-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288549#comment-16288549 ] 

Michael McCandless commented on LUCENE-8084:
--------------------------------------------

I like that idea!

Maybe in the meantime we should improve the logic that decides if an node should use the dense encoding to take into account waste due to large outputs?

> FSTs can be very space-inefficient on array-expanded nodes
> ----------------------------------------------------------
>
>                 Key: LUCENE-8084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: capture-4.png
>
>
> We have FSTs which operate on a larger alphabet (keys in int) space and emit character sequence outputs. I noticed that certain nodes get expanded into fixed-size arrays to accelerate lookups (binary search). This has a potential problem, however, when the outputs emit larger blobs of data (say, one of the outputs is very long, all the others are small). Then the fixed-size array is very much overallocated, as evident on the attached picture.
> I wonder if it'd be better to encode the array as fixed-size, but without the outputs and use a local fixed-size pointer into a "value pool" somewhere next to the node's arcs. Theoretically this "value pool" could even be shared by all of automaton's nodes (saved once at the end or flushed periodically). 
> Just a thought.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org