Posted to dev@lucene.apache.org by GitBox <gi...@apache.org> on 2019/07/28 19:26:23 UTC

[GitHub] [lucene-solr] msokolov opened a new pull request #811: LUCENE-8920: Fix bug preventing FST duplicate tails from being shared…

URL: https://github.com/apache/lucene-solr/pull/811
 
 
   … when encoded as array-with-gaps
   While trying to reduce the size of FSTs that use the array-with-gaps encoding, I found that I had neglected to update the comparison function in NodeHash that is used to determine when two arcs are equal, which is what enables shared tails to be collapsed together. That behavior wasn't tested anywhere, and it relied on some internal details of the Arc encoding to short-circuit the equality test when two array-encoded Arcs differ in size.
   
   This patch adds a function to check whether arcs are encoded as a packed array, and thus amenable to such an optimization, and a unit test that demonstrates the size reduction.
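
To illustrate the kind of failure described above, here is a minimal, self-contained sketch. It is not Lucene's NodeHash or FST.Arc code: the class `TailSharingSketch`, the `Arc` type, and the methods `encodeWithGaps`, `naiveEquals`, and `gapAwareEquals` are all hypothetical names chosen for this example. It assumes a node's arcs are stored in a direct-addressed array with empty slots for missing labels, and shows how a size-based short-circuit rejects a logically identical node while a comparison that skips the gaps accepts it.

```java
import java.util.List;

// Simplified model for illustration only; none of this is Lucene's actual API.
final class TailSharingSketch {

    // A logical arc: label, id of the target node, and whether it ends a word.
    static final class Arc {
        final int label;
        final int target;
        final boolean isFinal;

        Arc(int label, int target, boolean isFinal) {
            this.label = label;
            this.target = target;
            this.isFinal = isFinal;
        }

        boolean sameAs(Arc other) {
            return label == other.label && target == other.target && isFinal == other.isFinal;
        }
    }

    // Assumed encoding: slot i holds the arc with label (firstLabel + i), or null for a gap.
    // Expects a non-empty list sorted by label.
    static Arc[] encodeWithGaps(List<Arc> sortedArcs) {
        int first = sortedArcs.get(0).label;
        int last = sortedArcs.get(sortedArcs.size() - 1).label;
        Arc[] slots = new Arc[last - first + 1];
        for (Arc arc : sortedArcs) {
            slots[arc.label - first] = arc;
        }
        return slots;
    }

    // Comparison that short-circuits on slot count, which is wrong once gaps exist.
    static boolean naiveEquals(Arc[] slots, List<Arc> pending) {
        if (slots.length != pending.size()) {
            return false;
        }
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null || !slots[i].sameAs(pending.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Comparison over logical arcs only: gaps are skipped, not counted.
    static boolean gapAwareEquals(Arc[] slots, List<Arc> pending) {
        int next = 0;
        for (Arc slot : slots) {
            if (slot == null) {
                continue; // gap: no arc for this label
            }
            if (next == pending.size() || !slot.sameAs(pending.get(next))) {
                return false;
            }
            next++;
        }
        return next == pending.size();
    }

    public static void main(String[] args) {
        List<Arc> tail = List.of(new Arc('a', 7, false), new Arc('c', 8, false), new Arc('f', 9, true));
        Arc[] frozen = encodeWithGaps(tail); // 6 slots for 3 arcs
        System.out.println("naive:     " + naiveEquals(frozen, tail));
        System.out.println("gap-aware: " + gapAwareEquals(frozen, tail));
    }
}
```

In this sketch the frozen node occupies six slots for three arcs, so the naive check reports a mismatch and the duplicate tail would be written again, while the gap-aware check recognizes the node as a duplicate that can be shared.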
   
   This fix won't address the worst-case example Adrien posted, but it addresses a common case, I think. It would be interesting to see how the ES benchmarks are impacted by this fix. 
