You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/11/22 21:05:27 UTC

[GitHub] [lucene] msokolov commented on a change in pull request #460: LUCENE-10247 - reduce size of FSTs by relative coding

msokolov commented on a change in pull request #460:
URL: https://github.com/apache/lucene/pull/460#discussion_r754629656



##########
File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
##########
@@ -720,9 +745,9 @@ long addNode(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn)
       }
 
       if (targetHasArcs && (flags & BIT_TARGET_NEXT) == 0) {
-        assert target.node > 0;
+        assert targetNode > 0;

Review comment:
       isn't it possible for this to be zero now? We could have arcs that target the current node?

##########
File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
##########
@@ -1000,6 +1027,98 @@ private void writePresenceBits(
     assert bytePos - dest == numPresenceBytes;
   }
 
+  private long estimateNodeAddress(

Review comment:
       I would be leery of having a function (here) that must mimic the tortuous logic elsewhere - I'm amazed you were able to (mostly) faithfully reproduce it! Something indeed that adds an additional buffer with the ability to go back and "patch up" the addresses once enough data has been written would enable the logic to be maintained in one place - and then if we add some clever opto there later, it would just reflected in the variable encoding here?  I confess I don't quite see all the way to the end of that, but if it could be made to work, I think the extra RAM and time spent during compilation would probably be OK.

##########
File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
##########
@@ -720,9 +745,9 @@ long addNode(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn)
       }
 
       if (targetHasArcs && (flags & BIT_TARGET_NEXT) == 0) {
-        assert target.node > 0;
+        assert targetNode > 0;

Review comment:
       also - if we were able to encode negative offsets we could always apply this variable encoding and avoid the need for a bit and conditional logic to check it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org