You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/06/04 19:17:48 UTC

[GitHub] [lucene] mikemccand commented on a change in pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes

mikemccand commented on a change in pull request #157:
URL: https://github.com/apache/lucene/pull/157#discussion_r645647506



##########
File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestFlattenGraphFilter.java
##########
@@ -314,5 +323,304 @@ public void testTwoLongParallelPaths() throws Exception {
         11);
   }
 
+  // The end node the long path is supposed to flatten over doesn't exist
+  // assert disabled = pos length of abc = 4
+  // assert enabled = AssertionError: outputEndNode=3 vs inputTo=2

Review comment:
       Was this describing the bug before, that is now fixed?  Maybe clarify?

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java
##########
@@ -193,14 +194,20 @@ private boolean releaseBufferedToken() {
         }
         if (inputNode.tokens.size() == 0) {
           assert inputNode.nextOut == 0;
-          assert output.nextOut == 0;
           // Hole dest nodes should never be merged since 1) we always
           // assign them to a new output position, and 2) since they never
-          // have arriving tokens they cannot be pushed:
-          assert output.inputNodes.size() == 1 : output.inputNodes.size();
-          outputFrom++;
-          inputNodes.freeBefore(output.inputNodes.get(0));
-          outputNodes.freeBefore(outputFrom);
+          // have arriving tokens they cannot be pushed. Skip them but don't free

Review comment:
       I sooooo wish we could stick little PNGs into comments in our java sources!!  Why must we be restricted to our Unicode characters only ;). Imagine if we could write a few dot (graphviz) commands and our IDEs/Emacs's automagically rendered those into beautiful graph pictures!

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java
##########
@@ -284,27 +316,27 @@ public boolean incrementToken() throws IOException {
 
         InputNode src = inputNodes.get(inputFrom);
         if (src.node == -1) {
-          // This means the "from" node of this token was never seen as a "to" node,
-          // which should only happen if we just crossed a hole.  This is a challenging
-          // case for us because we normally rely on the full dependencies expressed
-          // by the arcs to assign outgoing node IDs.  It would be better if tokens
-          // were never dropped but instead just marked deleted with a new
-          // TermDeletedAttribute (boolean valued) ... but until that future, we have

Review comment:
       We should really make progress on this :)

##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java
##########
@@ -362,6 +394,48 @@ public boolean incrementToken() throws IOException {
     }
   }
 
+  private OutputNode recoverFromHole(InputNode src, int startOffset) {
+    // This means the "from" node of this token was never seen as a "to" node,
+    // which should only happen if we just crossed a hole.  This is a challenging
+    // case for us because we normally rely on the full dependencies expressed
+    // by the arcs to assign outgoing node IDs.  It would be better if tokens
+    // were never dropped but instead just marked deleted with a new
+    // TermDeletedAttribute (boolean valued) ... but until that future, we have
+    // a hack here to forcefully jump the output node ID:
+    assert src.outputNode == -1;
+    src.node = inputFrom;
+
+    int maxOutIndex = outputNodes.getMaxPos();
+    OutputNode outSrc = outputNodes.get(maxOutIndex);
+    // There are two types of holes, neighbor holes and consumed holes. A neighbor hole is between
+    // two tokens, it looks like a->*hole*->b.
+    // A consumed hole is between the start a long token and the next token that is "under" the path

Review comment:
       Missing `of` after `start`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org