Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/08/09 20:08:00 UTC

[jira] [Commented] (LUCENE-9963) Flatten graph filter has errors when there are holes at beginning or end of alternate paths

    [ https://issues.apache.org/jira/browse/LUCENE-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396253#comment-17396253 ] 

ASF subversion and git services commented on LUCENE-9963:
---------------------------------------------------------

Commit 647255b4d29bb56ddcfdf44bdb6e7d5d0ca76a14 in lucene's branch refs/heads/main from Geoffrey Lawson
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=647255b ]

LUCENE-9963 Improve FlattenGraphFilter's robustness when handling incoming token graphs with holes (#157)

6 main improvements:
    1) Iterate through all output.InputNodes since dest gaps can exist.
    2) freeBefore the minimum input node instead of the first input node (which was usually, but not always, the minimum).
    3) Don't freeBefore from a hole source node. Bookkeeping may not be correct and could result in an early free.
    4) When adding an output node after hole recovery, calculate its new position increment instead of adding it to the end of the output graph.
    5) Nodes after holes that have edges to their source will do the output re-mapping that the deleted node would have done.
    6) If a disconnected input node swaps order with another node in the output, then map them to the same output node.
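
Improvement 2 can be illustrated with a minimal, standalone sketch. This is not the FlattenGraphFilter source: the class name, the method names, and the use of a plain List<Integer> for an output node's input nodes are all assumptions made for illustration. It only shows why the first recorded input node is not a safe bound for freeing when a smaller node was recorded later.

```java
import java.util.List;

/** Hypothetical illustration only; names and data layout are assumptions. */
final class MinInputNodeSketch {

  /** Old assumption: the first recorded input node is also the smallest. */
  static int firstInputNode(List<Integer> inputNodes) {
    return inputNodes.get(0);
  }

  /** Safer bound: scan every input node and take the minimum (list must be non-empty). */
  static int minInputNode(List<Integer> inputNodes) {
    int min = Integer.MAX_VALUE;
    for (int node : inputNodes) {
      min = Math.min(min, node);
    }
    return min;
  }

  public static void main(String[] args) {
    // Input nodes recorded out of order, as can happen once holes are involved.
    List<Integer> inputNodes = List.of(5, 3, 4);
    System.out.println("first=" + firstInputNode(inputNodes)
        + " min=" + minInputNode(inputNodes));
  }
}
```

In this sketch, freeing everything before node 5 would discard nodes 3 and 4 while they are still referenced, whereas freeing before node 3 keeps them.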

Co-authored-by: Lawson <ge...@amazon.com>

> Flatten graph filter has errors when there are holes at beginning or end of alternate paths
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9963
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9963
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 8.8
>            Reporter: Geoffrey Lawson
>            Priority: Major
>          Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> If asserts are enabled, gaps at the beginning or end of an alternate path can result in assertion errors, for example:
>  
> {code:java}
> java.lang.AssertionError: 2
> at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
> {code}
>  
> Or
>  
> {code:java}
> java.lang.AssertionError
> at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:191)
> {code}
>  
>  
> If asserts are not enabled, the same conditions will result in either an ArrayIndexOutOfBoundsException or dropped tokens.
>  
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 8
> at org.apache.lucene.util.RollingBuffer.get(RollingBuffer.java:109)
> at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:325)
> {code}
>  
> These issues can be recreated with the following unit tests:
> {code:java}
> public void testAltPathFirstStepHole() throws IOException {
>   // token(term, posInc, posLength, startOffset, endOffset)
>   TokenStream in = new CannedTokenStream(0, 3, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("b", 1, 1, 1, 2),
>     token("c", 1, 1, 2, 3)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "b", "c"},
>     new int[] {0, 1, 2}, // start offsets
>     new int[] {3, 2, 3}, // end offsets
>     new int[] {1, 1, 1}, // position increments
>     new int[] {3, 1, 1}, // position lengths; token 0 may need to be len 1 after flattening
>     3); // final offset
> }{code}
> {code:java}
> public void testAltPathLastStepHole() throws IOException {
>   // token(term, posInc, posLength, startOffset, endOffset)
>   TokenStream in = new CannedTokenStream(0, 4, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("a", 0, 1, 0, 1),
>     token("b", 1, 1, 1, 2),
>     token("d", 2, 1, 3, 4)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "a", "b", "d"},
>     new int[] {0, 0, 1, 3}, // start offsets
>     new int[] {1, 1, 2, 4}, // end offsets
>     new int[] {1, 0, 1, 2}, // position increments
>     new int[] {3, 1, 1, 1}, // position lengths
>     4); // final offset
> }{code}
> {code:java}
> public void testAltPathLastStepHoleWithoutEndToken() throws IOException {
>   // token(term, posInc, posLength, startOffset, endOffset)
>   TokenStream in = new CannedTokenStream(0, 2, new Token[] {
>     token("abc", 1, 3, 0, 3),
>     token("a", 0, 1, 0, 1),
>     token("b", 1, 1, 1, 2)
>   });
>   TokenStream out = new FlattenGraphFilter(in);
>   assertTokenStreamContents(out,
>     new String[] {"abc", "a", "b"},
>     new int[] {0, 0, 1}, // start offsets
>     new int[] {1, 1, 2}, // end offsets
>     new int[] {1, 0, 1}, // position increments
>     new int[] {1, 1, 1}, // position lengths
>     2); // final offset
> }{code}
> I believe LUCENE-8723 is a related issue, as it looks like the last token in an alternate path is being deleted.
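
One way to see the holes in the test inputs above is to turn each token's position increment and position length into a graph edge and look for nodes that nothing arrives at or nothing leaves from. The following is a rough standalone sketch, not Lucene code: the class and method names are hypothetical, it assumes the usual convention that positions start at -1 before the first increment, and it assumes every position length is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Lucene. */
final class TokenGraphHoles {

  /** Nodes, other than the first, where a token starts but no token ends. */
  static List<Integer> nodesWithoutIncoming(int[] posIncs, int[] posLens) {
    TreeSet<Integer> starts = new TreeSet<>();
    TreeSet<Integer> ends = new TreeSet<>();
    collect(posIncs, posLens, starts, ends);
    List<Integer> result = new ArrayList<>();
    for (int node : starts) {
      if (node != starts.first() && !ends.contains(node)) {
        result.add(node);
      }
    }
    return result;
  }

  /** Nodes, other than the last, where a token ends but no token starts. */
  static List<Integer> nodesWithoutOutgoing(int[] posIncs, int[] posLens) {
    TreeSet<Integer> starts = new TreeSet<>();
    TreeSet<Integer> ends = new TreeSet<>();
    collect(posIncs, posLens, starts, ends);
    List<Integer> result = new ArrayList<>();
    for (int node : ends) {
      if (node != ends.last() && !starts.contains(node)) {
        result.add(node);
      }
    }
    return result;
  }

  /** Each token is an edge from its position to position + position length. */
  private static void collect(int[] posIncs, int[] posLens,
                              TreeSet<Integer> starts, TreeSet<Integer> ends) {
    int pos = -1;
    for (int i = 0; i < posIncs.length; i++) {
      pos += posIncs[i];
      starts.add(pos);
      ends.add(pos + posLens[i]);
    }
  }

  public static void main(String[] args) {
    // Position increments and lengths taken from testAltPathFirstStepHole.
    System.out.println(nodesWithoutIncoming(new int[] {1, 1, 1}, new int[] {3, 1, 1}));
    // Position increments and lengths taken from testAltPathLastStepHole.
    System.out.println(nodesWithoutOutgoing(new int[] {1, 0, 1, 2}, new int[] {3, 1, 1, 1}));
  }
}
```

For the first test input this reports node 1 as having no arriving token (the alternate path to "abc" is missing its first step), and for the second it reports node 2 as having no departing token (the alternate path is missing its last step).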


