You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/08/05 18:59:27 UTC

[jira] [Created] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
------------------------------------------------------------------

                 Key: LUCENE-3363
                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/other
            Reporter: Michael McCandless
             Fix For: 3.4, 4.0


Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3363:
--------------------------------

    Attachment: LUCENE-3363.patch

patch for the blocktree branch:

The issue with this automaton:
* its already minimal! so all of this work is stupid.
* it only has 2096 states, but sigma=3544, this is what causes the hopcroft blowup.

Originally, AutomatonQuery minimized the automaton in its ctor, but I'm not sure we should do this, the input automaton could be large and if someone wants to do this, they should do it themselves? 

I think my original motivation was to try to fend off any adversaries (some crazy worst-case crap that would make the query slow)... but I think this is obselete.

The patch changes this to determinize() + removeDeadTransitions() + reduce(), the first 2 operations really being all we need, but reduce() might help speed up the intersection.

Note: RegExp already minimizes "incrementally" during its parsing, but this is one op at a time, so I think there is no problem here. I tested removing this too and replacing it with det + removeDead + reduce, but it slowed down regex parsing considerably, so I think we should continue to use minimize here.

Additionally, I optimized wildcardquery here to use the optimized concatenate() to avoid useless determinize() calls when the LHS is a string, before it was using the concatenate(List) method.

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3363:
---------------------------------------

    Fix Version/s:     (was: 3.4)
                   3.5

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080358#comment-13080358 ] 

Uwe Schindler commented on LUCENE-3363:
---------------------------------------

I have no idea, too. I hope this is not caused by minimizeSchindler changes and is the orginal minimizeHopcroft. Have you tried to undo the relevant commit? It still looks strange, as there must be something totally weird going on. I am away from computer this weekend, can look into it on Monday.

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 3.4, 4.0
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3363.
---------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.5)
    
> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080985#comment-13080985 ] 

Robert Muir commented on LUCENE-3363:
-------------------------------------

by the way, for whatever reason the seed never OOM'ed for me, but:
before:

{noformat}
junit-sequential:
    [junit] Testsuite: org.apache.lucene.index.TestTermsEnum2
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 211.474 sec
    [junit] 
{noformat}

after:
{noformat}
junit-sequential:
    [junit] Testsuite: org.apache.lucene.index.TestTermsEnum2
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.856 sec
    [junit] 
{noformat}

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081005#comment-13081005 ] 

Robert Muir commented on LUCENE-3363:
-------------------------------------

here is the quote from the original hopcroft paper, explaining why this crazy test that uses lots of random unicode strings causes the blowup:

An algorithm is given for minimizing the number of states in a finite automaton or for
determining if two finite automata are equivalent. The asymptotic running time of the
algorithm is bounded by *knlogn* where k is some constant and n is the number of
states. The constant *k depends linearly on the size of the input alphabet*.



> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3363
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3363
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Michael McCandless
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch (LUCENE-3030) and comment out the @Ignore in TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite -Dtests.seed=-2577608857970454726:-2463580050179334504}}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org