
Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2011/04/08 17:39:05 UTC

[jira] [Created] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Using spellcheck.collate can result in extremely high memory usage
------------------------------------------------------------------

Key: SOLR-2462
URL: https://issues.apache.org/jira/browse/SOLR-2462
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 3.1, 4.0
Reporter: James Dyer
Priority: Critical

When using "spellcheck.collate", class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. If several corrections are returned per term and several words are misspelled, the number of combinations grows multiplicatively, and the existing algorithm uses a huge amount of memory.

This bug was introduced with SOLR-2010. However, it is triggered any time "spellcheck.collate" is used; it is not necessary to use any of the features that were added with SOLR-2010.

We were in production with Solr for 1 1/2 days before this bug started taking our Solr servers down with "infinite" GC loops. It was easy to trigger: occasionally a user accidentally pastes a URL into the search box on our app, and that URL results in a search with ~12 misspelled words. We have "spellcheck.count" set to 15.
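A back-of-the-envelope check of why the numbers above are fatal, assuming each of the ~12 misspelled terms receives the full 15 suggestions (spellcheck.count=15) and that every combination is enumerated. The class and method names are hypothetical; this is not Solr code.

```java
import java.math.BigInteger;

// Hypothetical helper, not Solr code: counts the collation candidates that a
// full enumeration would have to rank.
class CollationCount {
    // Each misspelled term contributes up to suggestionsPerTerm alternatives, so the
    // number of combinations is suggestionsPerTerm ^ misspelledTerms.
    static BigInteger combinations(int suggestionsPerTerm, int misspelledTerms) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }

    public static void main(String[] args) {
        // Figures from the report: ~12 misspelled words, spellcheck.count = 15.
        System.out.println(combinations(15, 12)); // prints 129746337890625
    }
}
```

That is about 1.3 × 10^14 candidates under these assumptions, so any approach that materializes and sorts the full list cannot fit in a heap.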

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-2462:
------------------------------

    Affects Version/s:     (was: 4.0)
        Fix Version/s: 4.0
                       3.1.1

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043981#comment-13043981 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Great! So what about the 50ms time limit? Can we eliminate it now?

It seems unrelated to this issue (memory usage), since the memory usage is now constant...

If we want a time-based limit, I would prefer that we add it independently (and in a way where it can be configured).

Other than this, I think it's ready to commit!

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043100#comment-13043100 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

James, sorry for my short, barely comprehensible response... In addition, what I should have said is that if poll() returns the best suggestion, you need to reverse the comparator for this to work :)
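A small illustration of the comparator point, assuming a hypothetical Candidate class where a higher score means a better suggestion. java.util.PriorityQueue.poll() removes the head, which is the least element according to the queue's comparator, so a bounded "keep the best N" queue needs a worst-first ordering: the reverse of the best-first order used for presenting results.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class ReversedComparatorDemo {
    // Hypothetical candidate; higher score = better suggestion (an assumption of this sketch).
    static final class Candidate {
        final String text;
        final int score;
        Candidate(String text, int score) { this.text = text; this.score = score; }
    }

    // Best-first ordering: the natural order for presenting results.
    static final Comparator<Candidate> BEST_FIRST =
            Comparator.comparingInt((Candidate c) -> c.score).reversed();

    // Worst-first ordering: what a bounded top-N queue needs, so poll() evicts the worst.
    static final Comparator<Candidate> WORST_FIRST = BEST_FIRST.reversed();

    // Fills a queue with three candidates and returns the text of the one poll() removes.
    static String evicted(Comparator<Candidate> order) {
        PriorityQueue<Candidate> pq = new PriorityQueue<>(order);
        pq.offer(new Candidate("good", 10));
        pq.offer(new Candidate("bad", 1));
        pq.offer(new Candidate("ok", 5));
        return pq.poll().text;
    }

    public static void main(String[] args) {
        System.out.println(evicted(BEST_FIRST));  // prints good (the best candidate is thrown away)
        System.out.println(evicted(WORST_FIRST)); // prints bad (the worst candidate is thrown away)
    }
}
```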

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043058#comment-13043058 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

This is the general workflow:

1. check if competitive
2. offer
3. if > size, poll.

There are more examples than mine in DirectSpellChecker; we use this in lots of places.
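A minimal sketch of that three-step workflow using java.util.PriorityQueue, with plain int scores (higher is better) standing in for collation ranks. This is an assumption-laden illustration, not the DirectSpellChecker or Solr code; the point is that memory stays proportional to n no matter how many candidates stream past.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class BoundedTopN {
    /** Keeps the n largest scores seen, using memory proportional to n. */
    static List<Integer> topN(int[] scores, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the head is the weakest score currently retained.
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int s : scores) {
            // 1. Check if competitive: once full, skip anything not better than the weakest kept.
            //    peek() is O(1), so rejecting a candidate is cheap.
            if (pq.size() == n && s <= pq.peek()) {
                continue;
            }
            // 2. Offer.
            pq.offer(s);
            // 3. If the queue grew past n, poll() removes the weakest.
            if (pq.size() > n) {
                pq.poll();
            }
        }
        List<Integer> result = new ArrayList<>(pq);
        result.sort(Collections.reverseOrder()); // best first for presentation
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(new int[] {5, 1, 9, 3, 7, 9, 2}, 3)); // prints [9, 9, 7]
    }
}
```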

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

This sets the maximum limit to 1000 possibilities. When this limit is reached, the list is sorted by rank and then reduced to the top 100. From then on, only collations with a rank equal to or better than the 100th are added. This process repeats until finished or until 50ms have elapsed, at which point it quits.

I also added a "maxTimeAllowed" setting of 50ms to the collation test queries as an additional performance safeguard.
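The buffer-then-trim strategy described in this comment can be sketched as follows. Assumptions: ranks are ints, a lower rank is a better collation, and the class and method names are hypothetical; this is not the attached patch. The time budget is a parameter so the 50ms value from the comment can be swapped for a larger one when testing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the buffer-then-trim strategy; not the patch itself.
// Assumption: ranks are ints and a LOWER rank is a better collation.
class GrowAndTrim {
    static final int MAX_BUFFERED = 1000; // buffer limit from the comment
    static final int KEEP = 100;          // size the buffer is trimmed back to

    static List<Integer> collect(Iterator<Integer> ranks, long maxMillis) {
        List<Integer> buffer = new ArrayList<>();
        int threshold = Integer.MAX_VALUE; // nothing is filtered until the first trim
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        while (ranks.hasNext() && System.nanoTime() < deadline) {
            int rank = ranks.next();
            if (rank > threshold) {
                continue; // worse than the current 100th best: skip it
            }
            buffer.add(rank);
            if (buffer.size() >= MAX_BUFFERED) {
                trim(buffer);
                threshold = buffer.get(KEEP - 1); // rank of the current 100th best
            }
        }
        trim(buffer);
        return buffer;
    }

    // Sorts best-first and drops everything past the first KEEP entries.
    private static void trim(List<Integer> buffer) {
        Collections.sort(buffer);
        if (buffer.size() > KEEP) {
            buffer.subList(KEEP, buffer.size()).clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int r = 5000; r >= 1; r--) {
            input.add(r);
        }
        List<Integer> best = collect(input.iterator(), 50); // 50ms budget, as in the comment
        System.out.println("kept " + best.size());
    }
}
```

Memory is bounded by MAX_BUFFERED, but each trim re-sorts the buffer; a priority queue with a competitive check achieves the same bound without the periodic sorts.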

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1, 4.0
>            Reporter: James Dyer
>            Priority: Critical
>         Attachments: SOLR-2462.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

I named the new parameter "spellcheck.maxCollationEvaluations" and gave it a default of 10,000. It's very unlikely that a competitive combination will occur beyond that, and, as you requested, it is user-configurable should a different limit be desired.
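A sketch of what an evaluation cap bounds, assuming combinations are generated lazily (odometer-style) instead of being materialized in a list. The method name and its maxEvaluations argument are hypothetical stand-ins for spellcheck.maxCollationEvaluations; the real iterator also ranks combinations, which is omitted here.

```java
import java.util.Arrays;

class CappedCollations {
    /**
     * Visits suggestion combinations one at a time and stops after maxEvaluations
     * of them. Only one index per term is held in memory, never the full list.
     * Hypothetical sketch; not Solr's implementation.
     */
    static int countEvaluated(int[] suggestionsPerTerm, int maxEvaluations) {
        for (int n : suggestionsPerTerm) {
            if (n <= 0) {
                return 0; // a term with no suggestions yields no combinations
            }
        }
        int[] idx = new int[suggestionsPerTerm.length]; // current suggestion index per term
        int evaluated = 0;
        while (evaluated < maxEvaluations) {
            evaluated++; // "evaluate" the combination described by idx (e.g. score it)
            // Advance the odometer: bump the last index, carrying leftwards on overflow.
            int pos = idx.length - 1;
            while (pos >= 0 && ++idx[pos] == suggestionsPerTerm[pos]) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break; // every combination has been visited
            }
        }
        return evaluated;
    }

    public static void main(String[] args) {
        int[] twelveTerms = new int[12];
        Arrays.fill(twelveTerms, 15); // ~12 misspelled words, 15 suggestions each
        System.out.println(countEvaluated(twelveTerms, 10_000)); // prints 10000
    }
}
```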

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Shawn Heisey (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Heisey updated SOLR-2462:
-------------------------------

    Attachment: SOLR-2462_3_1.patch

The original patch would not apply cleanly for me against 3.1 without fuzz and whitespace options, and when those are used, it applies incorrectly.  Here's a new patch specific to 3.1.  Before creating this, I checked 3.1 out from SVN and then applied the patch for SOLR-2469, which should not interfere in any way.

Hopefully the patch is suitable.  I am only putting it up here for convenience, in case anyone else runs into this.

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052623#comment-13052623 ] 

James Dyer commented on SOLR-2462:
----------------------------------

Peter,

I reviewed Robert's commits (r1132730 to branch_3x; r1132729 to trunk), and they appear to match the 06/Jun/11 15:10 version of the patch. I looked mostly at the change in TestSpellCheckResponse.java, which is the last tweak that was made. Keep in mind there are a few things that were committed that aren't in the patch (changes.txt, etc.). Did you have other specific discrepancies in mind?

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043913#comment-13043913 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Hi, this is looking much better!

To eliminate your last concerns, I think what is needed is a tie-breaker in RankedSpellPossibility's comparator, ideally something like a sequential identifier.

This way, when the PQ fills and the lowest score is 20, subsequent items that also have a score of 20 will not be competitive and will be rejected by the competitive check... remember, peek() is constant time :)

This is the way Lucene collectors work (the tie-breaker is the Lucene docid), and the way e.g. FuzzyQuery/FuzzyLikeThisQuery works (the tie-breaker is the term text).
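A sketch of the tie-breaker idea, using a hypothetical Possibility class as a stand-in for RankedSpellPossibility, with a higher score meaning better (as in the "lowest score is 20" example) and a sequential id assigned in arrival order. The queue's head is the least competitive entry: lowest score, and among equal scores, the latest arrival. A newcomer that ties the head's score always has a larger id, so the O(1) peek() check rejects it.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TieBreakDemo {
    // Hypothetical stand-in for a ranked possibility; higher score = better (assumption).
    static final class Possibility {
        final int score;
        final int seq; // sequential identifier assigned in arrival order
        Possibility(int score, int seq) { this.score = score; this.seq = seq; }
    }

    // Head of the queue = least competitive: lowest score, then highest (latest) seq.
    static final Comparator<Possibility> WORST_FIRST =
            Comparator.comparingInt((Possibility p) -> p.score)
                      .thenComparing(Comparator.comparingInt((Possibility p) -> p.seq).reversed());

    /** Returns how many candidates were actually offered to a queue bounded at max. */
    static int offersAccepted(int[] scores, int max) {
        if (max <= 0) {
            return 0;
        }
        PriorityQueue<Possibility> pq = new PriorityQueue<>(WORST_FIRST);
        int accepted = 0;
        for (int i = 0; i < scores.length; i++) {
            Possibility p = new Possibility(scores[i], i);
            // Competitive check via peek(): a tie on score loses to the earlier arrival.
            if (pq.size() >= max && WORST_FIRST.compare(p, pq.peek()) <= 0) {
                continue;
            }
            pq.offer(p);
            accepted++;
            if (pq.size() > max) {
                pq.poll(); // evict the least competitive entry
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(offersAccepted(new int[] {30, 20, 20, 20}, 2)); // prints 2
        System.out.println(offersAccepted(new int[] {30, 20, 20, 25}, 2)); // prints 3
    }
}
```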


> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Mitsu Hadeishi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058836#comment-13058836 ] 

Mitsu Hadeishi commented on SOLR-2462:
--------------------------------------

Oh, now you tell us. :) Well, we already built the patched 3.2, so we're going with that for now :)

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Resolved] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2462.
-------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.1.1)
                   3.3

Committed revision 1132729 (trunk), 1132730 (branch_3x)

Thanks James!

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045028#comment-13045028 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Thanks for the explanation and updated patch James... I'll test this out shortly!

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042675#comment-13042675 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Hi James, this sounds like an important issue to fix!

I'm sorry you have all these open improvements to the spellchecker; let's see if we can get some of them resolved.

{quote}
This sets the maximum limit to 1000 possibilities. When this limit is reached, the list is sorted by rank then reduced to the top 100. From then on, only collations with a rank equal or better than the 100th are added. This process repeats until finished or until it has taken 50ms, at which time it quits.
{quote}

Maybe we want to use a priority queue instead? It sort of seems like this is what you are doing.


> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Assigned] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned SOLR-2462:
---------------------------------

    Assignee: Robert Muir

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

I guess I should have run that one myself too. This test is very similar to the ones in SpellCheckCollatorTest (SCCT): while the SCCT tests check whether it can collate properly, TestSpellCheckResponse (TSCR) checks that the response it sends back is correct.

In any case, this is just another one of my brittle tests! Because we're using a different comparator, results with tied scores don't come back in exactly the same order as before, so this test now needs more than 5 tries to find the 2nd valid collation. I upped it from 5 to 10 and now it passes.
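One way to keep a test from depending on how a particular comparator happens to order ties is to give the comparator a deterministic tie-breaker. The sketch below is hypothetical (TiedCandidate and TieBreakOrder are illustrative, not Solr classes): it orders by rank and then by collation text, so equal-rank candidates always drain from the queue in the same order, whatever order they were inserted in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical candidate type: an integer rank (lower assumed better) plus the collated query text.
class TiedCandidate {
    final int rank;
    final String text;

    TiedCandidate(int rank, String text) {
        this.rank = rank;
        this.text = text;
    }
}

class TieBreakOrder {
    // Rank first; equal ranks fall back to the text, which makes the order total and repeatable.
    static final Comparator<TiedCandidate> BY_RANK_THEN_TEXT =
        Comparator.comparingInt((TiedCandidate c) -> c.rank).thenComparing(c -> c.text);

    /** Builds a priority queue from the input and returns the texts in priority order. */
    static List<String> drain(List<TiedCandidate> input) {
        PriorityQueue<TiedCandidate> pq = new PriorityQueue<>(BY_RANK_THEN_TEXT);
        pq.addAll(input);
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.poll().text);
        }
        return out;
    }
}
```

The trade-off is that the test then asserts one specific tie order; relaxing the test, as was done here, avoids coupling it to any ordering of ties at all.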

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057486#comment-13057486 ] 

Simon Willnauer commented on SOLR-2462:
---------------------------------------

bq. We just ran into this bug when we upgraded to 3.2
3.3, which has a fix for this, should be released in the next two days. So maybe just check the mailing list for the release mail tomorrow or the day after!

simon

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Peter Wolanin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052601#comment-13052601 ] 

Peter Wolanin commented on SOLR-2462:
-------------------------------------

I generated a patch for 3.2 looking at the commit on branch_3x.  It looks somewhat different from the last patch by James.

I also just compared the trunk commit to the last patch and it doesn't match https://issues.apache.org/jira/secure/attachment/12481574/SOLR-2462.patch  

Did the wrong patch get committed, or did the final patch just never get posted to this issue before commit?

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

This is along the lines of what I initially intended to do, but didn't have time for when I first submitted this.

I felt particularly guilty about gathering all these RankedSpellPossibility objects in cases where the user isn't even using the new functionality from SOLR-2010 (upgrade from 1.4, and collation suddenly becomes more expensive!).

Thank you for another opportunity to absolve my guilt.

I ran these tests and they all pass:  SpellPossibilityIteratorTest, SpellCheckCollatorTest, SpellCheckComponentTest & DistributedSpellCheckComponentTest

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

Here's another patch. This time PossibilityIterator is guaranteed not to save/return more than the number of collations the user requested with "maxCollationTries".

Changing this also invalidated some of the tests in SpellCheckCollatorTest.java. My research indicates this is because many of the possibilities end up with the same score, so this is not indicative of a new bug. I changed the test to be less brittle in this regard.

While I generally like both of these last two patches, I am still unsure of the wisdom of this last change. It is true that it ensures we will never store more Collations than the app might possibly use. On the other hand, the Collations ought to enter the PQ somewhat sorted already. Having it churn all of the low-ranking ones in and out introduces a lot of extra add/remove operations in the common cases, in return for saving a bit of memory in the rarer ones.
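The trade-off can be made concrete by counting heap operations. In this hypothetical sketch (plain integer ranks, lower assumed better, n the number of collations to keep), "offer then poll" always inserts and then evicts once the queue is over capacity, while "peek first" compares against the worst kept entry and skips candidates that cannot compete.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch: count heap adds/removes needed to keep the best n ranks (lower rank assumed better).
class ChurnCount {
    /** Always add; evict the worst entry whenever the queue grows past n. Returns {adds, removes}. */
    static int[] offerThenPoll(int[] ranks, int n) {
        PriorityQueue<Integer> worstFirst = new PriorityQueue<>(Collections.reverseOrder());
        int adds = 0, removes = 0;
        for (int r : ranks) {
            worstFirst.add(r);
            adds++;
            if (worstFirst.size() > n) {
                worstFirst.poll();
                removes++;
            }
        }
        return new int[] {adds, removes};
    }

    /** Once the queue holds n entries, peek at the worst one and skip candidates that are not better. */
    static int[] peekFirst(int[] ranks, int n) {
        if (n < 1) {
            return new int[] {0, 0};
        }
        PriorityQueue<Integer> worstFirst = new PriorityQueue<>(Collections.reverseOrder());
        int adds = 0, removes = 0;
        for (int r : ranks) {
            if (worstFirst.size() == n) {
                if (r >= worstFirst.peek()) {
                    continue; // not competitive: the heap is left untouched
                }
                worstFirst.poll();
                removes++;
            }
            worstFirst.add(r);
            adds++;
        }
        return new int[] {adds, removes};
    }
}
```

For 1000 candidates arriving best-first with n = 10, offer-then-poll performs 1000 adds and 990 removes, while peek-first performs only 10 adds and no removes. With worst-first input both do 1000 adds and 990 removes, so the peek check pays off precisely when the input is already roughly sorted.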

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042940#comment-13042940 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Now that we have a PQ (thanks for iterating here!), maybe we should just bound its size by some value (say 20, or whatever top-N the user requested).

Then, once the size reaches 20, we only add competitive possibilities.
(this is how the spellchecker link I provided works; see the line "// possibly drop entries from queue")
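A bounded queue of that kind might look like the following. It is a hypothetical helper, not Solr's or Lucene's API: the heap is ordered worst-first, so peek() returns the weakest entry kept, which is the cutoff a new possibility has to beat once the queue is full (maxSize would be 20, or the top-N the user requested).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded top-N collector; not Solr's or Lucene's actual API.
class BoundedTopN<T> {
    private final int maxSize;
    private final Comparator<T> better;   // negative when the first argument is better
    private final PriorityQueue<T> heap;  // head = worst element currently kept

    BoundedTopN(int maxSize, Comparator<T> better) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
        this.better = better;
        this.heap = new PriorityQueue<>(maxSize, Collections.reverseOrder(better));
    }

    /** Offers a candidate; returns true if it was kept. */
    boolean offer(T item) {
        if (heap.size() < maxSize) {
            heap.add(item);
            return true;
        }
        // Queue is full: only a candidate better than the current worst is competitive.
        if (better.compare(item, heap.peek()) < 0) {
            heap.poll(); // drop the current worst entry
            heap.add(item);
            return true;
        }
        return false;
    }

    /** Returns the kept elements, best first. */
    List<T> bestFirst() {
        List<T> out = new ArrayList<>(heap);
        out.sort(better);
        return out;
    }
}
```

Memory stays proportional to maxSize no matter how many possibilities are enumerated, and once the queue has filled, non-competitive candidates cost one comparison each.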



> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

This patch uses a PriorityQueue instead of a sorted List to store the RankedSpellPossibility objects.  I also went with far simpler logic in safeguarding the performance:  this version simply quits at 10,000 elements.  I did this because:

1. With a PriorityQueue, there is no simple way to get the 100th element and find its rank to determine whether or not to add subsequent elements.
2. With the simpler logic, there is no need to keep calling "currentTimeMillis()" as a final fallback (in itself a performance hog).
3. It is highly unlikely a competitive spellcheck collation will ever be found past the 10,000th combination.

In all, this is a more elegant solution than the prior one.
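A minimal sketch of the "quit at 10,000" safeguard, under two illustrative assumptions that are not necessarily how SpellPossibilityIterator works: combinations are enumerated in simple odometer order (one suggestion index per misspelled word), and each is ranked by the sum of its suggestion indices. The queue never holds more than maxEvaluations entries, and no clock is consulted. CappedCollation and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a hard evaluation cap; names and the ranking rule are assumptions.
class CappedCollation {
    /**
     * Enumerates suggestion-index combinations (one index per word) in odometer order,
     * ranks each by the sum of its indices (lower is better), and stops after maxEvaluations.
     */
    static List<int[]> bestCombinations(int words, int suggestionsPerWord, int maxEvaluations, int topN) {
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(sum(a), sum(b)));
        int[] idx = new int[words];
        int evaluated = 0;
        while (evaluated < maxEvaluations) {
            pq.add(idx.clone());
            evaluated++;
            // Advance the odometer; when every position wraps, all combinations have been produced.
            int pos = words - 1;
            while (pos >= 0 && ++idx[pos] == suggestionsPerWord) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break;
            }
        }
        List<int[]> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < topN) {
            out.add(pq.poll());
        }
        return out;
    }

    static int sum(int[] a) {
        int s = 0;
        for (int v : a) {
            s += v;
        }
        return s;
    }
}
```

One caveat of this sketch: in odometer order, the first 10,000 of the 15^12 combinations vary only the last four words (15^4 = 50,625), so a hard cap is only as good as the order in which possibilities are enumerated.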

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044125#comment-13044125 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Hi James, when applying the latest patch, I noticed a test fail:
{noformat}
    [junit] Testsuite: org.apache.solr.client.solrj.response.TestSpellCheckResponse
    [junit] Testcase: testSpellCheckCollationResponse(org.apache.solr.client.solrj.response.TestSpellCheckResponse):	FAILED
    [junit] 
    [junit] junit.framework.AssertionFailedError: 
    [junit] 	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
    [junit] 	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1266)
    [junit] 	at org.apache.solr.client.solrj.response.TestSpellCheckResponse.testSpellCheckCollationResponse(TestSpellCheckResponse.java:153)
{noformat}

This seemed odd... maybe a comparator is off somewhere?

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Mitsu Hadeishi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057483#comment-13057483 ] 

Mitsu Hadeishi commented on SOLR-2462:
--------------------------------------

We just ran into this bug when we upgraded to 3.2, and suddenly SOLR was blowing up as soon as we built the spellcheck dictionary. I attempted to apply the patch to the 3.2 source code tgz file downloadable from http://www.apache.org/dyn/closer.cgi/lucene/solr, but it didn't apply cleanly. I manually applied the patch, as we're using the released version of 3.2.

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042885#comment-13042885 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

bq. 1. With a PriorityQueue, there is no simple way to get the 100th element and find its rank to determine whether or not to add subsequent elements.

Hi James, you might consider using peek() here for that check.

As an example, you can take a look at the main loop of DirectSpellChecker: suggestSimilar(Term, int, IndexReader, int, int, float, CharsRef)
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/suggest/src/java/org/apache/lucene/search/spell/DirectSpellChecker.java

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043991#comment-13043991 ] 

James Dyer commented on SOLR-2462:
----------------------------------

Yeah, I agree the time limit is a bit of a hack. On the other hand, the list of possibilities it needs to evaluate can get really long really fast. If you're returning 15 or 20 suggestions per word and the user misspells 10 or so words, you get a pretty big list of combinations (in our case, users were pasting the URL into the search box, generating a query with 12 "misspelled" words...). Then again, this latest version is much faster than what I had put out there originally...
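To put "pretty big" in numbers: if each of n misspelled words contributes k alternatives, there are k^n possible collations. With k = 15 and n = 12, as in the case above, that is 15^12 = 129,746,337,890,625 (about 1.3 × 10^14); with 20 suggestions for each of 10 words it is 20^10 = 10,240,000,000,000. The small helper below (a hypothetical name) computes this with BigInteger so that larger inputs cannot overflow.

```java
import java.math.BigInteger;

// Hypothetical helper: size of the full combination space that collation would have to rank.
class CombinationCount {
    /** Number of collations when each of 'words' misspelled terms has 'suggestions' alternatives. */
    static BigInteger combinations(int suggestions, int words) {
        return BigInteger.valueOf(suggestions).pow(words);
    }
}
```

For example, combinations(15, 12) returns 129746337890625, which is why enumerating and storing every possibility cannot work and some hard bound is needed.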

Maybe we can just put a hard limit on the number of possibilities it will evaluate? It could be really high, like a million or something. We could make it a configurable parameter, something like "spellcheck.maxCollationPossibilitiesToEval", but then again that seems silly. Who would really change it if a million was the default?

At the end of the day, I'd feel better in my position if Solr had some kind of secondary fallback here. One thing that really made me nervous about our previous search engine was that it wasn't terribly hard to send it a query that would crash it or make it churn for a long time just to return nothing. So far my experience is that Solr is less prone to this kind of failure, and I'd really like to keep it that way...

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a ranked list of *every* possible correction combination.  But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime "spellcheck.collate" is used.  It is not necessary to use any features that were added with SOLR-2010.
> We were in production with Solr for 1 1/2 days and this bug started taking our Solr servers down with "infinite" GC loops.  It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the search box on our app.  This URL results in a search with ~12 misspelled words.  We have "spellcheck.count" set to 15.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045082#comment-13045082 ] 

James Dyer commented on SOLR-2462:
----------------------------------

I added "spellcheck.maxCollationEvaluations" to the wiki.  Thanks, Robert, for taking the time to help get this fixed!

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Assignee: Robert Muir
>            Priority: Critical
>             Fix For: 3.3, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

This version reduces churn in the PriorityQueue.  Rather than add a tie-breaker variable, I changed the rank comparison from > to >=.  This made SpellCheckCollatorTest.testCollateWithFilter() do 11 adds and 1 remove instead of 13 adds and 3 removes.
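The patch code itself isn't shown in the thread, so the following is only a sketch of why rejecting ties reduces churn. It assumes lower rank is better and a bounded queue whose head is the worst retained candidate. The class name, the run method and the rejectTies flag are hypothetical, and the counts come from a toy input, not a reproduction of the testCollateWithFilter() figures.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch: counts queue churn (adds/removes) for a bounded,
// worst-first queue, assuming lower rank = better. With rejectTies = true a
// candidate is skipped when rank >= current worst; with false, only when
// rank > current worst.
final class ChurnCounter {
    int adds = 0;
    int removes = 0;

    void run(int[] ranks, int capacity, boolean rejectTies) {
        // Reverse ordering: peek()/poll() operate on the WORST (highest) rank.
        PriorityQueue<Integer> worstFirst = new PriorityQueue<>(Collections.reverseOrder());
        for (int rank : ranks) {
            if (worstFirst.size() >= capacity) {
                int worst = worstFirst.peek();
                boolean skip = rejectTies ? rank >= worst : rank > worst;
                if (skip) {
                    continue; // not competitive, never inserted
                }
            }
            worstFirst.offer(rank);
            adds++;
            if (worstFirst.size() > capacity) {
                worstFirst.poll(); // evict the worst to restore the bound
                removes++;
            }
        }
    }
}
```

For ranks 5, 5, 5, 3 with capacity 2, the `>` variant does 4 adds and 2 removes, while the `>=` variant does 3 adds and 1 remove: the tied 5 is never inserted only to be evicted again.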

The 4 spell-check-related tests I mentioned before still pass.  

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042988#comment-13042988 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

OK, this is looking better, but we can now keep the PQ at a fixed size.  For this example, pretend that maximumRequiredSuggestions = 20.

The idea is that after your offer(), you check whether the PQ's size is > maximumRequiredSuggestions (say it's 21), and if so, you poll() to remove the lowest element (it's now 20 again).

This way, the size never grows past 20... and I think you can then remove the rankedPossibilities < 10000 check, because it will never get larger than the user's requested amount.
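A minimal Java sketch of the fixed-size idea, assuming a lower rank is a better suggestion. The method name keepBest is hypothetical; maximumRequiredSuggestions is the name used in the comment. Note that java.util.PriorityQueue.poll() always removes the head, so this queue is built with Collections.reverseOrder() to put the worst rank at the head. With natural ordering, poll() would discard the best candidate instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a fixed-size "keep the best N" queue (lower rank = better).
final class FixedSizeRankQueue {

    static List<Integer> keepBest(Iterable<Integer> ranks, int maximumRequiredSuggestions) {
        // Worst-first ordering, so poll() evicts the worst rank, not the best.
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (int rank : ranks) {
            pq.offer(rank);
            if (pq.size() > maximumRequiredSuggestions) {
                pq.poll(); // size was N + 1; drop the worst to get back to N
            }
        }
        List<Integer> best = new ArrayList<>(pq);
        Collections.sort(best); // best (lowest rank) first
        return best;
    }
}
```

For example, keepBest(List.of(7, 2, 9, 4, 1, 8), 3) returns [1, 2, 4], and the queue never holds more than N + 1 elements at any moment.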


> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

Right... because the elements are already sorted, I don't have to go back to the 100th element to compare.  I can just look at the last element using peek(), as you suggest.

This version uses the more sophisticated methods of the original patch but accomplishes it with less code.  Also, we're using nanoTime() instead of currentTimeMillis() to reduce any overhead, and are checking the clock only once every 10000 iterations.

From code comments:

Three performance & memory-usage safeguards:
  1. Quit if the RankedPossibilities queue grows larger than 10000.
  2. If the RankedPossibilities queue is bigger than 1000, only add competitive possibilities.
  3. Check the clock periodically to be sure we haven't taken more than 50ms.  If so, quit immediately.
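The three safeguards can be sketched roughly as follows. Only the thresholds (10000, 1000, 50 ms, and a clock check every 10000 iterations) come from the code comments. The class and method names, the integer ranks with lower-is-better ordering, and the loop structure are assumptions, not the patch itself.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the three safeguards (lower rank = better).
final class SafeguardedRanker {
    static final int QUIT_SIZE = 10_000;        // safeguard 1
    static final int COMPETITIVE_SIZE = 1_000;  // safeguard 2
    static final long TIME_LIMIT_NANOS = TimeUnit.MILLISECONDS.toNanos(50); // safeguard 3
    static final int CLOCK_CHECK_INTERVAL = 10_000;

    static PriorityQueue<Integer> rank(Iterator<Integer> candidates) {
        // Worst-first, so peek() returns the least competitive retained rank.
        PriorityQueue<Integer> worstFirst = new PriorityQueue<>(Collections.reverseOrder());
        long start = System.nanoTime();
        long iterations = 0;
        while (candidates.hasNext()) {
            int rank = candidates.next();
            iterations++;
            // 3. Read the clock only every CLOCK_CHECK_INTERVAL iterations.
            if (iterations % CLOCK_CHECK_INTERVAL == 0
                    && System.nanoTime() - start > TIME_LIMIT_NANOS) {
                break;
            }
            // 2. Once the queue is large, only accept candidates better than the worst.
            if (worstFirst.size() > COMPETITIVE_SIZE && rank >= worstFirst.peek()) {
                continue;
            }
            worstFirst.offer(rank);
            // 1. Quit outright if the queue grows past the hard limit.
            if (worstFirst.size() > QUIT_SIZE) {
                break;
            }
        }
        return worstFirst;
    }
}
```

Feeding it ranks 1 through 5000 in increasing (worsening) order keeps the first 1001 candidates and rejects the rest as non-competitive, so the queue ends with 1001 entries and a worst rank of 1001.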
		

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043018#comment-13043018 ] 

James Dyer commented on SOLR-2462:
----------------------------------

Robert,

I'm not sure we can follow this last suggestion.  If we poll() as the queue grows, we will in fact be discarding all of the best (lowest-ranked) suggestions.  What we would need to do is insert the 21st element, letting it fall into place in order, and then remove the _worst_ suggestion from the tail of the queue.  Looking at PriorityQueue's API docs, I'm not sure there is a method exposed that does that.

Did I miss something here?

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043992#comment-13043992 ] 

Robert Muir commented on SOLR-2462:
-----------------------------------

{quote}
Maybe we can just put a hard limit on the number of possibilities it will evaluate? It could be really high like a million or something. We could make it a configurable parameter, something like "spellcheck.maxCollationPossibilitiesToEval" , but then again that seems silly. Who would really change it if a million was the default ?
{quote}

Well, I think this sounds much better than being time-based.  And you know, use your best judgement for the default; I'm definitely OK with it as long as it's configurable and has good defaults.


> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

Posted by "James Dyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

Fixes a silly error in the previous patch.

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch