
Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2011/06/02 18:54:47 UTC

[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

This patch uses a PriorityQueue instead of a sorted List to store the RankedSpellPossibility objects.  I also went with far simpler logic to safeguard performance: this version simply quits at 10,000 elements.  I did this because:

1. With a PriorityQueue, there is no simple way to get the 100th element and find its rank to determine whether or not to add subsequent elements.
2. With the simpler logic, there is no need to keep calling "currentTimeMillis()" as a final fallback (which is itself a performance hog).
3. It is highly unlikely that a competitive spellcheck collation will ever be found past the 10,000th combination.

In all, this is a more elegant solution than the prior one.
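A minimal Java sketch of the approach described above, for illustration only. Candidate collations are generated one at a time and pushed onto a java.util.PriorityQueue ordered by rank. Generation stops once 10,000 candidates have been produced, so memory stays bounded no matter how many combinations exist. The class names CappedCollationSketch and Possibility, the odometer-style enumeration, and the ranking rule (sum of each term's suggestion position, lower is better) are hypothetical stand-ins, not the code in SOLR-2462.patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CappedCollationSketch {
    // Hypothetical stand-in for a ranked spell possibility:
    // one chosen correction per misspelled term, plus a rank (lower is better).
    static final class Possibility {
        final List<String> corrections;
        final int rank;

        Possibility(List<String> corrections, int rank) {
            this.corrections = corrections;
            this.rank = rank;
        }
    }

    // Hard cap on generated combinations (the 10,000 figure from the patch note).
    static final int MAX_COMBINATIONS = 10_000;

    // Assumes every inner list holds at least one suggestion.
    static PriorityQueue<Possibility> collect(List<List<String>> suggestionsPerTerm) {
        PriorityQueue<Possibility> queue =
                new PriorityQueue<>(Comparator.comparingInt((Possibility p) -> p.rank));
        int n = suggestionsPerTerm.size();
        int[] idx = new int[n]; // current suggestion index for each term ("odometer")
        int generated = 0;
        while (generated < MAX_COMBINATIONS) {
            List<String> combo = new ArrayList<>(n);
            int rank = 0;
            for (int i = 0; i < n; i++) {
                combo.add(suggestionsPerTerm.get(i).get(idx[i]));
                rank += idx[i]; // hypothetical rank: sum of suggestion positions
            }
            queue.add(new Possibility(combo, rank));
            generated++;
            // Advance the odometer from the last term; if every digit wraps,
            // the full Cartesian product has been enumerated.
            int pos = n - 1;
            while (pos >= 0) {
                idx[pos]++;
                if (idx[pos] < suggestionsPerTerm.get(pos).size()) {
                    break;
                }
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break;
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        // Scenario from the report: ~12 misspelled words, 15 suggestions each.
        List<List<String>> suggestions = new ArrayList<>();
        for (int t = 0; t < 12; t++) {
            List<String> forTerm = new ArrayList<>();
            for (int s = 0; s < 15; s++) {
                forTerm.add("term" + t + "_fix" + s);
            }
            suggestions.add(forTerm);
        }
        PriorityQueue<Possibility> queue = collect(suggestions);
        System.out.println("queued: " + queue.size());
        Possibility best = queue.peek();
        System.out.println("best rank " + best.rank + ": " + best.corrections);
    }
}
```

Because generation is lazy, the work is capped at 10,000 queue insertions even when the full Cartesian product is astronomically large. With the naive odometer order used here, the cap mostly explores variations of the last few terms. The argument that competitive collations rarely appear past the 10,000th combination depends on the real iterator's enumeration order, which this sketch does not reproduce.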

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a ranked list of *every* possible correction combination.  If several corrections are returned per term and several words are misspelled, the number of combinations grows exponentially, and the existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered any time "spellcheck.collate" is used.  It is not necessary to use any of the features that were added with SOLR-2010.
> We had been in Production with Solr for 1 1/2 days when this bug started taking our Solr servers down with "infinite" GC loops.  It was pretty easy for this to happen, because occasionally a user will accidentally paste the URL into the Search box on our app.  This URL results in a search with ~12 misspelled words, and we have "spellcheck.count" set to 15.
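To see why memory blows up, note that a collation picks one correction for each misspelled word, so the candidates form a Cartesian product. The small Java calculation below is a rough illustration that assumes exactly 15 candidate corrections for each of 12 misspelled words. It shows how large a list of *every* combination would be. The class and method names are hypothetical.

```java
import java.math.BigInteger;

public class CollationCount {
    // One correction is chosen per misspelled term, so the candidates form a
    // Cartesian product of size suggestionsPerTerm ^ misspelledTerms.
    static BigInteger combinations(int suggestionsPerTerm, int misspelledTerms) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }

    public static void main(String[] args) {
        // Figures from the report: spellcheck.count = 15 and about 12 misspelled words.
        BigInteger total = combinations(15, 12);
        System.out.println(total); // prints 129746337890625
        // How many times larger than a 10,000-element cap this is (integer division).
        System.out.println(total.divide(BigInteger.valueOf(10_000)));
    }
}
```

Under these assumptions that comes to about 130 trillion candidate collations, roughly 13 billion times the 10,000-element limit adopted in the later patch.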

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
