Posted to issues@lucene.apache.org by "Andras Salamon (Jira)" <ji...@apache.org> on 2020/03/24 10:43:00 UTC

[jira] [Updated] (SOLR-14360) Speed up Levenshtein distance calculation when we don't need the exact distance

     [ https://issues.apache.org/jira/browse/SOLR-14360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Salamon updated SOLR-14360:
----------------------------------
    Status: Patch Available  (was: Open)

> Speed up Levenshtein distance calculation when we don't need the exact distance
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-14360
>                 URL: https://issues.apache.org/jira/browse/SOLR-14360
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andras Salamon
>            Priority: Minor
>         Attachments: SOLR-14360-01.patch
>
>
> Sometimes when we calculate the Levenshtein distance we don't need the exact value; we only need to know whether the strings are similar enough.
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SolrSpellChecker.java#L113-L114]
> {noformat}
> sug.score = sd.getDistance(original, sug.string);        
> if (sug.score < min) continue; {noformat}
> If we pass this threshold into the distance calculation, we can speed it up: the calculation can stop as soon as we already know that the distance will be lower than the threshold.
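
The early-exit idea described above can be sketched as follows. This is only an illustration and not the contents of SOLR-14360-01.patch: the class name BoundedLevenshtein, the method distance, and the maxEdits parameter are hypothetical. It also assumes the threshold is expressed as a maximum raw edit count, whereas the quoted snippet compares a score against min, so a real change would first have to translate that score threshold into an edit budget.

```java
/**
 * Hypothetical helper (not part of Solr or Lucene): Levenshtein distance with
 * an upper bound, returning early once the bound is certain to be exceeded.
 */
final class BoundedLevenshtein {
    private BoundedLevenshtein() {}

    /**
     * Returns the edit distance between a and b if it is at most maxEdits,
     * or -1 as soon as the distance is known to be larger than maxEdits.
     */
    static int distance(String a, String b, int maxEdits) {
        if (maxEdits < 0) {
            return -1;
        }
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > maxEdits) {
            return -1;
        }

        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from the empty prefix of a
        }

        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease from one row to the next, so the final
            // distance is at least rowMin; stop once that exceeds the budget.
            if (rowMin > maxEdits) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= maxEdits ? prev[m] : -1;
    }
}
```

With this sketch, BoundedLevenshtein.distance("kitten", "sitting", 3) returns 3, while the same call with maxEdits set to 2 returns -1 without filling in the rest of the table. A caller that only needs a yes/no answer can treat -1 as "not similar enough" and skip the suggestion.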



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org