Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2012/05/02 20:48:50 UTC

[jira] [Updated] (LUCENE-4024) FuzzyQuery should never do edit distance > 2

     [ https://issues.apache.org/jira/browse/LUCENE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4024:
--------------------------------

    Attachment: LUCENE-4024.patch

I agree: staying compatible with 3.x's crazy floating-point specification of distance is hairy.

But I think this is all a huge trap. Attached is a patch that:
* removes the slow capability from FuzzyTermsEnum
* cleans up FuzzyQuery: removes float-ctors, allows transpositions as primitive edits, etc.
* adds a deprecated SlowFuzzyQuery to sandbox/ that has the old ctors
* adds a deprecated SlowFuzzyTermsEnum that it uses, which extends FuzzyTermsEnum and adds slowness.

I added a (deprecated) static helper method to FuzzyQuery that converts from the old float sim stuff to a number of edits, capped at what the automata support (this is used to easily cut over the queryparsers).
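
For illustration only, here is a minimal sketch of what such a float-to-edits conversion might look like. It is not taken from the patch: the class name FuzzyEditsSketch, the constant MAX_SUPPORTED_EDITS, and the rounding and edge-case behavior are all assumptions. The only things taken from this thread are that the old float similarity is mapped to a number of edits and that the result is capped at what the automata support (edit distance 2, per the issue title).

```java
// Hypothetical sketch; names and edge-case handling are assumptions, not the patch's code.
final class FuzzyEditsSketch {

    // Assumed cap: the issue states FuzzyQuery should never exceed edit distance 2.
    static final int MAX_SUPPORTED_EDITS = 2;

    /**
     * Converts a legacy "minimum similarity" value into a number of edits.
     * Values >= 1 are treated as a raw edit count; values in [0, 1) are treated
     * as a fraction of the term length that must match. Either way the result
     * is capped at MAX_SUPPORTED_EDITS.
     */
    static int floatToEdits(float minimumSimilarity, int termLength) {
        if (minimumSimilarity < 0f) {
            throw new IllegalArgumentException("minimumSimilarity must be >= 0");
        }
        if (minimumSimilarity >= 1f) {
            // Already an edit count such as ~1 or ~2: just apply the cap.
            return (int) Math.min(minimumSimilarity, MAX_SUPPORTED_EDITS);
        }
        // Old-style ~0.xxx value: allow (1 - similarity) * length edits, truncated.
        int edits = (int) ((1.0 - minimumSimilarity) * termLength);
        return Math.min(edits, MAX_SUPPORTED_EDITS);
    }

    public static void main(String[] args) {
        System.out.println(floatToEdits(0.5f, 4));   // 4-char term, similarity 0.5
        System.out.println(floatToEdits(0.5f, 10));  // would be 5 edits, so the cap applies
        System.out.println(floatToEdits(1f, 7));     // already an edit count
    }
}
```

With a helper of this shape, a query parser could keep accepting the old ~0.xxx values while only ever building queries with 0, 1, or 2 edits.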

All tests pass, but the patch needs javadocs. In particular, I think we should adjust the query syntax and mark the old ~0.xxx stuff as deprecated, since the QPs can already do ~1 and ~2 now. Then we can really clean up for 5.0.

P.S. The patch is huge since I didn't use SVN adds/removes, but that makes it easy to apply.
                
> FuzzyQuery should never do edit distance > 2
> --------------------------------------------
>
>                 Key: LUCENE-4024
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4024
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-4024.patch
>
>
> Edit distance 1 and 2 are now very very fast compared to 3.x (100X-200X faster) ... but edit distance 3 will fall back to the super-slow scan-all-terms approach from 3.x, which is not graceful degradation.
> Not sure how to fix it ... maybe we have a SlowFuzzyQuery?  And FuzzyQuery throws an exception if you try to ask it to be slow?  Or we add a boolean (off by default) that you must turn on to allow the slow one?
