Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/07 08:08:20 UTC

[GitHub] [lucene] mohamedniyaz1996 opened a new issue, #11902: Customization of Edit distance costs for different operations

mohamedniyaz1996 opened a new issue, #11902:
URL: https://github.com/apache/lucene/issues/11902

   ### Description
   
   I came across the Python library [weighted-levenshtein](https://pypi.org/project/weighted-levenshtein/), which lets you specify different costs for insertion, deletion, substitution, and transposition. I looked for similar support in Lucene's fuzzy search but couldn't find any.
   
   Could this feature be added in an upcoming release? It would have a large impact when Lucene is used to analyse keystroke errors in a search engine, where the operations need not all cost 1. The documentation of the Python library above has more details.
   
   Please let me know if this feature already exists in Lucene so that I can close this request.
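
To make the request concrete, here is a minimal sketch of an edit distance with a separate cost per operation type. It is a plain dynamic-programming implementation of the optimal-string-alignment variant; the function name and the scalar cost parameters are assumptions chosen for illustration, and it is neither the API of the weighted-levenshtein library nor anything that exists in Lucene.

```python
def weighted_edit_distance(source, target, insert_cost=1.0, delete_cost=1.0,
                           substitute_cost=1.0, transpose_cost=1.0):
    """Cost of turning source into target with one cost per operation type.

    Hypothetical helper for illustration only (optimal string alignment:
    adjacent transpositions are allowed, but a transposed pair is not
    edited again).
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] is the cheapest way to turn source[:i] into target[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost   # delete every character of the prefix
    for j in range(1, cols):
        dist[0][j] = j * insert_cost   # insert every character of the prefix

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            best = min(
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
                dist[i - 1][j - 1] + (0.0 if same else substitute_cost),
            )
            # Two adjacent characters swapped between source and target.
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                best = min(best, dist[i - 2][j - 2] + transpose_cost)
            dist[i][j] = best
    return dist[-1][-1]


if __name__ == "__main__":
    # A swapped key pair is treated as cheaper than other errors here.
    print(weighted_edit_distance("lucene", "lucnee", transpose_cost=0.5))
```

With all costs left at 1 this reduces to the usual unit-cost distance; lowering `transpose_cost` is one way to model swapped keystrokes as a "smaller" error than a wrong character.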


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #11902: Customization of Edit distance costs for different operations

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #11902:
URL: https://github.com/apache/lucene/issues/11902#issuecomment-1386981136

   This would be far too trappy and entirely too slow. Use toy Python libraries like the one referenced if you want to build toys, but this is a library for building search engines.




[GitHub] [lucene] rmuir closed issue #11902: Customization of Edit distance costs for different operations

Posted by GitBox <gi...@apache.org>.
rmuir closed issue #11902: Customization of Edit distance costs for different operations
URL: https://github.com/apache/lucene/issues/11902




[GitHub] [lucene] tang-hi commented on issue #11902: Customization of Edit distance costs for different operations

Posted by GitBox <gi...@apache.org>.
tang-hi commented on issue #11902:
URL: https://github.com/apache/lucene/issues/11902#issuecomment-1379206167

   Lucene does not compute the Levenshtein distance term by term. Instead, it precompiles a Levenshtein automaton from your input (the query term) and then uses it to find the indexed terms that fall within the required distance. The automaton's state transitions are hard-coded, which is also why the maximum edit distance Lucene supports is 2.
   I think it would be difficult to support custom costs per operation, because this would mean reworking the code related to the Levenshtein automaton.
   
   If you're interested, you can check out the following resources.
   [Blog](https://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html)
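
The idea of building a matcher once from the query term and then running candidate terms through it can be sketched as a small NFA simulation. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, transpositions are not handled, and it is not Lucene's implementation, which compiles precomputed transition tables into a deterministic automaton and intersects it with the term dictionary.

```python
class LevenshteinNfa:
    """Accepts every term within max_edits unit-cost edits of query.

    Hypothetical sketch. A state (i, e) means: i query characters are
    accounted for and e edits have been spent so far.
    """

    def __init__(self, query, max_edits):
        self.query = query
        self.max_edits = max_edits

    def _close(self, states):
        # Skipping a query character consumes no input: (i, e) -> (i + 1, e + 1).
        closed = set(states)
        stack = list(states)
        while stack:
            i, e = stack.pop()
            if i < len(self.query) and e < self.max_edits:
                skipped = (i + 1, e + 1)
                if skipped not in closed:
                    closed.add(skipped)
                    stack.append(skipped)
        return closed

    def start(self):
        return self._close({(0, 0)})

    def step(self, states, ch):
        reached = set()
        for i, e in states:
            if i < len(self.query) and self.query[i] == ch:
                reached.add((i + 1, e))          # ch matches the query
            if e < self.max_edits:
                reached.add((i, e + 1))          # ch is an extra character
                if i < len(self.query):
                    reached.add((i + 1, e + 1))  # ch replaces a query character
        return self._close(reached)

    def accepts(self, term):
        states = self.start()
        for ch in term:
            states = self.step(states, ch)
            if not states:
                return False  # edit budget exhausted, stop early
        return any(i == len(self.query) for i, _ in states)


if __name__ == "__main__":
    nfa = LevenshteinNfa("lucene", 2)
    terms = ["lucene", "lucen", "lucine", "lucky", "solr"]
    print([t for t in terms if nfa.accepts(t)])
```

In this sketch every edit spends exactly one unit of a small integer budget, so the reachable states form a small fixed grid. With arbitrary per-operation costs the state would have to track accumulated cost rather than an edit count, which gives a sense of why hard-coded transitions do not carry over directly.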
   




[GitHub] [lucene] mohamedniyaz1996 commented on issue #11902: Customization of Edit distance costs for different operations

Posted by GitBox <gi...@apache.org>.
mohamedniyaz1996 commented on issue #11902:
URL: https://github.com/apache/lucene/issues/11902#issuecomment-1386830082

   @tang-hi, I agree that performance would take a hit, but it could still be offered as a feature with a warning about the performance drop.

