Posted to java-user@lucene.apache.org by Yossi Vainshtein <yo...@gmail.com> on 2015/02/18 13:33:29 UTC

Lucene fuzzy and wildcard search, and scoring in AutomatonQuery

Hi all,

I'm using Apache Lucene and am trying to combine a fuzzy query with a prefix
(or wildcard) query to implement a kind of suggestion mechanism.

For example, if the query is "levy", a document containing "Levinshtein" should
also be returned.

As there seems to be no built-in query of this sort in Lucene, I searched
for solutions and found that this has been asked about before. I used the
approach suggested here
http://stackoverflow.com/questions/28565090/scoring-results-of-automatonquery
<http://stackoverflow.com/questions/2631206/lucene-query-bla-match-words-that-start-with-something-fuzzy-how>
by Robert Muir, which builds the query as a concatenation of two automata
(Levenshtein and wildcard).

That works great, but now there is no scoring: all results get a score of
*1.0*. I really want "Levy" to be ranked higher than "Levinshtein" in the
previous example.

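To make the desired ranking concrete, here is a minimal plain-Java sketch.
It does not use any Lucene API, and it is not the approach from the linked
answer: the class FuzzyPrefixRanker and its methods are hypothetical names
used only for illustration. It assumes a term matches when some prefix of it
is within maxEdits edits of the query, and that matches should be ordered by
that distance first and by term length second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class FuzzyPrefixRanker {

    // Smallest Levenshtein distance between the query and any prefix of the term.
    static int prefixEditDistance(String query, String term) {
        int n = query.length();
        int m = term.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Row 0: turning the empty query into a prefix of length j costs j inserts.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // deleting the first i query characters
            for (int j = 1; j <= m; j++) {
                int cost = query.charAt(i - 1) == term.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // prev now holds the last row: full query against every term prefix.
        int best = prev[0];
        for (int j = 1; j <= m; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    // Keeps terms within maxEdits, ordered by distance and then by length.
    static List<String> rank(String query, List<String> terms, int maxEdits) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (prefixEditDistance(q, t.toLowerCase(Locale.ROOT)) <= maxEdits) {
                hits.add(t);
            }
        }
        hits.sort(Comparator
                .comparingInt((String t) -> prefixEditDistance(q, t.toLowerCase(Locale.ROOT)))
                .thenComparingInt(String::length));
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("Levinshtein", "Lucene", "Levy");
        System.out.println(rank("levy", terms, 1));
    }
}
```

With the terms above and at most one edit, "Levy" (distance 0) sorts ahead of
"Levinshtein" (distance 1), and "Lucene" is dropped. Whether something like
this can be expressed as a Lucene score rather than as a separate pass over
the matched terms is exactly the open question.
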
By the way, I tried using Lucene auto-suggestion in the form of
FuzzySuggester, but it's not feasible with large inputs: it holds all
suggestions in RAM and bloats the memory usage.

Is there another way of doing this? Or should I implement my own *Scorer*
or *Similarity*?


Thanks

Yossi