Posted to java-user@lucene.apache.org by Romiko Derbynew <Ro...@readify.onmicrosoft.com> on 2011/11/22 23:21:36 UTC
Fuzzy Search Sorting
Hi Guys,
I am using Lucene with neo4j database.
Currently I run a fuzzy search via a REST call using the Query API against this data:

GivenName: John
FamilyName: Smith

GivenName: Bob
FamilyName: Smith

GivenName: Adam
FamilyName: Smith

GivenName: Bill
FamilyName: Smath

If I query the index like this: +(FamilyName:smith~)
the query results show Smath at the top, e.g.:
Smath, Bill
Smith, Adam
Smith, Bob
Smith, John
I thought Lucene would automatically sort fuzzy search results with the most relevant at the top. Why is Smath at the top in this case? Shouldn't it be at the bottom?
Also, suppose I have two index keys (FamilyName, GivenName) and I search like this:
+(FamilyName:smith~^8 GivenName:smith~^2)
against this data set:

GivenName: John
FamilyName: Smith

GivenName: Bob
FamilyName: Smith

GivenName: Adam
FamilyName: Smith

GivenName: Bill
FamilyName: Smath

GivenName: Smith
FamilyName: Harry

GivenName: Smath
FamilyName: Sally

I would want the results sorted first by closest match and then by boost priority:
Adam Smith
Bob Smith
John Smith
Smith Harry
Bill Smath
Smath Sally
How can I achieve this with the Query Parser (http://lucene.apache.org/java/3_1_0/queryparsersyntax.html)?
Much Appreciated.
Romiko
Re: Fuzzy Search Sorting
Posted by Ian Lea <ia...@gmail.com>.
You'll have to delve into the output from IndexSearcher.explain, or
the details of the Levenshtein (edit distance) algorithm used by
FuzzyQuery, to figure out why Smath is beating Smith. But the general
way of making sure that exact matches come top is to add an exact
match clause to your query:
FamilyName:smith^16 GivenName:smith^4 FamilyName:smith~^8 GivenName:smith~^2
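
As a rough illustration of how a query string like the one above could be assembled, here is a minimal sketch in plain Java. The class name NameQuerySketch, the method buildQuery, and the rule that the exact clause gets twice the fuzzy boost are all assumptions made for this example, not part of Lucene or neo4j. It also assumes the search term contains no query parser special characters that would need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a query string with one exact clause and
// one fuzzy clause per field. Not part of any Lucene or neo4j API.
public class NameQuerySketch {

    // fuzzyBoosts maps field name -> boost for the fuzzy clause.
    // Assumption for this sketch: the exact clause gets double that boost,
    // so an exact match outranks a fuzzy match on the same field.
    public static String buildQuery(String term, Map<String, Integer> fuzzyBoosts) {
        StringBuilder exact = new StringBuilder();
        StringBuilder fuzzy = new StringBuilder();
        for (Map.Entry<String, Integer> e : fuzzyBoosts.entrySet()) {
            String field = e.getKey();
            int fuzzyBoost = e.getValue();
            int exactBoost = fuzzyBoost * 2;
            exact.append(field).append(':').append(term)
                 .append('^').append(exactBoost).append(' ');
            fuzzy.append(field).append(':').append(term)
                 .append("~^").append(fuzzyBoost).append(' ');
        }
        // Exact clauses first, then fuzzy clauses, separated by spaces.
        return (exact.toString() + fuzzy.toString()).trim();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("FamilyName", 8);
        boosts.put("GivenName", 2);
        System.out.println(buildQuery("smith", boosts));
    }
}
```

With the boosts 8 and 2 from the original question, this produces the same clause list as the query shown above. The doubling factor is only a starting point for experiments.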
You'll want to play with the boosts. Also be aware that in Lucene
3.x, fuzzy queries can be slow.
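
For the edit distance side of things, a small standalone sketch may help. This is a textbook Levenshtein implementation, not Lucene's own code, and the similarity formula (one minus the distance divided by the shorter term's length) is my understanding of what Lucene 3.x does, so treat it as an assumption. The class and method names are made up for this example.

```java
// Standalone sketch: textbook Levenshtein distance plus an assumed
// similarity formula. This is not Lucene's FuzzyQuery implementation.
public class EditDistanceSketch {

    // Number of single-character insertions, deletions, or substitutions
    // needed to turn a into b (classic dynamic programming, two rows).
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity: 1 - distance / length of the shorter string.
    public static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("smith", "smith")); // 0 edits
        System.out.println(editDistance("smith", "smath")); // 1 edit (i -> a)
        System.out.println(similarity("smith", "smath"));   // about 0.8
    }
}
```

On edit distance alone, smith is a closer match to itself than smath is, so the distance by itself does not obviously explain Smath ranking first. Other scoring factors, such as how rare each term is in the index, may be involved, and the explain output is the place to check that.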
--
Ian.