
Posted to java-user@lucene.apache.org by Romiko Derbynew <Ro...@readify.onmicrosoft.com> on 2011/11/22 23:21:36 UTC

Fuzzy Search Sorting

Hi Guys,

I am using Lucene with a Neo4j database.

Currently I do a fuzzy search via a REST call using the Query API, with this data:

GivenName: John
FamilyName: Smith

GivenName: Bob
FamilyName: Smith


GivenName: Adam
FamilyName: Smith

GivenName: Bill
FamilyName: Smath

If I query the index like this: +(FamilyName:smith~)

the query results show Smath at the top,
e.g.
Smath, Bill
Smith, Adam
Smith, Bob
Smith, John

I thought Lucene would automatically sort fuzzy search results with the most relevant at the top. Why is Smath at the top in this case? Shouldn't it be at the bottom?

Also, if I have two index keys (FamilyName, GivenName), and I search like this:
+(FamilyName:smith~^8 GivenName:smith~^2)

and I have this data set:
GivenName: John
FamilyName: Smith

GivenName: Bob
FamilyName: Smith

GivenName: Adam
FamilyName: Smith

GivenName: Bill
FamilyName: Smath

GivenName: Smith
FamilyName: Harry

GivenName: Smath
FamilyName: Sally

I would want the results sorted first by best match and then by boost priority:
Adam Smith
Bob Smith
John Smith
Smith Harry
Bill Smath
Smath Sally

How can I achieve this with the Query Parser (http://lucene.apache.org/java/3_1_0/queryparsersyntax.html)?

Much Appreciated.
Romiko

Re: Fuzzy Search Sorting

Posted by Ian Lea <ia...@gmail.com>.

You'll have to delve into the output from IndexSearcher.explain, or
the details of the Levenshtein (edit distance) algorithm used by
FuzzyQuery, to figure out why Smath is beating Smith.  But the general
way of making sure that exact matches come top is to add an exact
match clause to your query:

FamilyName:smith^16 GivenName:smith^4 FamilyName:smith~^8 GivenName:smith~^2
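
Since the boosts will need tuning, it may help to generate the query
string rather than hand-edit it. The following is only a minimal sketch:
the class and method names (NameQueryBuilder, clause, build) are
hypothetical, it uses no Lucene classes, and it assumes the search term
is a single word with no query-syntax characters that would need
escaping.

```java
// Hypothetical helper: builds the query string shown above from one term
// and four tunable boosts. Plain string handling only, no Lucene calls.
class NameQueryBuilder {

    // One clause such as "FamilyName:smith~^8" (fuzzy) or "FamilyName:smith^16" (exact).
    static String clause(String field, String term, boolean fuzzy, int boost) {
        return field + ":" + term + (fuzzy ? "~" : "") + "^" + boost;
    }

    // Exact clauses first, then fuzzy clauses, separated by spaces
    // (all optional clauses under the query parser's default OR operator).
    static String build(String term, int exactFamily, int exactGiven,
                        int fuzzyFamily, int fuzzyGiven) {
        return String.join(" ",
                clause("FamilyName", term, false, exactFamily),
                clause("GivenName", term, false, exactGiven),
                clause("FamilyName", term, true, fuzzyFamily),
                clause("GivenName", term, true, fuzzyGiven));
    }

    public static void main(String[] args) {
        // Reproduces the query string from the message with boosts 16, 4, 8, 2.
        System.out.println(build("smith", 16, 4, 8, 2));
    }
}
```

The resulting string can be passed to the Query Parser unchanged; only
the four integers need to change between experiments.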

You'll want to play with the boosts.  Also be aware that in Lucene
3.x, fuzzy queries can be slow.
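
As for why Smath might outrank Smith: one plausible cause, which only
IndexSearcher.explain can confirm, is that the rarer term gets a higher
inverse document frequency (idf). The sketch below is a rough model, not
Lucene code. It assumes the 3.x behaviour as I understand it: similarity
= 1 - editDistance / length of the shorter term, a default minimum
similarity of 0.5, a per-term boost of (similarity - min) / (1 - min),
and idf = ln(numDocs / (docFreq + 1)) + 1. It also assumes the indexed
terms are lowercased and the index holds only the four documents from
the question. Field norms, tf, coord and query normalisation are left
out, and all names are hypothetical.

```java
// Rough model of fuzzy term scoring under the assumptions stated above.
class FuzzyScoreSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity: 1 - distance / length of the shorter term (both non-empty).
    static double similarity(String queryTerm, String indexedTerm) {
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        return 1.0 - (double) editDistance(queryTerm, indexedTerm) / shorter;
    }

    // Assumed boost given to each expanded term: 1.0 for an exact match, 0.0 at the cutoff.
    static double termBoost(double similarity, double minSimilarity) {
        return (similarity - minSimilarity) / (1.0 - minSimilarity);
    }

    // Assumed default idf: rarer terms (lower docFreq) score higher.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Relative score of a matching document: boost * idf^2, other factors ignored.
    static double relativeScore(String queryTerm, String indexedTerm, int docFreq, int numDocs) {
        double boost = termBoost(similarity(queryTerm, indexedTerm), 0.5);
        double idf = idf(docFreq, numDocs);
        return boost * idf * idf;
    }

    public static void main(String[] args) {
        // "smith" occurs in 3 of 4 documents, "smath" in 1 of 4.
        System.out.println("smith: " + relativeScore("smith", "smith", 3, 4));
        System.out.println("smath: " + relativeScore("smith", "smath", 1, 4));
    }
}
```

Under these assumptions the Smath document comes out at roughly 1.7
against 1.0 for the Smith documents, because the idf advantage of the
rare term outweighs its lower fuzzy boost. In a larger index, or one
whose terms are not lowercased, the balance may well be different.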


--
Ian.

