Posted to solr-user@lucene.apache.org by LOPEZ-CORTES Mariano-ext <ma...@pole-emploi.fr> on 2018/01/29 09:27:21 UTC

Phonetic matching relevance

Hello.

We work on a search application whose main goal is to find persons by name (surname and lastname).

Query text comes from a user-entered text field. The order of the words is not fixed (lastname-surname or surname-lastname), but
some kinds of match are more important than others. The ranking is:

1 Exact match
2 Inexact match (contains entered words)
3 Inexact phonetic match (contains with Beider-Morse filter French version)

In addition, lastname+surname is prioritized over surname+lastname.

All words entered by the user have to match (either exactly or inexactly).

We have the following fields:

lastNameE : WordTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
lastName : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
lastNameP : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory and BMF
surnameE : WordTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
surname : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory
surnameP : StandardTokenizer, LowerCaseFilter, ASCIIFoldingFilterFactory and BMF

We use the eDisMax query parser and assign higher weights to the exact fields and lower weights to the inexact fields.
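
For illustration, a weighting of this kind could be sent to Solr roughly as follows. This is only a sketch: the boost values, the collection name `persons` and the localhost URL are assumptions made for the example, and only the field names come from the list above. `mm=100%` is one way to require that every entered word matches.

```python
from urllib.parse import urlencode

# Hypothetical boosts: exact fields highest, phonetic (BMF) fields lowest.
FIELD_WEIGHTS = {
    "lastNameE": 10, "surnameE": 10,
    "lastName": 5, "surname": 5,
    "lastNameP": 1, "surnameP": 1,
}

def build_params(user_text):
    """Build eDisMax request parameters for a name search."""
    qf = " ".join(f"{field}^{weight}" for field, weight in FIELD_WEIGHTS.items())
    return {
        "defType": "edismax",
        "q": user_text,
        "qf": qf,
        "mm": "100%",  # every entered word must match in at least one field
    }

# Assumed endpoint and collection name, not taken from the original post.
SOLR_SELECT_URL = "http://localhost:8983/solr/persons/select"
print(SOLR_SELECT_URL + "?" + urlencode(build_params("dupont marie")))
```

The three weight tiers mirror the ranking above (exact, inexact, phonetic); the actual numbers would need tuning against real data.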

However, for the phonetic matches, there are some matches closer to the query text than others. How can we boost these results ?

Thanks in advance !

Re: Phonetic matching relevance

Posted by "alessandro.benedetti" <a....@sease.io>.
When you say: "However, for the phonetic matches, there are some matches
closer to the query text than others. How can we boost these results ? "

Do you mean closer in string edit distance?
If that is the case, you could use the string distance metric implemented in
Solr through a function query.
From the wiki [1]:

*strdist*
Calculate the distance between two strings. Uses the Lucene spell checker
StringDistance interface and supports all of the implementations available
in that package, plus allows applications to plug in their own via Solr’s
resource loading capabilities. strdist takes (string1, string2, distance
measure).

Possible values for distance measure are:

jw: Jaro-Winkler

edit: Levenshtein or Edit distance

ngram: The NGramDistance, if specified, can optionally pass in the ngram
size too. Default is 2.

FQN: Fully Qualified class Name for an implementation of the StringDistance
interface. Must have a no-arg constructor.
e.g.
strdist("SOLR",id,edit)
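
To give a feel for what the `edit` measure captures, here is a small Python sketch of a normalised Levenshtein similarity. It is an approximation written for this thread, not Lucene's code; as far as I know the Lucene implementations return a value between 0 and 1 where 1 means identical strings, and this sketch follows that convention as an assumption.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def edit_similarity(a, b):
    """Map the edit distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

With a measure like this, a phonetic match that differs from the query by one letter scores higher than one that differs by several, which is the kind of ordering the original question asks for.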

You can add this to the eDisMax query as a boost function (the boost parameter)
[2].

[1] https://lucene.apache.org/solr/guide/6_6/function-queries.html
[2] https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
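
As a rough sketch of how the request could look, the Python below builds eDisMax parameters with a `strdist` boost. The field name `lastNameRaw` is hypothetical: it assumes a single-valued, untokenised copy of the name, since a function query compares against a whole field value rather than individual tokens. The backslash escaping of quotes is also an assumption about how the function parser reads string literals.

```python
def build_boosted_params(user_text, field="lastNameRaw", measure="edit"):
    """Build eDisMax parameters that multiply the score by a strdist value."""
    # Escape backslashes and double quotes so the text can sit inside
    # a quoted literal in the function query (assumed escaping rules).
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "defType": "edismax",
        "q": user_text,
        # "boost" is the multiplicative boost parameter of eDisMax.
        "boost": f'strdist("{escaped}",{field},{measure})',
    }

params = build_boosted_params("dupont")
print(params["boost"])
```

For multi-word input, comparing the whole query string against one field is a simplification; one boost per name field may work better.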



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io