Posted to general@lucene.apache.org by adepue <an...@marathon-man.com> on 2008/04/15 21:04:31 UTC

Looking for duplicate names

I'm new to Lucene, and would like to use it to find duplicate names in a
contact list.  Is Lucene a good fit?
We have a form where a user enters the name of a company or person, and we
want the system to warn them if a company or person with the same or a very
similar name has already been entered.
Based on the little I know of Lucene, I'm thinking an n-gram approach (based
on characters, not words) would work best, but I'm not sure whether Lucene
takes proximity or edit distance into account. For example, say you have
these two names:
  Andrew John
  John Andrew

If a user enters "Andy John" and the matching ignores proximity and edit
distance, these two names will score about the same, even though the first
one ("Andrew John") should obviously be ranked higher.
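
To make the concern concrete, here is a rough sketch in plain Python. It is
not Lucene code and says nothing about how Lucene actually scores matches;
the function names (char_ngrams, ngram_overlap, edit_distance) are made up
for illustration, and I am assuming bigrams taken per word plus a standard
Levenshtein distance over the whole string.

```python
def char_ngrams(name, n=2):
    """Collect character n-grams per word, ignoring case and word order."""
    grams = set()
    for word in name.lower().split():
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams


def ngram_overlap(a, b, n=2):
    """Jaccard overlap of the two n-gram sets (0.0 = disjoint, 1.0 = same)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def edit_distance(a, b):
    """Levenshtein distance over the full strings, so word order matters."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


query = "Andy John"
for candidate in ("Andrew John", "John Andrew"):
    print(candidate,
          round(ngram_overlap(query, candidate), 3),
          edit_distance(query, candidate))
```

With per-word bigrams, both candidates produce the same n-gram set, so the
overlap score cannot tell them apart. The edit distance is smaller for
"Andrew John", which is the ordering I am after.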
Thanks in advance for any help or advice.