Posted to java-user@lucene.apache.org by Max Lynch <ih...@gmail.com> on 2009/07/03 18:05:13 UTC

Punctuation in Whitespace Analyzer

Hello,
I am having an issue with analyzers.  Right now, when I do a search, I am
searching for a whole name.  For example, if I have a document like this:

"This is the document text.  John Smith is mentioned right here, he is in
the john.  Smith is his last name.  His full name is John Smith."

If I search this document for the phrase "John Smith", I want hits (I'm
using highlighting) only where the full name appears with no punctuation
between the two words.  For example, I don't want "john.  Smith" to be
highlighted.  However, I DO want a hit for "John Smith." where a period or
comma follows the *last name* only.
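
The rule described above can be sketched in plain Java.  This is only an
illustration of the desired matching behaviour, not Lucene code: the class
and method names are hypothetical, the text is split on whitespace the way a
whitespace tokenizer would split it, and the case-insensitive comparison is
an assumption made so that "john." is rejected because of its punctuation
rather than its case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the matching rule only.
public class TrailingPunctuationPhraseMatcher {

    // Returns the index of the first token of every phrase match.
    public static List<Integer> findPhrase(String text, String phrase) {
        // Split on whitespace only, so punctuation stays attached to tokens.
        String[] tokens = text.trim().split("\\s+");
        String[] terms = phrase.trim().split("\\s+");
        List<Integer> hits = new ArrayList<>();

        for (int i = 0; i + terms.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < terms.length && match; j++) {
                String token = tokens[i + j];
                // Only the LAST term of the phrase may carry trailing '.' or ','.
                if (j == terms.length - 1) {
                    token = token.replaceAll("[.,]+$", "");
                }
                // Assumption: compare case-insensitively.
                match = token.equalsIgnoreCase(terms[j]);
            }
            if (match) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "This is the document text.  John Smith is mentioned right here, "
                + "he is in the john.  Smith is his last name.  "
                + "His full name is John Smith.";
        // Prints [5, 25]: "John Smith" and "John Smith." match, "john.  Smith" does not.
        System.out.println(findPhrase(text, "John Smith"));
    }
}
```

The same idea could presumably be applied at analysis time by stripping
trailing punctuation from tokens, but then "john." would also become "john",
so the position of the punctuation has to be checked somewhere.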

What is the best analyzer to use for this?  Or is there a different way to
approach it?  Right now my whitespace analyzer won't match the "John
Smith." case, since it keeps "Smith." as a single token.  Should I just add
a few more queries to handle punctuation at the end of the last name?

Thanks,
Max