You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Max Lynch <ih...@gmail.com> on 2009/07/03 18:05:13 UTC
Punctuation in Whitespace Analyzer
Hello,
I am having an issue with analyzers. Right now, when I do a search, I am
searching for a whole name. For example, if I have a document like this:
"This is the document text. John Smith is mentioned right here, he is in
the john. Smith is his last name. His full name is John Smith."
If I search this document for the phrase "John Smith" I want to get the hits
(I'm using highlighting) only for the full names without punctuation inside
of them. For example, I don't want "john. Smith" to be highlighted.
However, I DO want to get the hit for "John Smith." with a period or comma
allowed after the *last name* only.
What is the best analyzer to use for this? Or is there a different way to
approach this? Right now my whitespace analyzer won't match on the "John
Smith." case, but maybe I just throw in a few more queries to handle
punctuation at the end of the last name?
Thanks,
Max