Posted to solr-user@lucene.apache.org by Herman Kiefus <he...@angieslist.com> on 2011/10/19 17:30:42 UTC

stemEnglishPossessive and contractions

We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names, ... you get the point.  As such, the possessive plural forms of these words are recognized as 'misspelled'.

I simply thought that 'turning on' this option for the WordDelimiterFilterFactory would address my concerns; however, I also got an unintended consequence: contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected.  Is this intended behavior?  When I read 'English possessive' I hear 'apostrophe s' and not 'apostrophe anything'.  Is there something I'm missing here?

RE: stemEnglishPossessive and contractions

Posted by donato <dd...@outlook.com>.
Hi Herman,

I just noticed your post on possessives and I am having the same problem.
With St. Patrick's Day coming up, people are searching our site for
"patrick" and "patrick's", yet they yield different results. If we
search for "patrick" and "patricks", they yield the same results. I want all
three to yield the same results.

Here is my schema file: <https://we.tl/dyWxmkLLU4>. Am I
missing something? Do I have the order wrong? Are they in the wrong place?

Thank you in advance. I am not too familiar with this stuff as of yet...

Cheers.




RE: stemEnglishPossessive and contractions

Posted by Herman Kiefus <he...@angieslist.com>.
Thanks Robert, exactly what I was looking for.

-----Original Message-----
From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Wednesday, October 19, 2011 1:15 PM
To: solr-user@lucene.apache.org
Subject: Re: stemEnglishPossessive and contractions

The word delimiter filter also does other things: it treats ' as punctuation by default. So it normally splits on ', except when the ' is part of a trailing 's; in that case it removes the 's completely if you enable stemEnglishPossessive.

There are a couple of approaches you can use:
1. You can keep WordDelimiterFilter with this option on, but disable splitting on ' by customizing its type table. In this case specify types=mycustomtypes.txt, and in that file specify ' to be treated as ALPHANUM or similar. See
https://issues.apache.org/jira/browse/SOLR-2059 for some examples of this. I would only do this if you want WordDelimiterFilter for other purposes; if you just want to remove possessives and don't need WordDelimiterFilter's other features, look below.
2. You can instead use EnglishPossessiveFilterFactory, which only does this exact thing (removes 's) and nothing else.
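
For illustration, the two approaches might look roughly like this in schema.xml. This is only a sketch: the field type names (text_wdf_keep_apostrophe, text_possessive), the choice of tokenizers, and the other filter attributes are assumptions, not taken from anyone's actual schema. The single line shown for mycustomtypes.txt follows the "character => TYPE" pattern from the SOLR-2059 examples and should be checked against your Solr version.

```xml
<!-- Sketch only: field type names are hypothetical.
     These definitions would go in the types section of schema.xml. -->

<!-- Approach 1: keep WordDelimiterFilter, but stop it from splitting on
     the apostrophe. mycustomtypes.txt (the file name used in the reply)
     is assumed to contain this one line, where \u0027 is the apostrophe:
         \u0027 => ALPHANUM
-->
<fieldType name="text_wdf_keep_apostrophe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            stemEnglishPossessive="1"
            types="mycustomtypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Approach 2: no WordDelimiterFilter at all; only strip a trailing 's. -->
<fieldType name="text_possessive" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Either way, it is worth running a possessive (patrick's) and a contraction (isn't) through the analysis page of the Solr admin UI to confirm that the possessive is reduced to its base word and the contraction stays in one piece.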

