Posted to solr-user@lucene.apache.org by "Dyer, James" <Ja...@ingrambook.com> on 2011/01/12 23:19:11 UTC

StopFilterFactory and "qf" containing some fields that use it and some that do not

I'm running into a problem with StopFilterFactory in conjunction with (e)dismax queries that have a mix of fields, only some of which use StopFilterFactory.  It seems that if even one field in the "qf" parameter does not use StopFilterFactory, then stop words are not removed when searching any of the fields.  Here's an example of what I mean:

- I have 2 fields indexed:
  > Title is "textStemmed", which includes StopFilterFactory (see below).
  > Contributor is "textSimple", which does not include StopFilterFactory (see below).
- "The" is a stop word in stopwords.txt
- q=life&defType=edismax&qf=Title  ... returns 277,635 results
- q=the life&defType=edismax&qf=Title ... returns 277,635 results
- q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
- q=the life&defType=edismax&qf=Title Contributor ... returns 0 results

It seems as if the stop words are not being stripped from the query because "qf" contains a field that doesn't use StopFilterFactory.  I also tested combining stemmed and non-stemmed fields in "qf", and stemming seems to be applied regardless, but stop word removal is not.

Does anyone have ideas on what is going on?  Is this a feature or possibly a bug?  Any known workarounds?  Any advice is appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
________________________________
<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by Markus Jelsma <ma...@openindex.io>.

I haven't used edismax, but I can imagine it's a feature. This is because
inconsistent use of stopwords in the analyzers of the fields specified in qf can
yield really unexpected results because of the mm parameter.

In dismax, if one analyzer removes stopwords and the other doesn't, the mm
parameter goes crazy.
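
For readers following along: mm (minimum should match) decides how many of the optional clauses in the parsed query must match. A minimal sketch of relaxing it in solrconfig.xml, assuming a hypothetical handler name "/select-relaxed" and the Title and Contributor fields from the original post. The 50% value is only an illustration, not a setting recommended anywhere in this thread:

```xml
<!-- Hypothetical handler name; the qf fields are taken from the original post. -->
<requestHandler name="/select-relaxed" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">Title Contributor</str>
    <!-- Assumption: with two optional clauses, 50% requires only one of them to match. -->
    <str name="mm">50%</str>
  </lst>
</requestHandler>
```

With a setting like this, a query such as "the life" should no longer depend on Contributor containing "the", at the cost of looser matching for every multi-term query.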


Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by Jan Høydahl <ja...@cominvent.com>.

Reviving this thread.

You say:
> I do wonder...what if (e)dismax had a flag you could set that would tell it that if any analyzers removed a term, then that term would become optional for any fields for which it remained?  I'm not sure what the development effort would be, but perhaps it would be a nice way to circumvent this problem in a future release...

I created a JIRA issue to investigate if it is possible to implement this. See https://issues.apache.org/jira/browse/SOLR-3085

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. jan. 2011, at 17:36, Dyer, James wrote:

> I appreciate the reply and blog posting.  For now, I just enabled stopwords for all the fields in "qf".  We have a very short list anyhow, and our legacy search engine didn't even allow field-by-field configuration (stopwords are global on that system).
> 
> I do wonder...what if (e)dismax had a flag you could set that would tell it that if any analyzers removed a term, then that term would become optional for any fields for which it remained?  I'm not sure what the development effort would be, but perhaps it would be a nice way to circumvent this problem in a future release...
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 

RE: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by "Dyer, James" <Ja...@ingrambook.com>.

I appreciate the reply and blog posting.  For now, I just enabled stopwords for all the fields in "qf".  We have a very short list anyhow, and our legacy search engine didn't even allow field-by-field configuration (stopwords are global on that system).
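
A minimal sketch of what that change might look like for the textSimple type from the original post, assuming the same stopwords.txt is reused. The type name "textSimpleStop" is hypothetical, and the actual change made is not shown in this thread:

```xml
<!-- Hypothetical variant of textSimple that drops the same stop words as textStemmed. -->
<fieldType name="textSimpleStop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same stop filter settings as textStemmed, so both qf fields agree on "the". -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the index-time analyzer only affects documents indexed afterwards, so existing documents would need to be reindexed.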

I do wonder...what if (e)dismax had a flag you could set that would tell it that if any analyzers removed a term, then that term would become optional for any fields for which it remained?  I'm not sure what the development effort would be, but perhaps it would be a nice way to circumvent this problem in a future release...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind@jhu.edu] 
Sent: Thursday, January 13, 2011 9:54 AM
To: solr-user@lucene.apache.org; markus.jelsma@openindex.io
Cc: Dyer, James
Subject: Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

It's a known 'issue' in dismax, (really an inherent part of dismax's 
design with no clear way to do anything about it), that qf over fields 
with different stop word definitions will produce odd results for a 
query with a stopword.

Here's my understanding of what's going on: 
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/


Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by Jonathan Rochkind <ro...@jhu.edu>.

It's a known 'issue' in dismax, (really an inherent part of dismax's 
design with no clear way to do anything about it), that qf over fields 
with different stop word definitions will produce odd results for a 
query with a stopword.

Here's my understanding of what's going on: 
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

On 1/12/2011 6:48 PM, Markus Jelsma wrote:
> Here's another thread on the subject:
> http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
>
> And slightly off topic: you might also want to look at using common grams;
> they are really useful for phrase queries that contain stopwords.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory

Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by Markus Jelsma <ma...@openindex.io>.

Here's another thread on the subject:
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html

And slightly off topic: you might also want to look at using common grams;
they are really useful for phrase queries that contain stopwords.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.CommonGramsFilterFactory
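
A rough sketch of a common grams field type, assuming the same stopwords.txt serves as the common-word list. The type name "textCommonGrams" is hypothetical:

```xml
<!-- Hypothetical field type; stopwords.txt is assumed to double as the common-word list. -->
<fieldType name="textCommonGrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits word pairs such as "the_life" alongside the single terms. -->
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query-side counterpart: keeps the pairs and drops the single terms they cover. -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

This mainly helps phrase queries; it does not by itself change how dismax treats a stop word that only some qf fields remove.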


> Here is what debug says each of these queries parse to:
> 
> 1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
> 2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
> 3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
> 4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
> 
> 1. +DisjunctionMaxQuery((Title:life))
> 2. +((DisjunctionMaxQuery((Title:life)))~1)
> 3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
> 4. +((DisjunctionMaxQuery((Contributor:the))
> DisjunctionMaxQuery((Contributor:life | Title:life)))~2)
> 
> I see what's going on here.  Because "the" is a stop word for Title, it
> gets removed from the first part of the expression.  This means that
> "Contributor" is required to contain "the".  dismax does the same thing
> too.  I guess I should have run debug before asking the mailing list!
> 
> It looks like the only workarounds I have are either to filter out the
> stopwords in the client when this happens, or to enable stop words for all
> the fields that are used in "qf" alongside stopword-enabled fields.
> Unless...someone has a better idea??
> 
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jelsma@openindex.io]
> Sent: Wednesday, January 12, 2011 4:44 PM
> To: solr-user@lucene.apache.org
> Cc: Jayendra Patil
> Subject: Re: StopFilterFactory and "qf" containing some fields that use it
> and some that do not
> 
> > Have used edismax and Stopword filters as well. But usually use the fq
> > parameter e.g. fq=title:the life and never had any issues.
> 
> That is because filter queries are not relevant for the mm parameter which
> is being used for the main query.
> 
> > Can you turn on debugQuery and check what query is formed for all
> > the combinations you mentioned?
> > 
> > Regards,
> > Jayendra
> > 
> > On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James
> 
> <Ja...@ingrambook.com>wrote:
> > > I'm running into a problem with StopFilterFactory in conjunction with
> > > (e)dismax queries that have a mix of fields, only some of which use
> > > StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
> > > does not use StopFilterFactory, then stop words are not removed when
> > > searching any fields.  Here's an example of what I mean:
> > > 
> > > - I have 2 fields indexed:
> > >  > Title is "textStemmed", which includes StopFilterFactory (see
> > >  > below). Contributor is "textSimple", which does not include
> > >  > StopFilterFactory
> > > 
> > > (see below).
> > > - "The" is a stop word in stopwords.txt
> > > - q=life&defType=edismax&qf=Title  ... returns 277,635 results
> > > - q=the life&defType=edismax&qf=Title ... returns 277,635 results
> > > - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635
> > > results - q=the life&defType=edismax&qf=Title Contributor ... returns 0
> > > results
> > > 
> > > It seems as if the stop words are not being stripped from the query
> > > because "qf" contains a field that doesn't use StopFilterFactory.  I
> > > did testing with combining Stemmed fields with not Stemmed fields in
> > > "qf" and it seems as if stemming gets applied regardless.  But stop
> > > words do not.
> > > 
> > > Does anyone have ideas on what is going on?  Is this a feature or
> > > possibly a bug?  Any known workarounds?  Any advice is appreciated.
> > > 
> > > James Dyer
> > > E-Commerce Systems
> > > Ingram Content Group
> > > (615) 213-4311
> > > ________________________________
> > > <fieldType name="textSimple" class="solr.TextField"
> > > positionIncrementGap="100">
> > > <analyzer type="index">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > </analyzer>
> > > <analyzer type="query">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > </analyzer>
> > > </fieldType>
> > > 
> > > <fieldType name="textStemmed" class="solr.TextField"
> > > positionIncrementGap="100">
> > > <analyzer type="index">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true" />
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > > stemEnglishPossessive="1" />
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > </analyzer>
> > > <analyzer type="query">
> > > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > > ignoreCase="true" expand="true"/>
> > > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true" />
> > > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > > stemEnglishPossessive="1" />
> > > <filter class="solr.LowerCaseFilterFactory"/>
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > </analyzer>
> > > </fieldType>

RE: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by "Dyer, James" <Ja...@ingrambook.com>.

Here is what debug says each of these queries parse to:

1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results

1. +DisjunctionMaxQuery((Title:life))
2. +((DisjunctionMaxQuery((Title:life)))~1)
3. +DisjunctionMaxQuery((Contributor:life | Title:life))
4. +((DisjunctionMaxQuery((Contributor:the)) DisjunctionMaxQuery((Contributor:life | Title:life)))~2)

I see what's going on here.  Because "the" is a stop word for Title, it is removed from the Title side of the expression, so the first clause in query 4 contains only Contributor:the.  Since both clauses are required (the ~2), "Contributor" must contain "the", and nothing matches.  dismax does the same thing.  I guess I should have run debug before asking the mailing list!
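
To make the behavior in the debug output above concrete, here is a rough Python model of it. This is not Solr code: the analyzers, the function names (build_clauses, matches) and the sample document are simplified assumptions for illustration, and stemming and synonyms are left out. It only shows how a term dropped by one field's analyzer can still survive as a required clause on another field.

```python
# Toy model of how (e)dismax expands a query across the "qf" fields.
# NOT Solr code; analyzers and names are simplified assumptions.

STOPWORDS = {"the"}  # stands in for stopwords.txt


def analyze_title(term):
    """textStemmed-like analyzer: lowercases and drops stop words (no stemming here)."""
    term = term.lower()
    return None if term in STOPWORDS else term


def analyze_contributor(term):
    """textSimple-like analyzer: lowercases only, so stop words are kept."""
    return term.lower()


ANALYZERS = {"Title": analyze_title, "Contributor": analyze_contributor}


def build_clauses(query, qf):
    """Build one clause per query term, holding the (field, term) pairs that survive analysis."""
    clauses = []
    for term in query.split():
        parts = []
        for field in qf:
            analyzed = ANALYZERS[field](term)
            if analyzed is not None:
                parts.append((field, analyzed))
        if parts:  # a term dropped by every field produces no clause at all
            clauses.append(sorted(parts))
    return clauses


def matches(doc, clauses):
    """Treat every clause as required, as in the debug output where ~N equals the clause count."""
    return all(
        any(term in doc.get(field, set()) for field, term in clause)
        for clause in clauses
    )


# Hypothetical document: "The Life" by Smith, as the two fields would index it.
doc = {"Title": {"life"}, "Contributor": {"smith"}}
print(build_clauses("the life", ["Title"]))
print(build_clauses("the life", ["Title", "Contributor"]))
print(matches(doc, build_clauses("the life", ["Title", "Contributor"])))
```

With only Title in "qf", the stop word disappears from every field and leaves a single clause. Once Contributor is added, "the" survives as its own clause that only Contributor can satisfy.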

It looks like my only workarounds are either to filter out the stop words in the client when this happens, or to enable stop words for all the fields that are used in "qf" alongside stopword-enabled fields.  Unless... someone has a better idea?
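
For the first workaround, a minimal client-side sketch in Python might look like the following. The helper names (strip_stopwords, build_params) and the stop word set are hypothetical; the set is assumed to mirror whatever is in stopwords.txt, and the parameters are just the ones from the example queries above.

```python
# Hypothetical client-side helper: strip stop words from the user's query
# before it is sent to Solr, so no field ends up with a stop-word-only clause.

STOPWORDS = {"the", "a", "an"}  # assumption: kept in sync with stopwords.txt


def strip_stopwords(user_query, stopwords=STOPWORDS):
    """Drop stop words case-insensitively; keep the original query if nothing would remain."""
    kept = [term for term in user_query.split() if term.lower() not in stopwords]
    return " ".join(kept) if kept else user_query


def build_params(user_query, qf=("Title", "Contributor")):
    """Assemble the request parameters used in the example queries."""
    return {
        "q": strip_stopwords(user_query),
        "defType": "edismax",
        "qf": " ".join(qf),
    }


print(build_params("the life"))
```

The trade-off is that stop words are then also removed for fields like Contributor that were deliberately indexed without StopFilterFactory, so a contributor name made up of stop words could no longer be searched this way.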

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by Markus Jelsma <ma...@openindex.io>.

> Have used edismax and Stopword filters as well. But usually use the fq
> parameter e.g. fq=title:the life and never had any issues.

That is because the mm parameter applies only to the main query; filter 
queries are not affected by it.

Re: StopFilterFactory and "qf" containing some fields that use it and some that do not

Posted by Jayendra Patil <ja...@gmail.com>.

I have used edismax with stop word filters as well, but I usually use the fq
parameter, e.g. fq=title:the life, and have never had any issues.

Can you turn on debugQuery and check what query is formed for each of the
combinations you mentioned?

Regards,
Jayendra