Posted to solr-user@lucene.apache.org by meghana <me...@amultek.com> on 2013/04/11 09:25:02 UTC

Solr : Stopwords at query time

In Solr, I have text in the following format:

1s: This is very nice day. 4s: Christmas is about to come 7s: and christmas
preparation is just on 12s: this is awesome!! 

I want tokens like '1s:', '4s:', or anything of the form 'ns:' to be neither
indexed nor searchable. To do so, I have added a stop words filter to my text
field definition.

Below is my field type definition:
-----------------------------------
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


and the stopwords.txt file contains these words:
-------------------
1s:
2s:
... 
...
... 
10000s:

----------------------
When I search with q="109s:", it returns 0 results. If I search for "109s",
it should also return 0 results, but surprisingly Solr does not do so: it
returns results that have "190s:" in the text.

My understanding is that if the word "109s:" is not indexed, then "190s" is
not indexed either, and since the word "190s" is not in the index, no results
should be returned for it.

But Solr does not behave that way. Can anybody explain this behavior, and
tell me what changes I should make to meet my requirement?

Thanks




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Stopwords-at-query-time-tp4055249.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr : Stopwords at query time

Posted by Upayavira <uv...@odoko.co.uk>.
I'd suggest using the Analysis tab in the admin UI to unpick what is
going on. You can play with scenarios there without having to waste
round trips indexing stuff.
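
One thing worth comparing there is how "109s:" and "109s" pass through the
query chain. The stop list holds only the colon forms, so "109s" is not
removed by StopFilterFactory, and WordDelimiterFilterFactory (which by default
splits at letter/digit boundaries) then breaks it into "109" and "s". If the
goal is to drop every 'ns:' marker, a pattern-based char filter may be simpler
than listing 10,000 stop words. What follows is only a sketch, not a tested
configuration: the field type name text_no_markers and the regular expression
are assumptions made for illustration, and a reindex would be needed after
changing index-time analysis.
-----------------------------------
<!-- Hypothetical field type; name and pattern are illustrative assumptions. -->
<fieldType name="text_no_markers" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <!-- Runs before tokenization: replaces markers such as "1s:" or
         "109s:" with a space so they never become tokens. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\b\d+s:" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
-----------------------------------
Because the single analyzer element applies at both index and query time, the
marker is stripped the same way on both sides. A bare query such as "109s"
has no colon, so the pattern leaves it alone; the remaining filters from the
original chain could be added after the tokenizer as needed.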

Upayavira
