Posted to solr-user@lucene.apache.org by aronitin <AR...@YAHOO.COM> on 2011/10/25 23:31:14 UTC

Incorrect Search Results showing up

Hi Group,

I've defined a type "text" in the Solr schema as shown below.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
</fieldType>

A multi-valued field is defined to use the type above:
<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>

I index some content such as 
- Google REST API
- Facebook REST API
- Software Architecture
- Design Documents
- Xml Web Services
- Web API design

When I issue a search query like content:"rest api"~4, the matches that I
get are:
- Google REST API (which is fine)
- Facebook REST API (which is fine)
- *Web API design* (which is not fine, because the query was a phrase query
and "rest" and "api" should be within 4 words of each other)
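
To make the expectation concrete, here is a rough Python model of a two-term sloppy phrase match on a multi-valued field. It is only a sketch under stated assumptions, not Lucene's actual matching code: it ignores stop words, word-delimiter splitting and stemming, it approximates the position gap between values, and the names positions() and sloppy_match() are made up for illustration.

```python
def positions(values, gap=100):
    """Assign token positions across the values of a multi-valued field.

    Simplified: whitespace tokenization plus lowercasing only. The gap
    between values is modeled as a plain jump of `gap` positions, which
    approximates positionIncrementGap rather than reproducing it exactly.
    """
    index = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # keep phrase matches from spanning two values
        for token in value.lower().split():
            index.setdefault(token, []).append(pos)
            pos += 1
    return index


def sloppy_match(values, first, second, slop, gap=100):
    """True if `first` and `second` occur within `slop` position moves.

    Simplified two-term model: an exact phrase has distance 0, and a
    swapped pair ("api rest") needs a slop of 2.
    """
    index = positions(values, gap)
    return any(
        abs(p2 - p1 - 1) <= slop
        for p1 in index.get(first, [])
        for p2 in index.get(second, [])
    )


if __name__ == "__main__":
    for doc in (["Google REST API"], ["Facebook REST API"], ["Web API design"]):
        print(doc, sloppy_match(doc, "rest", "api", slop=4))
```

Under this model a value like "Web API design" cannot match on its own, since it has no "rest" token, and with a gap of 100 a slop of 4 cannot bridge two separate values of the same document either.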

Does anybody see the 3rd search result as a correct result to be
returned? If yes, what is the explanation for that result based on the
schema defined above?

In my view, the 3rd result should not be returned as part of the search
results. If somebody can point out anything wrong in my schema, it would be
a great help to me.

Thanks
Nitin



--
View this message in context: http://lucene.472066.n3.nabble.com/Incorrect-Search-Results-showing-up-tp3452810p3452810.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Incorrect Search Results showing up

Posted by Grant Ingersoll <gs...@apache.org>.
If you add debugQuery=true to your request, what does it show for that last result?
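
For example, a small Python sketch that builds such a request URL. The host, port and /solr/select path are assumptions about a default local setup (adjust them to your install), and build_debug_url() is just an illustrative helper name:

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query):
    """Return a select URL for `query` with debugQuery=true appended."""
    params = {"q": query, "debugQuery": "true"}
    # urlencode percent-encodes the quotes and colon in the query string
    return base_url + "?" + urlencode(params)


# Assumed endpoint for a local Solr instance; not taken from the original post.
url = build_debug_url("http://localhost:8983/solr/select", 'content:"rest api"~4')
print(url)
```

The debug section of the response should include the parsed query and a per-document explanation, which is the part to look at for the unexpected hit.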

--------------------------
Grant Ingersoll
http://www.lucidimagination.com