Posted to solr-user@lucene.apache.org by ramzesua <mi...@gmail.com> on 2010/11/29 16:06:09 UTC

search strangeness

Hi all. I have a little question. Can anyone explain why this Solr search
works so strangely? :)
For example, here is my schema.xml:
I added some fields with fieldType = text. Here are the properties of 'text':
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0"
            generateWordParts="1" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I copy all my fields into the text field:
<copyField source="name" dest="text"/>
<copyField source="caption" dest="text"/>


Then I added one document to my index. Here is the schema browser output for
the field 'caption':

_term___________frequency_
|annual         |    1         |
|golfer         |    1         |
|tournament     |    1         |
|welcom         |    1         |
|3rd            |    1         |

After that I tried to find this document by these terms:
annual - no results
golfer - found document
tournament - no results
welcom - found document
3rd - no results

I have read a lot of forums, some books and http://wiki.apache.org/solr/.... but
it didn't help me.
Can anyone explain to me why Solr search behaves so strangely? Or where is my
problem?
Thank you ...

-- 
View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1986895.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: search strangeness

Posted by ramzesua <mi...@gmail.com>.

I found the problem: the solr.EnglishPorterFilterFactory in the <analyzer
type="query"> is what forms that parsedquery.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1991321.html

Re: search strangeness

Posted by ramzesua <mi...@gmail.com>.

Here is the result with &debugQuery:

For the term annual:
<result name="response" numFound="0" start="0"/> 
<lst name="debug"> 
 <str name="rawquerystring">annual</str> 
 <str name="querystring">annual</str> 
 <str name="parsedquery">text:year text:twelve-month text:onceayear
text:yearbook</str> 
 <str name="parsedquery_toString">text:year text:twelve-month text:onceayear
text:yearbook</str> 
 <lst name="explain"/> 
 <str name="QParser">LuceneQParser</str> 
 <lst name="timing"> 
  <double name="time">63.0</double> ....

For the term welcome:
<lst name="debug"> 
 <str name="rawquerystring">welcome</str> 
 <str name="querystring">welcome</str> 
 <str name="parsedquery">text:welcom</str> 
 <str name="parsedquery_toString">text:welcom</str> 
 <lst name="explain"> 
  <str name="17"> 
0.10848885 = (MATCH) fieldWeight(text:welcom in 0), product of:
  1.4142135 = tf(termFreq(text:welcom)=2)
  0.30685282 = idf(docFreq=1, maxDocs=1)
  0.25 = fieldNorm(field=text, doc=0)
</str> 
 </lst> 
 <str name="QParser">LuceneQParser</str> 
 <lst name="timing"> 
  <double name="time">16.0</double> ...

Before this I removed SynonymFilterFactory from <analyzer type="query">.
My query analyzer output for the term 'annual':
annual
annual
annual
annual
annual
annual
My query analyzer output for the term 'welcome':
welcome
welcome
welcome
welcome
welcom
welcom

How do I get this parsedquery for the term annual? Or are there other problems?
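
One way to read the debug output quoted in this message: a simple term query can only match if at least one term in parsedquery also exists in the index. The sketch below checks that by hand with the values from this thread. It assumes the 'text' field holds the same terms that the schema browser showed for 'caption' (plausible because of the copyField, but not confirmed in the thread), and the helper names are made up for illustration. It is a toy comparison, not how Solr itself evaluates a query.

```python
# Terms the schema browser reported for 'caption'. Assumption: the 'text'
# field contains the same terms, since 'caption' is copied into it.
INDEXED_TERMS = {"annual", "golfer", "tournament", "welcom", "3rd"}


def parsed_terms(parsedquery, field="text"):
    """Extract the bare terms from a parsedquery such as 'text:a text:b'."""
    prefix = field + ":"
    return [
        clause[len(prefix):]
        for clause in parsedquery.split()
        if clause.startswith(prefix)
    ]


def matching_terms(parsedquery, indexed=INDEXED_TERMS):
    """Return the query terms that also exist in the (assumed) index."""
    return [term for term in parsed_terms(parsedquery) if term in indexed]


if __name__ == "__main__":
    annual = "text:year text:twelve-month text:onceayear text:yearbook"
    welcome = "text:welcom"
    print(matching_terms(annual))   # no overlap with the indexed terms
    print(matching_terms(welcome))  # 'welcom' is in the index
```

For 'annual' the overlap is empty because the original word is no longer among the query terms, which would be consistent with numFound="0". For 'welcome' the stemmed form 'welcom' is present in the index.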
-- 
View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1991262.html

Re: search strangeness

Posted by ramzesua <mi...@gmail.com>.

Hi, Erick. There is a defaultSearchField in my schema.xml. Can you give me an
example of your configuration for the text field? (Which filters do you use for
index and for query?)
-- 
View this message in context: http://lucene.472066.n3.nabble.com/search-strangeness-tp1986895p1989466.html

Re: search strangeness

Posted by Erick Erickson <er...@gmail.com>.

On a quick look with Solr 3.1, these results are puzzling. Are you
sure that you are searching the field you think you are? I take it you're
searching the "text" field, but that's controlled by your <defaultSearchField>
entry in schema.xml.

Try using the admin page, particularly the "full interface" link, and
turn debugging on; that should give you a better idea of what
is actually being searched. Another very useful admin page
is the analysis page, which shows exactly what transformations
are made to your terms at index and query time and why.
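
If the debug output is fetched over HTTP rather than read on the admin page, the parsed query can be pulled out of the XML response. This is a minimal sketch that assumes the response has the layout quoted elsewhere in this thread (a <lst name="debug"> element containing <str name="parsedquery">) under a <response> root. The sample document and the function name are hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a Solr XML response with debugging on.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">annual</str>
    <str name="parsedquery">text:year text:twelve-month</str>
  </lst>
</response>"""


def extract_parsedquery(xml_text):
    """Return the parsedquery string from a debug response, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree's limited XPath supports [@attribute='value'] predicates.
    node = root.find("./lst[@name='debug']/str[@name='parsedquery']")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(extract_parsedquery(SAMPLE_RESPONSE))
```

Comparing the returned string with the word that was typed shows quickly whether the query analyzer rewrote the term before the search ran.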

I'm a little suspicious that you've put the stopword filter in a different
place in the index and query process, but I doubt that
is a problem. The analysis page will help with that too.

But nothing really jumps out at me. If you don't get anywhere with the
admin page, perhaps you can show us the field definitions for the
name, caption and text fields (not the type, the actual <field></field>
part of the schema).

Also, please post the results of appending &debugQuery=on to the request.
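
For anyone scripting this, a request with the debug flag appended can be built as sketched below. The host, port and /solr/select path are assumptions about a default local install, not something stated in the thread, and the function name is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL: a local Solr instance with the standard select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def debug_query_url(term, base_url=SOLR_SELECT_URL):
    """Return a select URL for `term` with debugQuery=on appended."""
    params = [("q", term), ("debugQuery", "on")]
    # urlencode escapes the values, so spaces and symbols in `term` are safe.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    for term in ["annual", "golfer", "tournament", "welcom", "3rd"]:
        print(debug_query_url(term))
```

Opening each printed URL in a browser returns the normal results plus the debug section, which is the part to post back to the list.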

Best
Erick
