
Posted to solr-user@lucene.apache.org by oleg_gnatovskiy <ol...@citysearch.com> on 2008/02/27 19:30:28 UTC

Question regarding Solr ranking

Hello everyone.


I've run into a weird problem with Solr's ranking engine. In a nutshell, the
problem involves certain results getting EXTREMELY high rank scores. Here is
an example:


locRvwText:"Pizza Pizza"^10 OR locName:"Pizza Pizza"^30


The way I understand it, the locName part of the query should be
boosted 3x more than the locRvwText part.

However, when running this query the first result is:



10.8226
Johnnie's New York Pizzeria


Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
Pizza... Pizza... Pizza... Pizza... Pizza...




10.8226 = (MATCH) product of:
  21.6452 = (MATCH) sum of:
    21.6452 = weight(locRvwText:"pizza pizza"^10.0 in 3792465), product of:
      0.3354544 = queryWeight(locRvwText:"pizza pizza"^10.0), product of:
        10.0 = boost
        14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
        0.0023249863 = queryNorm
      64.52502 = fieldWeight(locRvwText:"pizza pizza" in 3792465), product of:
        4.472136 = tf(phraseFreq=20.0)
        14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
        1.0 = fieldNorm(field=locRvwText, doc=3792465)
  0.5 = coord(1/2)




How come the phrase frequency for locRvwText comes back as 20? The field
locRvwText is defined in the following way:




And my text fields are defined in the following way:




      
        
	
                
        
        
        
        
        
      
      
        
        
        
        
        
        
        
      
    


Forgive me if I am wrong, but shouldn't the
RemoveDuplicatesTokenFilterFactory make the string "Pizza... Pizza...
Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
Pizza... Pizza... Pizza..." count as simply one Pizza?

I'd appreciate any help I can get! 

Thanks!






-- 
View this message in context: http://www.nabble.com/Question-regarding-Solr-ranking-tp15719752p15719752.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Question regarding Solr ranking

Posted by oleg_gnatovskiy <ol...@citysearch.com>.

Sorry about the previous message, I had some formatting issues. Below is the
actual message!

oleg_gnatovskiy wrote:
> 
> Hello everyone.
> 
> I've run into a weird problem with Solr's ranking engine. In a nutshell,
> the problem involves certain results getting EXTREMELY high rank scores.
> Here is an example:
> 
> locRvwText:"Pizza Pizza"^10 OR locName:"Pizza Pizza"^30
> 
> The way I understand it, the locName part of the query should be
> boosted 3x more than the locRvwText part.
> However, when running this query the first result is:
> 
> <float name="score">10.8226</float>
> <str name="locName">Johnnie's New York Pizzeria</str>
> <arr name="locRvwText">
> <str>
> Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
> Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
> Pizza... Pizza... Pizza... Pizza... Pizza...
> </str>
> </arr>
> <lst name="explain">
> 
> 	<str name="id=157789,internal_docid=3792465">
> 
> 10.8226 = (MATCH) product of:
>   21.6452 = (MATCH) sum of:
>     21.6452 = weight(locRvwText:"pizza pizza"^10.0 in 3792465), product of:
>       0.3354544 = queryWeight(locRvwText:"pizza pizza"^10.0), product of:
>         10.0 = boost
>         14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
>         0.0023249863 = queryNorm
>       64.52502 = fieldWeight(locRvwText:"pizza pizza" in 3792465), product of:
>         4.472136 = tf(phraseFreq=20.0)
>         14.428232 = idf(locRvwText: pizza=8156 pizza=8156)
>         1.0 = fieldNorm(field=locRvwText, doc=3792465)
>   0.5 = coord(1/2)
> </str>
> </lst>
> 
> 
> How come the phrase frequency for locRvwText comes back as 20? The field
> locRvwText is defined in the following way:
> 
> <field name="locRvwText" type="text" index="false" stored="true"
> required="false" multiValued="true"  omitNorms="true"/>
> 
> And my text fields are defined in the following way:
> 
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 	<!-- in this example, we will only use synonyms at query time -->
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>        
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldtype>
> 
> Forgive me if I am wrong, but shouldn't the
> RemoveDuplicatesTokenFilterFactory make the string "Pizza... Pizza...
> Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
> Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
> Pizza... Pizza... Pizza..." count as simply one Pizza?
> I'd appreciate any help I can get! 
> 
> Thanks!
> 
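
The figures in the explain output above can be re-derived by hand. What follows is only a minimal sketch in Python (not part of Solr), assuming Lucene's classic default similarity where tf is the square root of the phrase frequency; the numbers in the output are consistent with that assumption. All constants are copied from the explain output, and the variable names are illustrative.

```python
import math

# Values copied from the explain output in the message above.
boost = 10.0               # locRvwText:"Pizza Pizza"^10
idf = 14.428232            # idf(locRvwText: pizza=8156 pizza=8156)
query_norm = 0.0023249863  # queryNorm
phrase_freq = 20.0         # tf(phraseFreq=20.0)
field_norm = 1.0           # fieldNorm reported as 1.0 in the explain output
coord = 0.5                # coord(1/2): only one of the two clauses matched

# Assumption: classic similarity, tf = sqrt(frequency).
tf = math.sqrt(phrase_freq)

query_weight = boost * idf * query_norm
field_weight = tf * idf * field_norm
clause_weight = query_weight * field_weight
score = clause_weight * coord

print(f"tf={tf:.6f} queryWeight={query_weight:.7f} "
      f"fieldWeight={field_weight:.5f} score={score:.4f}")
```

Note that the "sum of" contains only the locRvwText clause and coord is 1/2, so the locName^30 clause did not match this document at all. The 3x boost only affects documents whose locName actually matches the phrase.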


Re: Question regarding Solr ranking

Posted by Chris Hostetter <ho...@fucit.org>.

: I am not really clear on what the analysis mode is supposed to give me. It
: requires me to specify a field when I specify a query. What does that do?
: Also, I don't see anything in the analyzer to explain the weighting of a
: particular document.

I think what Otis meant is that the analysis tool would help you verify 
that your Analyzers are doing what you expect them to be doing.

If you try that with your locRvwText and the text you are asking about you 
would see that RemoveDuplicatesTokenFilterFactory does not make it the 
same as a single instance of "Pizza" ... per the docs...

	"Filters out any tokens which are at the same logical position 
	in the tokenstream as a previous token with the same text. ..."

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-b05ef0377d71df53b47b9dd9cc28c26d95097a0b

so it isn't removing any tokens in your situation, because they do not 
exist at the same logical position.



-Hoss
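
The behaviour described in the quoted documentation can be illustrated with a small sketch. This is not the Solr implementation, only a hypothetical Python model of it: tokens are (text, position increment) pairs, and a token is dropped only when the same text was already seen at the same logical position (position increment 0). The function name and token representation are assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (term text, position increment)


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Drop a token only if the same text already occurred at the same position."""
    kept: List[Token] = []
    seen_at_position = set()
    for text, pos_inc in tokens:
        if pos_inc > 0:
            # The stream advanced to a new logical position: start over.
            seen_at_position = set()
        if text in seen_at_position:
            continue  # true duplicate: same text, same position
        seen_at_position.add(text)
        kept.append((text, pos_inc))
    return kept


# 21 "pizza" tokens, each one position after the previous: nothing is removed.
review = [("pizza", 1)] * 21
print(len(remove_duplicates(review)))

# Tokens stacked at one position (as a stemmer or synonym filter might
# produce): the repeated "pizza" at increment 0 is removed.
stacked = [("pizza", 1), ("pizza", 0), ("pizzeria", 0)]
print(remove_duplicates(stacked))
```

In other words, the filter cleans up duplicates that earlier filters stack on top of each other at one position. It does not collapse a word that the author of the text repeated.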

Re: Question regarding Solr ranking

Posted by oleg_gnatovskiy <ol...@citysearch.com>.


Otis Gospodnetic wrote:
> 
> It's a little hard to read that message, but if I were you I'd go to the
> Solr admin page, analysis section, enter your query, and see what index
> and query time analyzers spit out.  I think that should at least give you
> some hints.
> 
> Otis 
> 
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
I am not really clear on what the analysis mode is supposed to give me. It
requires me to specify a field when I specify a query. What does that do?
Also, I don't see anything in the analyzer to explain the weighting of a
particular document.

Regardless, what I have narrowed it down to is that my locRvwText (defined
as a multivalued text field) has a value that looks like this:
"Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza... Pizza...
Pizza... Pizza... Pizza... Pizza... Pizza...". Solr is counting this as
20 hits, but I was under the impression that the
RemoveDuplicatesTokenFilterFactory should filter this result to have it
count as just 1 hit. Am I misunderstanding what
RemoveDuplicatesTokenFilterFactory does?
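
The count of 20 follows from how an exact phrase is matched: the review text contains 21 occurrences of "Pizza...", and every adjacent pair is one match for "pizza pizza". A rough sketch of that counting, assuming the analysis chain reduces each "Pizza..." to the term "pizza" at consecutive positions (the strip/lowercase step below is only a stand-in for the real filters, and phrase_freq is a hypothetical helper, not a Lucene call):

```python
from typing import List


def phrase_freq(terms: List[str], phrase: List[str]) -> int:
    """Count exact (slop 0) occurrences of phrase, overlapping ones included."""
    n = len(phrase)
    return sum(1 for i in range(len(terms) - n + 1) if terms[i:i + n] == phrase)


# The stored review value: "Pizza..." repeated 21 times.
raw = "Pizza... " * 21

# Stand-in for the analyzer: split on whitespace, drop the dots, lowercase.
terms = [word.strip(".").lower() for word in raw.split()]

print(len(terms), phrase_freq(terms, ["pizza", "pizza"]))  # prints "21 20"
```

With 21 consecutive identical terms there are 20 adjacent pairs, which matches the phraseFreq=20.0 reported in the explain output.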