Posted to solr-user@lucene.apache.org by Tom Weber <to...@rtl.lu> on 2006/07/26 10:28:14 UTC

Own Similarity Class in Solr

Hello,

   I would like to alter the similarity behaviour of Solr to remove
the fieldNorm factor from the similarity calculations. As far as I
have read, I need to write my own Similarity class and plug it into
Solr using the <similarity> config in schema.xml.

   Has anybody already tweaked or played with this topic, and might
be able to give me some code or advice?

   That would be really great.

   Thanks

   Best Greetings,

   Tom

Re: Own Similarity Class in Solr

Posted by Chris Hostetter <ho...@fucit.org>.
:    I have a field "searchname" with a boost of "3.0" during the
: document.add. Another field "text" is a copyField of several entries,

index time "field boosts", "document boosts", and document length are all
factored into the "fieldNorm" value at indexing time -- so if you want to
use field boosts you'll have to skip the omitNorms="true" suggestion i
gave you earlier.

:    What I need is, a search, which will handle each document the
: same, regardless of the frequency and the size, it shall calculate
: the score only on the boost factors, so a document with a high
: boostfactor and the same text in it as another one with less factor
: shall be before the others.

if you have your lengthNorm function returning 1.0 in all cases, then the
fieldNorm should be based entirely on your fieldBoost -- so i'm not sure
why you aren't getting the results you expect.  One thing you may not be
realizing though is that the Similarity.lengthNorm function is used when
the documents are being indexed (unlike all the other Similarity functions
that are used at query time) so if you changed the Similarity class
without rebuilding your index you won't see those norm changes.
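
A minimal sketch of the point above, as a toy model in plain Java rather than Lucene code. The class NormModel and its methods are hypothetical names invented for illustration. It assumes the norm is simply docBoost * fieldBoost * lengthNorm, computed once when a document is added, and it leaves out the lossy one-byte encoding Lucene applies to stored norms. The 1/sqrt(numTerms) function stands in for the DefaultSimilarity length norm.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/** Toy model of index-time norms. NOT Lucene code; all names are hypothetical. */
class NormModel {
    // Norm per document id, computed once at the moment the document is added.
    private final Map<String, Float> storedNorms = new HashMap<>();

    // Stand-in for Similarity.lengthNorm: maps a field's term count to a factor.
    private IntToDoubleFunction lengthNorm;

    NormModel(IntToDoubleFunction lengthNorm) {
        this.lengthNorm = lengthNorm;
    }

    // Swapping the function models changing the Similarity class.
    void setLengthNorm(IntToDoubleFunction lengthNorm) {
        this.lengthNorm = lengthNorm;
    }

    // Index time: boosts and length norm are folded into one stored value.
    void addDocument(String id, float docBoost, float fieldBoost, int numTerms) {
        float norm = docBoost * fieldBoost * (float) lengthNorm.applyAsDouble(numTerms);
        storedNorms.put(id, norm);
    }

    // Query time: the stored value is read back, nothing is recomputed.
    float norm(String id) {
        return storedNorms.get(id);
    }

    public static void main(String[] args) {
        NormModel index = new NormModel(n -> 1.0 / Math.sqrt(n));
        index.addDocument("a", 1.0f, 3.0f, 4);
        System.out.println(index.norm("a")); // 1.5 (3.0 * 1/sqrt(4))

        index.setLengthNorm(n -> 1.0);       // new "Similarity", old index
        System.out.println(index.norm("a")); // still 1.5

        index.addDocument("a", 1.0f, 3.0f, 4); // rebuild the entry
        System.out.println(index.norm("a")); // 3.0, the field boost alone
    }
}
```

In this model the stored value only changes when the document is added again, which is the reason a new lengthNorm has no visible effect until the index is rebuilt.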

the main way to make sense of your scores is to look at the Explanation
output you get for each doc ... in Solr you can turn this on for the
Standard and DisMax request handlers by adding debugQuery=1 to your URLs
... a new block of space indented text will be added for each doc
explaining why it got the scores it did.
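
As an illustration of adding debugQuery=1 to a request URL, here is a small Java sketch. The helper class DebugQueryUrl is a hypothetical name, and the base URL http://localhost:8983/solr/select and the term "foo" are assumptions for the example only; substitute the address and query of your own installation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a select URL with debugQuery=1 appended. */
class DebugQueryUrl {
    static String build(String selectUrl, String query) {
        try {
            // URL-encode the query so characters such as ':' survive transport.
            return selectUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&debugQuery=1";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL and example query; adjust both for a real setup.
        System.out.println(build("http://localhost:8983/solr/select", "searchname:foo"));
        // prints http://localhost:8983/solr/select?q=searchname%3Afoo&debugQuery=1
    }
}
```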

If you still can't make sense of your scores after looking at the
Explanation output, then you may want to follow up on the java-user lucene
list -- it's a much bigger audience than the solr list, and someone there
might be able to spot your problem (including the query toString and the
Explanations for your various docs will go a long way)


-Hoss


Re: Own Similarity Class in Solr

Posted by Tom Weber <to...@rtl.lu>.
Hi Chris,

   thanks for the details. In the meantime I have been poking around
with my own class, which I defined in the schema.xml, and everything
is working perfectly there.

   But I still have the problem with the normalization. I tried
fixing several parameters to 1.0; this does indeed change the
scoring, but still not in the way I need. It seems that it is always
the "fieldNorm" which comes into play, but where does this value
really come from? In the Similarity class I can't find a term with
that name to alter.

   Let me give a short example of what goes wrong:

   I have a field "searchname" with a boost of "3.0" during the
document.add. Another field "text" is a copyField of several entries;
this one does not have a boost factor, but does have more data in it.
One of the fields copied into "text" contains the searched word
3 times. This entry has the score: 5.5930133

   But I also have entries where the searchname has the same word in
it, and these have a score of 1.9975047.

   Currently my class looks like this (I took the DefaultSimilarity as a
basis):

   - lengthNorm is fixed to 1.0
   - tf fixed to 1.0
   - idf fixed to 1.0

   With these changes, might it be possible that I've deactivated the
boost on the different fields?
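
A rough sketch of what these three settings imply, written as a standalone Java model and not as an actual Similarity subclass. The class BoostOnlyScoreModel and its method names are hypothetical. It assumes a simplified per-term score of tf * idf^2 * fieldNorm, leaving out coord, queryNorm and query-time boosts, and a fieldNorm of fieldBoost * lengthNorm with no document boost and no norm encoding. It does not try to reproduce the scores quoted above.

```java
/** Toy model of a per-term score with tf, idf and lengthNorm pinned to 1.0. */
class BoostOnlyScoreModel {
    // The three factors fixed to 1.0, whatever their inputs are.
    static float tf(float freq) { return 1.0f; }
    static float idf(int docFreq, int numDocs) { return 1.0f; }
    static float lengthNorm(int numTerms) { return 1.0f; }

    // Index-time norm: the field boost is still multiplied in here.
    static float fieldNorm(float fieldBoost, int numTerms) {
        return fieldBoost * lengthNorm(numTerms);
    }

    // Simplified per-term score: tf * idf^2 * fieldNorm.
    static float termScore(float freq, int docFreq, int numDocs,
                           int numTerms, float fieldBoost) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * fieldNorm(fieldBoost, numTerms);
    }

    public static void main(String[] args) {
        // Long unboosted field containing the term 3 times.
        System.out.println(termScore(3, 10, 100, 500, 1.0f)); // 1.0
        // Short field with boost 3.0 containing the term once.
        System.out.println(termScore(1, 10, 100, 2, 3.0f));   // 3.0
    }
}
```

Under these assumptions the boost is not switched off by fixing the three factors: it is the only thing left in the score, so the boosted field would rank first.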

   What I need is a search which will handle each document the
same, regardless of the frequency and the size. It shall calculate
the score only from the boost factors, so a document with a high
boost factor and the same text in it as another one with a lower
factor shall be ranked before the others.

   I might be doing something completely wrong; perhaps you have an idea?

   Thanks,

    Tom

Re: Own Similarity Class in Solr

Posted by Chris Hostetter <ho...@fucit.org>.
:    I would like to alter the similarity behaviour of solr to remove
: the fieldnorm factor in  the similarity calculations. As far as I
: read, I need to recreate my own similarity class and import it into
: solr using the <similarity> config in schema.xml.
:
:    Has anybody already tweaked or played with this topic, and might
: give me some code or advice?

as you've already noticed, you can specify the Similarity class at runtime
via the schema.xml -- the only Solr specific aspect of this is making sure
your Similarity class is in your servlet container's classpath (exactly how
you do this depends on your servlet container)

searching the java-dev and java-user Lucene mailing lists is the best bet
for finding discussions on writing your own similarity, there are also
some examples in the main Lucene code base...

contrib/miscellaneous/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
src/test/org/apache/lucene/search/TestDisjunctionMaxQuery.java

...if your main interest is just eliminating norms, there is a special
option for that in Lucene Fields called "Omit Norms" (it not only
eliminates the effects of norms on scoring, but it saves space in your
index as well) ... in Solr you can turn it on/off per <field> or
<fieldtype> using the omitNorms="true" option in the schema.xml



-Hoss