Posted to solr-user@lucene.apache.org by Raimon Bosch <ra...@gmail.com> on 2010/02/18 13:19:26 UTC

some scores to 0 using omitNorms=false


Hi,

We ran some tests with omitNorms=false. On the last page of results, some
documents have a score of 0.0. These zero scores are problematic for our
sorters.

Could this be some kind of bug?

Regards,
Raimon Bosch.
-- 
View this message in context: http://old.nabble.com/some-scores-to-0-using-omitNorns%3Dfalse-tp27637436p27637436.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: some scores to 0 using omitNorms=false

Posted by Lance Norskog <go...@gmail.com>.
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions

On Thu, Feb 18, 2010 at 7:13 AM, Raimon Bosch <ra...@gmail.com> wrote:
>
>
> I am not an expert in the Lucene scoring formula, but omitNorms=false makes
> scoring a little more complex, because boosts for fields and documents are
> taken into account. If I'm not wrong (if I am, please correct me), with
> omitNorms=false the norm(t,d) factor (field length normalization plus
> index-time boosts) is kept in the formula:
>
>   score(q,d) = coord(q,d) · queryNorm(q) · ∑ (t in q) ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

-- 
Lance Norskog
goksron@gmail.com

Re: some scores to 0 using omitNorms=false

Posted by Raimon Bosch <ra...@gmail.com>.

I am not an expert in the Lucene scoring formula, but omitNorms=false makes
scoring a little more complex, because boosts for fields and documents are
taken into account. If I'm not wrong (if I am, please correct me), with
omitNorms=false the norm(t,d) factor (field length normalization plus
index-time boosts) is kept in the formula:

  score(q,d) = coord(q,d) · queryNorm(q) · ∑ (t in q) ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )
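
To make the formula above concrete, here is a minimal sketch in plain Java
that multiplies the factors for a single-term query. It does not call Lucene
at all: the class name ScoreSketch, the methods termScore and score, and every
numeric value are made up for illustration. It only shows that, because the
formula is a product, a single factor of 0 (for example a norm or a boost of
0) is enough to produce a score of 0.0.

```java
/** Illustrative only: mirrors the shape of the scoring formula, not Lucene's code. */
final class ScoreSketch {

    /** Contribution of one term: tf * idf^2 * boost * norm. */
    static float termScore(float tf, float idf, float boost, float norm) {
        return tf * idf * idf * boost * norm;
    }

    /** score(q,d) = coord * queryNorm * sum of the per-term contributions. */
    static float score(float coord, float queryNorm, float[] termScores) {
        float sum = 0f;
        for (float s : termScores) {
            sum += s;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Hypothetical values, not taken from a real index.
        float normal = score(1f, 0.5f, new float[] { termScore(1f, 2f, 1f, 0.5f) });
        float zeroNorm = score(1f, 0.5f, new float[] { termScore(1f, 2f, 1f, 0f) });
        System.out.println("normal=" + normal + " zeroNorm=" + zeroNorm);
    }
}
```

With these made-up numbers the first score is 1.0 and the second is 0.0. This
does not say which factor is 0 in our index; only the explain output can tell.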

See
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html,
and
http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-td27623039.html#a27623039

The multiValued option creates fields that can hold multiple values.

We use it in one of our indexes by modifying schema.xml to add a new field:

...
<field name="s_similar_name" type="text" indexed="true" stored="true" multiValued="true"/>
...

This field is filled by a custom update processor (written by us, created by
our own UpdateRequestProcessorFactory) from a comma-separated field called
's_similar_names':
...
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();

    // Read the comma-separated source field; skip it if absent or not a String.
    Object v = doc.getFieldValue("s_similar_names");
    if (v instanceof String) {
        for (String name : ((String) v).split(",")) {
            // Drop empty tokens such as those produced by ",,".
            if (!name.isEmpty()) {
                doc.addField("s_similar_name", name);
            }
        }
    }

    // pass it up the chain
    super.processAdd(cmd);
}
...
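
The split-and-filter step can also be tried on its own, without any Solr
classes. The following is only a sketch under that assumption: the class
SimilarNamesSketch and the method splitSimilarNames are hypothetical names,
and the input string is invented. Each element of the returned list
corresponds to one value that would be added to the multiValued field
s_similar_name.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the same split-and-filter logic, without Solr classes. */
final class SimilarNamesSketch {

    /** Splits a comma-separated value and drops empty tokens. */
    static List<String> splitSimilarNames(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) {
            return names;
        }
        for (String name : value.split(",")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical input with an empty token in the middle.
        System.out.println(splitSimilarNames("solr,,lucene,nutch"));
    }
}
```

Like the processor code, this does not trim whitespace, so an input such as
"a, b" keeps the leading space in " b".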

The processor factory is registered in solrconfig.xml:

...
<updateRequestProcessorChain name="mychain">
  <processor class="org.apache.solr.update.processor.MyUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
...

and the chain is then attached to the XmlUpdateRequestHandler in solrconfig.xml:

...
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.processor">mychain</str>
  </lst>
</requestHandler>
...

The termVectors option stores additional information about a document's terms
in the index, which saves computation time in features like MoreLikeThis. See
http://wiki.apache.org/solr/TermVectorComponent. We don't use it.




Re: some scores to 0 using omitNorms=false

Posted by Chris Hostetter <ho...@fucit.org>.
: >> We ran some tests with omitNorms=false. On the last page of results, some
: >> documents have a score of 0.0. These zero scores are problematic for our
: >> sorters.
: >> 
: >> Could this be some kind of bug?

It could be, but it isn't necessarily.

"0.0" is a perfectly legal score that can result from a query ... q=solr^0 
is an example of a query that will most certainly result in a score of 0 
for a matching document, but there are lots of other possibilities as 
well.  (Even negative scores are possible if you use function queries)

You would need to provide some details about your queries/docs/schema 
(including the explain output from debugQuery) to really have any idea 
whether the zero scores you are getting are "correct" or a bug.
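
As a rough sketch of how to ask for that explain output, the snippet below
builds a select URL with debugQuery=true and fl=*,score. The base URL
http://localhost:8983/solr is only an assumption about a local install, and
the class DebugQueryUrlSketch and method explainUrl are hypothetical helper
names, not part of Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative only: builds a Solr select URL that asks for explain output. */
final class DebugQueryUrlSketch {

    /** Returns baseUrl + a select request with scores and debug info enabled. */
    static String explainUrl(String baseUrl, String query) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&fl=*,score&debugQuery=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance; adjust host, port and path as needed.
        System.out.println(explainUrl("http://localhost:8983/solr", "solr^0"));
    }
}
```

The "explain" section of the debug output in the response then shows, per
document, which factor of the score ended up as 0.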


-Hoss


Re: some scores to 0 using omitNorms=false

Posted by Raimon Bosch <ra...@gmail.com>.

We have just tested it with the latest version of Solr and we still get
scores of 0.




Re: some scores to 0 using omitNorms=false

Posted by adeelmahmood <ad...@gmail.com>.
I was going to ask a question about this, but you seem like you might have
the answer for me: what exactly does the omitNorms option do (or what is it
expected to do)? Also, could you please help me understand what the
termVectors and multiValued options do?
Thanks for your help.

