Posted to solr-user@lucene.apache.org by Webster Homer <we...@sial.com> on 2017/08/08 19:38:35 UTC

Solr 6 and IDF

Our most common use for Solr is searching for products, not text search. My
company is in the process of migrating away from an Endeca search engine.
To keep the business happy, the goal is to make sure that search results
from the two engines are fairly similar. One area where we have found that
a result is not as good as it was in the old system is the idf.

We are using Solr 6. After moving to it, a lot of our results got better,
but idf still seems to deaden some results. Given that our focus is product
searching, I really don't see a need for idf at all. Prior to Solr 6 you
could suppress idf by providing a custom similarity class. Looking over the
newer documentation, a lot of things have improved, but I'm not sure I see a
simple way to turn off idf in Solr 6's BM25 similarity.

How do I disable IDF in Solr 6?

We also have text-searching needs, so it would be nice if we could
suppress IDF at the field or schema level.
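
To make the "deadening" concrete, here is a minimal sketch of the arithmetic,
assuming the textbook BM25 formula with the commonly cited defaults k1 = 1.2
and b = 0.75. The class name, the index size, and the document frequencies
are hypothetical, and this is not Lucene's actual code. It only shows how the
idf factor scales a match on a common term down relative to a rare one, and
that both matches score the same once idf is replaced by a constant 1.

```java
/** Textbook BM25 arithmetic for illustration; not Lucene's actual code. */
final class Bm25IdfDemo {
    // Commonly cited BM25 defaults, assumed here for the example.
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** idf = ln(1 + (N - n + 0.5) / (n + 0.5)); larger for rare terms. */
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Term-frequency component with document-length normalization. */
    static double tfNorm(double freq, double docLen, double avgDocLen) {
        return freq * (K1 + 1.0)
                / (freq + K1 * (1.0 - B + B * docLen / avgDocLen));
    }

    /** Score of one term in one document; useIdf=false uses a constant 1. */
    static double score(double freq, double docLen, double avgDocLen,
                        long docFreq, long docCount, boolean useIdf) {
        double weight = useIdf ? idf(docFreq, docCount) : 1.0;
        return weight * tfNorm(freq, docLen, avgDocLen);
    }

    public static void main(String[] args) {
        long docCount = 100_000; // hypothetical index size
        long rareDf = 50;        // hypothetical: term occurs in 50 products
        long commonDf = 40_000;  // hypothetical: term occurs in 40,000 products

        // Same term frequency and field length for both matches.
        System.out.println("rare term,   with idf: "
                + score(1, 10, 10, rareDf, docCount, true));
        System.out.println("common term, with idf: "
                + score(1, 10, 10, commonDf, docCount, true));
        System.out.println("rare term,   idf = 1:  "
                + score(1, 10, 10, rareDf, docCount, false));
        System.out.println("common term, idf = 1:  "
                + score(1, 10, 10, commonDf, docCount, false));
    }
}
```

With idf enabled, the match on the common term gets the lower score even
though the term frequency and field length are identical. With idf fixed at
1, only term frequency and length normalization are left to separate results.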

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

Re: Solr 6 and IDF

Posted by Webster Homer <we...@sial.com>.
It appears that all I need to do is create a class that
extends BM25Similarity, and have the new class return 1 as the idf. Is that
correct?
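
The override pattern described above can be sketched as follows. So that it
runs without Lucene on the classpath, SketchBm25Similarity is a hypothetical
stand-in base class written for this example, not Lucene's BM25Similarity,
and NoIdfBm25Similarity is likewise a made-up name. A real plugin would
extend org.apache.lucene.search.similarities.BM25Similarity instead, assuming
the Lucene version in use exposes an overridable idf method; the exact
signature should be checked against that version's Javadoc.

```java
/** Hypothetical stand-in for a BM25-style similarity; not Lucene's class. */
class SketchBm25Similarity {
    private final double k1 = 1.2;  // assumed default
    private final double b = 0.75;  // assumed default

    /** Overridable idf hook: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    protected double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Final, so subclasses can change only the idf component. */
    public final double score(double freq, double docLen, double avgDocLen,
                              long docFreq, long docCount) {
        double tf = freq * (k1 + 1.0)
                / (freq + k1 * (1.0 - b + b * docLen / avgDocLen));
        return idf(docFreq, docCount) * tf;
    }
}

/** Same scoring, but every term gets weight 1 however common it is. */
class NoIdfBm25Similarity extends SketchBm25Similarity {
    @Override
    protected double idf(long docFreq, long docCount) {
        return 1.0;
    }
}

class NoIdfDemo {
    public static void main(String[] args) {
        SketchBm25Similarity standard = new SketchBm25Similarity();
        SketchBm25Similarity noIdf = new NoIdfBm25Similarity();

        // Hypothetical statistics: 100,000 docs, one rare and one common term.
        System.out.println("standard, rare:   "
                + standard.score(2, 8, 10, 50, 100_000));
        System.out.println("standard, common: "
                + standard.score(2, 8, 10, 40_000, 100_000));
        System.out.println("no idf,   rare:   "
                + noIdf.score(2, 8, 10, 50, 100_000));
        System.out.println("no idf,   common: "
                + noIdf.score(2, 8, 10, 40_000, 100_000));
    }
}
```

Keeping the rest of the scoring in the base class and overriding only idf
leaves the BM25 term-frequency saturation and length normalization in place,
which matches the stated goal of keeping BM25 while dropping IDF.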

On Tue, Aug 8, 2017 at 3:15 PM, Webster Homer <we...@sial.com>
wrote:

> I do want to use BM25, just disable IDF


Re: Solr 6 and IDF

Posted by Webster Homer <we...@sial.com>.
I do want to use BM25; I just want to disable IDF.

On Tue, Aug 8, 2017 at 2:58 PM, Peter Lancaster <
peter.lancaster@findmypast.com> wrote:

> Hi Webster,
>
> If you're not worried about using BM25 searcher then you should just be
> able to continue as you were before by providing your own similarity class
> that extends ClassicSimilarity and then override the idf method to always
> return 1,  then reference that in your schema
> e.g.
> <similarity class="brightsolid.solr.plugins.MyCustomSimilarity" />
>
> As far as I know you've been able to have different similarities per field
> in solr for a while now. https://wiki.apache.org/solr/SchemaXml#Similarity
>
> Cheers,
> Peter Lancaster.


RE: Solr 6 and IDF

Posted by Peter Lancaster <pe...@findmypast.com>.
Hi Webster,

If you're not worried about using the BM25 similarity, then you should be able to continue as you were before: provide your own similarity class that extends ClassicSimilarity, override the idf method to always return 1, and then reference that class in your schema, e.g.
<similarity class="brightsolid.solr.plugins.MyCustomSimilarity" />

As far as I know, you've been able to have different similarities per field in Solr for a while now. https://wiki.apache.org/solr/SchemaXml#Similarity

Cheers,
Peter Lancaster.


________________________________

This message is confidential and may contain privileged information. You should not disclose its contents to any other person. If you are not the intended recipient, please notify the sender named above immediately. It is expressly declared that this e-mail does not constitute nor form part of a contract or unilateral obligation. Opinions, conclusions and other information in this message that do not relate to the official business of findmypast shall be understood as neither given nor endorsed by it.
________________________________

__________________________________________________________________________

This email has been checked for virus and other malicious content prior to leaving our network.
__________________________________________________________________________