Posted to users@solr.apache.org by Fiz N <fi...@gmail.com> on 2022/04/14 12:52:04 UTC

SOLR TF/IDF factor removal.

Hello Experts,

In our project we are using SOLR 8.11.1 in standalone mode on a Windows
server.

We have implemented a search mechanism that uses pure keyword matching and
boosts the keywords according to business needs. For each search result, a
match percentage is derived from the SOLR document score. That score is the
sum of the individual keyword scores, each of which is derived from the
boost factor and the TF/IDF values of the keyword.

Per our requirements, the resulting scores should depend only on the boost
factor, but the implicit TF/IDF factor causes deviations from the expected
results and uncertainty in the resulting ranking.

So we are looking for better approaches to eliminate or neutralize the SOLR
TF/IDF factor.

Please let us know your suggestions for removing the TF/IDF factor, or any
other approach that we could consider in this case.

Thanks
Fiz Fareedh.

Re: SOLR TF/IDF factor removal.

Posted by ca...@uca.es.

It is not possible to receive your inquiry through this channel. For it to reach us correctly, you must use the CAU application:

  http://cau.uca.es/


Thank you for your cooperation.

        Regards

----------------------------------------------------------------------------
CAU application
http://cau.uca.es/

Re: SOLR TF/IDF factor removal.

Posted by Markus Jelsma <ma...@openindex.io>.

Hello Fiz,

Are you sure you are using TF*IDF? It is no longer the default function in
Solr 8. Anyway, if you do, you can implement a custom TFIDFSimilarity [1]
and return just 1.0 for the idf() and tf() functions.

Regards,
Markus

[1]
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
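
A minimal sketch of the effect, in Python rather than Java and as a toy model only: it assumes a per-term score of the form boost * tf(freq) * idf(docFreq, docCount) and leaves out length normalization and everything else a real Similarity does. The "classic" formulas used for comparison (square root of the frequency, and 1 + ln((docCount + 1) / (docFreq + 1))) are the ones documented for Lucene's ClassicSimilarity; all function names here are made up for illustration.

```python
import math

# Formulas documented for Lucene's ClassicSimilarity (used only for comparison).
def classic_tf(freq):
    return math.sqrt(freq)

def classic_idf(doc_freq, doc_count):
    return 1.0 + math.log((doc_count + 1) / (doc_freq + 1))

# The suggestion from the thread: both functions simply return 1.0.
def constant_tf(freq):
    return 1.0

def constant_idf(doc_freq, doc_count):
    return 1.0

# Toy per-term score; ASSUMPTION: no length norm, no other scoring details.
def term_score(boost, freq, doc_freq, doc_count, tf=classic_tf, idf=classic_idf):
    return boost * tf(freq) * idf(doc_freq, doc_count)

classic = term_score(2.0, freq=4, doc_freq=9, doc_count=99)
constant = term_score(2.0, freq=4, doc_freq=9, doc_count=99,
                      tf=constant_tf, idf=constant_idf)
```

In this model `constant` is just the boost, while `classic` also depends on how often the term occurs and how rare it is. In a real TFIDFSimilarity the field-length norm would presumably still contribute unless norms are disabled for the field.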

Re: SOLR TF/IDF factor removal.

Posted by Thomas Corthals <th...@klascement.net>.

You can tweak the parameters of BM25 similarity:
https://solr.apache.org/docs/8_1_1/solr-core/org/apache/solr/search/similarities/BM25SimilarityFactory.html

IIRC, the similarity becomes a constant with k1 = 0.

    <similarity class="solr.BM25SimilarityFactory">
        <float name="k1">0</float>
    </similarity>
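
To see what k1 = 0 does, here is a small Python sketch of the BM25 term score. It assumes the formula as documented for Lucene's BM25Similarity: idf = ln(1 + (N - n + 0.5) / (n + 0.5)), multiplied by freq / (freq + k1 * (1 - b + b * dl / avgdl)). Treat that as an assumption and check it against the Javadoc for your version. The function names and the sample numbers are made up.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # ASSUMPTION: idf as documented for Lucene's BM25Similarity.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_term_score(boost, freq, doc_len, avg_doc_len, doc_freq, doc_count,
                    k1=1.2, b=0.75):
    # Length-dependent part of the denominator; it vanishes when k1 is 0.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return boost * bm25_idf(doc_freq, doc_count) * freq / (freq + norm)

# Same term and boost, very different term frequency and field length.
short_doc = bm25_term_score(2.0, freq=1, doc_len=5, avg_doc_len=50,
                            doc_freq=10, doc_count=1000, k1=0)
long_doc = bm25_term_score(2.0, freq=7, doc_len=500, avg_doc_len=50,
                           doc_freq=10, doc_count=1000, k1=0)
```

Under that assumption, k1 = 0 removes term frequency and field length from the score, so `short_doc` and `long_doc` are equal. The idf factor stays, though, so the score is constant per term but not equal to the boost alone.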

Thomas

Re: SOLR TF/IDF factor removal.

Posted by Vincenzo D'Amore <v....@gmail.com>.

https://github.com/freedev/solr-constant-similarity

This is just an implementation of what Markus was suggesting.

It could be a good idea to add a constant similarity class to the standard
Solr distribution.

-- 
Vincenzo D'Amore

Re: SOLR TF/IDF factor removal.

Posted by Vincenzo D'Amore <v....@gmail.com>.

Hi,

A long time ago I wrote this, to handle cases where there is no need for
TF/IDF:

https://github.com/freedev/solr-constant-similarity

There are just two simple steps to follow:

   1. Add this line to solrconfig.xml:

<lib dir="../../../dist/" regex="constant-similarity-\d.*\.jar" />

   2. Add this line to schema.xml:

<similarity class="it.damore.solr.similarity.ConstantTFSimilarity"></similarity>

<https://github.com/freedev/solr-constant-similarity#old-solr-versions-before-54>

-- 
Vincenzo D'Amore

Re: SOLR TF/IDF factor removal.

Posted by Jeremy Buckley - IQ-C <je...@gsa.gov.INVALID>.

You may be interested in the ^= modifier for queries.  From the reference
guide:

Constant Score with "^="

Constant score queries are created with <query_clause>^=<score>, which sets
the entire clause to the specified score for any documents matching that
clause. This is desirable when you only care about matches for a particular
clause and don’t want other relevancy factors such as term frequency (the
number of times the term appears in the field) or inverse document
frequency (a measure across the whole index for how rare a term is in a
field).

Example:

(description:blue OR color:blue)^=1.0 text:shoes
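
If the keyword boosts live in application code, a query using ^= could be assembled along these lines. This is only a sketch in Python: the helper name, the field name and the scores are illustrative, and it does no escaping of special query characters, which a real client would need.

```python
def constant_score_query(field, boosts):
    # One constant-score clause per keyword: field:term^=score
    clauses = [f"{field}:{term}^={score}" for term, score in boosts.items()]
    return " OR ".join(clauses)

query = constant_score_query("text", {"shoes": 3.0, "blue": 1.5})
# query is now: text:shoes^=3.0 OR text:blue^=1.5
```

Each clause then contributes exactly its stated score to a matching document, whatever the term frequency or document frequency.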

Re: SOLR TF/IDF factor removal.

Posted by Andy Webb <an...@gmail.com>.

hi Fiz,

I think the BooleanSimilarityFactory is what you're looking for here - see
https://solr.apache.org/guide/8_11/other-schema-elements.html#similarity
and
https://solr.apache.org/docs/8_11_1/solr-core/index.html?org/apache/solr/search/similarities/BooleanSimilarityFactory.html


(I added this in https://issues.apache.org/jira/browse/SOLR-13751 but
haven't found a concrete use case for it yet other than for training
purposes - would be interested to know if it's useful to you!)
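
The BooleanSimilarity Javadoc describes each term as scoring exactly its query boost. If that holds and the keywords are combined as optional clauses, the document score is the sum of the boosts of the keywords that matched, and the match percentage from the original question becomes simple arithmetic. A sketch in Python, with made-up keywords and boosts:

```python
def match_percentage(boosts, matched_terms):
    # Highest possible score: every boosted keyword matches.
    max_score = sum(boosts.values())
    # ASSUMPTION: each matching keyword contributes exactly its boost.
    score = sum(boost for term, boost in boosts.items() if term in matched_terms)
    return 100.0 * score / max_score

boosts = {"solr": 4.0, "lucene": 3.0, "search": 1.0}
pct = match_percentage(boosts, {"solr", "search"})  # (4.0 + 1.0) / 8.0 of the maximum
```

With scores built this way, the percentage depends only on which keywords matched, not on the index contents.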

Andy

