Posted to solr-user@lucene.apache.org by Brian Lamb <br...@journalexperts.com> on 2011/05/17 20:04:40 UTC

Disable IDF scoring on certain fields

Hi all,

I have a field defined in my schema.xml file as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
   <analyzer>
     <tokenizer class="solr.LowerCaseTokenizerFactory" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
   </analyzer>
</fieldType>
<field name="myfield" multiValued="true" type="edgengram" indexed="true"
       stored="true" required="false" omitNorms="true" />

I would like to disable IDF scoring on this field. I am not interested in
how rare a term is; I only care whether the term is present. The idea is
that if a user searches for "myfield:dog OR myfield:pony", any document
containing either dog or pony would be scored identically. Documents
containing both terms would move to the top, but all documents containing
both would share the same score.

So, long story short, how can I disable the IDF score for this particular
field?

Thanks,

Brian Lamb

Re: Disable IDF scoring on certain fields

Posted by Robert Muir <rc...@gmail.com>.

On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> If you still want IDF for other fields then I think you have a problem
> because Solr doesn't yet support per-field similarity.

It does in trunk: https://issues.apache.org/jira/browse/SOLR-2338
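
With the SOLR-2338 change, a similarity can be declared inside a fieldType
in schema.xml. What follows is a minimal sketch for the edgengram type from
the original post, not a tested configuration. It assumes a custom class
named com.example.NoIdfSimilarity (a hypothetical name, not something that
ships with Solr) whose idf() returns a constant. In released Solr 4.x the
global similarity also has to be solr.SchemaSimilarityFactory for
per-fieldType declarations to take effect; whether a trunk build of this
date requires that is not confirmed anywhere in this thread.

```xml
<!-- Sketch only: com.example.NoIdfSimilarity is a hypothetical custom class. -->
<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
  <!-- Per-fieldType similarity, as enabled by SOLR-2338 -->
  <similarity class="com.example.NoIdfSimilarity" />
</fieldType>

<!-- Global similarity that delegates to the per-fieldType declarations
     (Solr 4.x and later; an assumption for a 2011 trunk build) -->
<similarity class="solr.SchemaSimilarityFactory" />
```

Field types without their own similarity element keep the default scoring,
so IDF remains in effect for every other field.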

Re: Disable IDF scoring on certain fields

Posted by Brian Lamb <br...@journalexperts.com>.

I believe I have applied the patch correctly. However, I cannot seem to
figure out where the similarity class I create should reside. Any tips on
that?

Thanks,

Brian Lamb
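
One common arrangement, offered as an assumption rather than something
confirmed in this thread: compile the similarity class into a jar and put
it where Solr's resource loader looks for plugins. That is usually the lib
directory under the core's instance directory, or any directory named in a
lib directive in solrconfig.xml. The directory name below is hypothetical.

```xml
<!-- solrconfig.xml: load plugin jars from an extra directory.
     The path is hypothetical and is resolved relative to the core's instance directory. -->
<lib dir="../custom-plugins" />
```

The class can then be referenced from schema.xml by its fully qualified
name.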

Re: Disable IDF scoring on certain fields

Posted by Brian Lamb <br...@journalexperts.com>.

Thank you, Robert, for pointing this out. This is not being used for
autocomplete. I already have another core set up for that :-)

The idea is as I outlined above. I just want a multivalued field that
treats every term in the field the same, so that the only ways documents
can separate themselves are an unrelated boost and/or matching on multiple
terms in that field.

Re: Disable IDF scoring on certain fields

Posted by Markus Jelsma <ma...@openindex.io>.

Well, if you're feeling experimental you can try trunk; as Robert points
out, it has been fixed there. If not, I guess you're stuck with creating
another core.

Is this fieldType used specifically for auto-completion? If so, another
core, preferably on another machine, is in my opinion the way to go.
Auto-completion is tough in terms of performance.

Thanks, Robert, for pointing to the Jira ticket.

Cheers

Re: Disable IDF scoring on certain fields

Posted by Brian Lamb <br...@journalexperts.com>.

Hi Markus,

I was just looking at overriding DefaultSimilarity, so your email was well
timed. The problem, as you mentioned, is that it does not seem possible to
do this on a field-by-field basis. Has anyone had any luck overriding some
of the similarity functions per field? I need to change more than one of
them and, from what I can find, only computeNorm takes the name of the
field into account.

Thanks,

Brian Lamb

Re: Disable IDF scoring on certain fields

Posted by Markus Jelsma <ma...@openindex.io>.

Hi,

Although you can configure TF per field (with omitTermFreqAndPositions),
you can't do the same for IDF. If your index is only used for this specific
purpose (it looks like an auto-complete index), then you can override
DefaultSimilarity and return a static value for IDF. If you still want IDF
for other fields, then I think you have a problem, because Solr doesn't yet
support per-field similarity.

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup
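
The two settings described above can be sketched in schema.xml roughly as
follows. omitTermFreqAndPositions is a standard field attribute. The class
name com.example.NoIdfSimilarity is a hypothetical placeholder for a
DefaultSimilarity subclass whose idf() returns a constant such as 1.0.
Declared at the top level of the schema, the similarity applies to every
field in the index, which is why this only suits a single-purpose index.

```xml
<!-- Omits term frequency and positions for this field, so a term repeated
     in a document does not raise the score. Without positions, phrase
     queries on this field will not work. -->
<field name="myfield" multiValued="true" type="edgengram" indexed="true"
       stored="true" required="false" omitNorms="true"
       omitTermFreqAndPositions="true" />

<!-- Index-wide similarity. The class name is a hypothetical placeholder. -->
<similarity class="com.example.NoIdfSimilarity" />
```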

Cheers,