Posted to java-user@lucene.apache.org by Joel Halbert <jo...@su3analytics.com> on 2011/10/08 09:37:01 UTC

Custom Similarity

Hi,

Does anyone have a modified scoring (Similarity) function they would
care to share?

I'm searching web page documents and find the default Similarity seems
to assign too much weight to documents with frequent occurrence of a
single term from the query and not enough weight to documents that
contain a greater overlap of the search query terms.

I've been playing around with overriding the default but wondering if
anyone has an implementation they have found to work well that they
would care to share.

Thanks in advance,
Joel


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Custom Similarity

Posted by ppp c <pe...@gmail.com>.
That's what PhraseQuery does.
Try PhraseQuery to match the overlap, I think.

On Sat, Oct 8, 2011 at 3:37 PM, Joel Halbert <jo...@su3analytics.com> wrote:

> Hi,
>
> Does anyone have a modified scoring (Similarity) function they would
> care to share?
>
> I'm searching web page documents and find the default Similarity seems
> to assign too much weight to documents with frequent occurrence of a
> single term from the query and not enough weight to documents that
> contain a greater overlap of the search query terms.
>
> I've been playing around with overriding the default but wondering if
> anyone has an implementation they have found to work well that they
> would care to share.
>
> Thanks in advance,
> Joel

Re: Custom Similarity

Posted by Robert Muir <rc...@gmail.com>.
On Sat, Oct 8, 2011 at 3:37 AM, Joel Halbert <jo...@su3analytics.com> wrote:
> Hi,
>
> Does anyone have a modified scoring (Similarity) function they would
> care to share?
>
> I'm searching web page documents and find the default Similarity seems
> to assign too much weight to documents with frequent occurrence of a
> single term from the query and not enough weight to documents that
> contain a greater overlap of the search query terms.
>
> I've been playing around with overriding the default but wondering if
> anyone has an implementation they have found to work well that they
> would care to share.
>

Have a look at coord(); you might want to further penalize documents
that don't contain all the query terms.

Something like:

// Full credit when every query term matches; otherwise halve the default factor.
@Override
public float coord(int overlap, int maxOverlap) {
  return (overlap == maxOverlap)
      ? 1f
      : 0.5f * super.coord(overlap, maxOverlap);
}
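
To get a feel for what this does to the numbers, here is a minimal standalone sketch that does not depend on Lucene. It assumes the default coord() is simply overlap / maxOverlap (which is what DefaultSimilarity does in the 3.x line, as far as I know); the class name CoordSketch and both method names are made up for illustration.

```java
/** Standalone sketch of the coord() idea; does not depend on Lucene. */
final class CoordSketch {

    /** Assumed default behaviour: fraction of query terms the document matched. */
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Full credit for matching every term, otherwise half the default factor. */
    static float strictCoord(int overlap, int maxOverlap) {
        return (overlap == maxOverlap)
                ? 1f
                : 0.5f * defaultCoord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        int maxOverlap = 4; // hypothetical four-term query
        // Print both factors side by side for every possible overlap count.
        for (int overlap = 0; overlap <= maxOverlap; overlap++) {
            System.out.printf("%d/%d terms: default=%.3f strict=%.3f%n",
                    overlap, maxOverlap,
                    defaultCoord(overlap, maxOverlap),
                    strictCoord(overlap, maxOverlap));
        }
    }
}
```

With this variant a document matching one of two terms gets a factor of 0.25 instead of 0.5, while a document matching both keeps 1.0, so the gap between partial and full matches widens. In a real index the override would presumably live in a subclass of DefaultSimilarity that is set on the searcher; the 0.5f multiplier is only a starting point to tune.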


-- 
lucidimagination.com
