Posted to java-user@lucene.apache.org by Ivan Brusic <iv...@brusic.com> on 2013/08/06 01:41:13 UTC

Omitting term frequencies while preserving positions

As the subject says, is it possible to omit the term frequencies for a
field, but still keep positions? Term frequencies are omitted for better
scoring under our model, but positions are required for span queries. Are
the two concepts related? Are they indexed in the same data structure?

One option is to use a custom similarity that ignores term frequencies, but
I was wondering if there was a cleaner solution.
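
For illustration, the "similarity that ignores term frequencies" option could look roughly like the sketch below. It is plain Java and deliberately does not use Lucene's classes: ToyScorer, withSqrtTf and withConstantTf are hypothetical names made up for this example, and the score is simplified to tf * idf. In Lucene 4.x the analogous hook would, as far as I know, be overriding tf(float freq) in a DefaultSimilarity subclass, but treat that as an assumption to check against the version in use.

```java
import java.util.function.DoubleUnaryOperator;

/** Toy tf * idf scorer (hypothetical, not Lucene's API). */
final class ToyScorer {
    private final DoubleUnaryOperator tf;

    ToyScorer(DoubleUnaryOperator tf) {
        this.tf = tf;
    }

    /** Term-frequency factor that grows with the count: sqrt(freq). */
    static ToyScorer withSqrtTf() {
        return new ToyScorer(Math::sqrt);
    }

    /** Term-frequency factor that ignores the count: 1 for any match, 0 otherwise. */
    static ToyScorer withConstantTf() {
        return new ToyScorer(freq -> freq > 0 ? 1.0 : 0.0);
    }

    /** Simplified score: tf factor times idf. */
    double score(int freq, double idf) {
        return tf.applyAsDouble(freq) * idf;
    }
}

final class TfIgnoringDemo {
    public static void main(String[] args) {
        double idf = 2.0; // arbitrary value, for illustration only
        ToyScorer classic = ToyScorer.withSqrtTf();
        ToyScorer flat = ToyScorer.withConstantTf();
        // With sqrt(freq) the score depends on how often the term occurs.
        System.out.println(classic.score(1, idf) + " vs " + classic.score(9, idf));
        // With a constant tf factor, one occurrence scores the same as nine.
        System.out.println(flat.score(1, idf) + " vs " + flat.score(9, idf));
    }
}
```

The frequencies and positions stay in the index untouched; only the scoring side stops looking at the count, so span queries keep working.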

Cheers,

Ivan

Re: Omitting term frequencies while preserving positions

Posted by Ivan Brusic <iv...@brusic.com>.
Thanks Simon. I did not imagine the dependency was as simple as that. Still
grokking all the Lucene 4.x changes, although this issue has always been
present in Lucene.

-- 
Ivan


On Mon, Aug 5, 2013 at 9:58 PM, Simon Willnauer
<si...@gmail.com> wrote:

> The reason you can't omit it today is that $num_position ==
> $term_frequency, i.e. we need to store it anyway. Yet I kind of agree
> that this is an implementation detail, so we could in theory return 1 as
> the TF from the DocsAndPositionsEnum, but this would break our APIs as
> well, since DocsAndPositionsEnum requires you to call nextPosition() up
> to freq() times; otherwise the behaviour is undefined. So essentially, if
> you don't want to take the TF into account in your scoring model, you are
> kind of left with changing your similarity.
>
> simon

Re: Omitting term frequencies while preserving positions

Posted by Simon Willnauer <si...@gmail.com>.
The reason you can't omit it today is that $num_position ==
$term_frequency, i.e. we need to store it anyway. Yet I kind of agree
that this is an implementation detail, so we could in theory return 1 as
the TF from the DocsAndPositionsEnum, but this would break our APIs as
well, since DocsAndPositionsEnum requires you to call nextPosition() up
to freq() times; otherwise the behaviour is undefined. So essentially, if
you don't want to take the TF into account in your scoring model, you are
kind of left with changing your similarity.
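
To make the dependency concrete, here is a minimal sketch of a per-document posting in which the term frequency is nothing more than the number of stored positions. ToyPosting and ToyPostingDemo are hypothetical stand-ins written for this example, not Lucene's DocsAndPositionsEnum; in particular, the toy throws when read past freq(), whereas the real behaviour there is undefined.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy posting for one term in one document (hypothetical, not Lucene's API). */
final class ToyPosting {
    private final int[] positions;
    private int next = 0;

    ToyPosting(int... positions) {
        this.positions = positions.clone();
    }

    /** The term frequency is just the number of stored positions. */
    int freq() {
        return positions.length;
    }

    /** Returns the next position; may be called at most freq() times. */
    int nextPosition() {
        if (next >= positions.length) {
            // Assumption for the sketch: fail loudly instead of being undefined.
            throw new IllegalStateException("nextPosition() called more than freq() times");
        }
        return positions[next++];
    }
}

final class ToyPostingDemo {
    /** Reads every position the way a positional consumer would: exactly freq() calls. */
    static List<Integer> readPositions(ToyPosting posting) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0, n = posting.freq(); i < n; i++) {
            out.add(posting.nextPosition());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyPosting posting = new ToyPosting(3, 17, 42);
        System.out.println("freq=" + posting.freq() + " positions=" + readPositions(posting));
    }
}
```

If freq() reported 1 while three positions were stored, the loop in readPositions would stop after the first one, which is the kind of API breakage described above.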

simon

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org