Posted to dev@lucene.apache.org by Dmitry Kan <dm...@gmail.com> on 2015/11/05 10:25:04 UTC

ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

Hello,

Cross-posting the same question from the Solr mailing list, hopefully with
better luck.

Are there ways to influence the strategy
behind SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite?

As far as I can tell, the rewrite method currently loads at most N terms,
namely those that maximize the term score. How can this be changed to load,
for example, the top N terms by frequency?

For example, I would like comp* to expand to "company" if it is among the
top N most frequent terms in the index, rather than to less obvious words
such as "comp'd", "comp692", "compacta", etc.
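
To make the requested behaviour concrete, here is a minimal sketch in plain
Java (no Lucene dependency) that expands a prefix against an in-memory term
dictionary and keeps the N terms with the highest document frequency. The
class PrefixTopTermsByFreq, the method topTermsByFrequency and all the
frequencies are hypothetical, for illustration only. It models a single
sorted term list, so it ignores how Lucene splits terms across segments.

```java
import java.util.*;

class PrefixTopTermsByFreq {

    // Orders candidates from weakest to strongest: lower frequency first;
    // on equal frequency the alphabetically later term counts as weaker.
    static final Comparator<Map.Entry<String, Integer>> WEAKEST_FIRST = (a, b) -> {
        int c = Integer.compare(a.getValue(), b.getValue());
        return c != 0 ? c : b.getKey().compareTo(a.getKey());
    };

    // Hypothetical helper: expands prefix against a sorted term dictionary
    // (term -> document frequency) and keeps the n most frequent matches.
    static List<String> topTermsByFrequency(NavigableMap<String, Integer> dict,
                                            String prefix, int n) {
        // Min-heap of at most n entries; the weakest candidate sits at the head.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(WEAKEST_FIRST);
        // Terms sharing a prefix are contiguous in sorted order, so start at the
        // prefix and stop at the first term that no longer matches.
        for (Map.Entry<String, Integer> e : dict.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            heap.offer(e);
            if (heap.size() > n) heap.poll(); // evict the weakest candidate
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>(heap);
        kept.sort(WEAKEST_FIRST.reversed()); // most frequent first
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : kept) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies, for illustration only.
        NavigableMap<String, Integer> dict = new TreeMap<>();
        dict.put("comp'd", 2);
        dict.put("comp692", 1);
        dict.put("compacta", 3);
        dict.put("company", 950);
        dict.put("compare", 410);
        dict.put("computer", 620);
        dict.put("cat", 5000); // outside the comp* range
        System.out.println(topTermsByFrequency(dict, "comp", 3)); // [company, computer, compare]
    }
}
```

Run as-is, it prints [company, computer, compare]: the rare forms comp'd,
comp692 and compacta fall below the cut.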

Thanks,
Dmitry

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

Re: ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

Posted by Dmitry Kan <dm...@gmail.com>.
Hi Alan,

Thanks! That is already something. I will take a look at the code.

Dmitry
On 5 Nov 2015 at 11:44 AM, "Alan Woodward" <al...@flax.co.uk>
wrote:

> Hi Dmitry,
>
> This isn't quite as simple as it seems, unfortunately, because
> TopTermsRewrite expects the 'score' for each term to be the same across all
> segments, and that won't be the case with frequencies.
>
> I tried to come up with a solution in LUCENE-6513, but we didn't really
> come to a consensus on how best to do it.  But you could probably take the
> code in there and use it to write your own RewriteMethod.
>
> Alan Woodward
> www.flax.co.uk

Re: ways to affect on SpanMultiTermQueryWrapper.TopTermsSpanBooleanQueryRewrite

Posted by Alan Woodward <al...@flax.co.uk>.
Hi Dmitry,

This isn't quite as simple as it seems, unfortunately, because TopTermsRewrite expects the 'score' for each term to be the same across all segments, and that won't be the case with frequencies.
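
A toy model of this point, in plain Java with two hypothetical segments
represented as term-to-docFreq maps (no Lucene API is used; the class
CrossSegmentTopTerms and its methods are made up for illustration): choosing
the top N per segment yields a different term set in each segment, whereas
summing the frequencies across segments first and selecting once gives a
single, consistent set. The sum-then-select approach is only one possible
strategy a custom RewriteMethod could follow; it is not necessarily what
LUCENE-6513 does.

```java
import java.util.*;

class CrossSegmentTopTerms {

    // Returns the n most frequent terms of one frequency map
    // (highest count first, ties broken alphabetically).
    static List<String> topN(Map<String, Integer> freqs, int n) {
        List<String> terms = new ArrayList<>(freqs.keySet());
        terms.sort((a, b) -> {
            int c = Integer.compare(freqs.get(b), freqs.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Pass 1: sum each term's per-segment doc frequency into an index-wide total.
    static Map<String, Integer> aggregate(List<Map<String, Integer>> segments) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> seg : segments)
            for (Map.Entry<String, Integer> e : seg.entrySet())
                total.merge(e.getKey(), e.getValue(), Integer::sum);
        return total;
    }

    public static void main(String[] args) {
        // Two made-up segments with per-segment doc frequencies for comp* terms.
        Map<String, Integer> seg1 = Map.of("company", 40, "compare", 30, "compacta", 1);
        Map<String, Integer> seg2 = Map.of("company", 25, "compacta", 30, "computer", 20);

        // Selecting per segment gives a different "top 2" in each segment.
        System.out.println(topN(seg1, 2)); // [company, compare]
        System.out.println(topN(seg2, 2)); // [compacta, company]

        // Pass 2: select once from the index-wide totals, so every segment
        // would be rewritten against the same term set.
        System.out.println(topN(aggregate(List.of(seg1, seg2)), 2)); // [company, compacta]
    }
}
```

The trade-off is an extra pass over every segment's matching terms before
any query is built.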

I tried to come up with a solution in LUCENE-6513, but we didn't really come to a consensus on how best to do it.  But you could probably take the code in there and use it to write your own RewriteMethod.

Alan Woodward
www.flax.co.uk

