Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/10/12 13:04:05 UTC

[jira] [Commented] (LUCENE-6276) Add matchCost() api to TwoPhaseDocIdSetIterator

    [ https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952926#comment-14952926 ] 

Adrien Grand commented on LUCENE-6276:
--------------------------------------

I think it would make more sense to sum up {{totalTermFreq/docFreq}} for each term instead of {{totalTermFreq/conjunctionDISI.cost()}}, so that we get the average number of positions per document that contains the term. Otherwise I think you got the intention right. Something else to be careful with is that {{TermStatistics.totalTermFreq()}} may return -1 when the statistic is not available, so we need a fallback for that case. Maybe we could just assume 1 position per document?
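
To make the suggestion concrete, here is a rough sketch of the per-term sum with the -1 fallback. It is not taken from the patch: {{TermStats}}, {{avgPositionsPerDoc}} and {{phraseMatchCost}} are hypothetical names, and the class is a plain stand-in rather than Lucene's {{TermStatistics}}, so that the snippet compiles on its own.

```java
import java.util.List;

final class PhraseMatchCostSketch {

    /** Hypothetical stand-in for per-term index statistics (not Lucene's TermStatistics). */
    static final class TermStats {
        final long docFreq;        // number of documents containing the term
        final long totalTermFreq;  // total occurrences, or -1 when unavailable

        TermStats(long docFreq, long totalTermFreq) {
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Average number of positions per matching document for one term. */
    static float avgPositionsPerDoc(TermStats stats) {
        // Assumption from the comment above: fall back to 1 position per
        // document when totalTermFreq is unavailable (-1). The docFreq guard
        // is an extra assumption to avoid dividing by zero.
        if (stats.totalTermFreq < 0 || stats.docFreq <= 0) {
            return 1f;
        }
        return (float) stats.totalTermFreq / stats.docFreq;
    }

    /** Sum of the per-term averages, used as a rough phrase matching cost. */
    static float phraseMatchCost(List<TermStats> terms) {
        float cost = 0f;
        for (TermStats term : terms) {
            cost += avgPositionsPerDoc(term);
        }
        return cost;
    }
}
```

For example, a two-term phrase where one term has docFreq=10 and totalTermFreq=30 and the other has no totalTermFreq would get 3 for the first term and the fallback of 1 for the second.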

A related question is what definition we should give to {{matchCost()}}. The patch does not have this issue yet since it only deals with phrase queries, but eventually we should be able to compare the cost of e.g. a phrase query against a doc values range query even though they perform very different computations. Maybe the javadocs of {{matchCost()}} could suggest a scale of operation costs that implementors could use to compute the cost of matching the two-phase iterator. It could be something like 1 for {{nextDoc()}}, {{nextPosition()}}, comparisons and basic arithmetic operations, and e.g. 10 for {{advance()}}?
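
As an illustration of how such a scale might be used, here is a sketch under assumptions. The constants mirror the numbers floated above; {{CostedIterator}}, {{phraseCost}}, {{docValuesRangeCost}} and {{sortByMatchCost}} are hypothetical names and not part of Lucene. The per-operation breakdowns (one {{nextPosition()}} plus one comparison per position, and a doc values lookup priced like an {{advance()}}) are guesses for the sake of the example, not measurements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MatchCostScaleSketch {

    // Hypothetical unit costs following the scale suggested above.
    static final float NEXT_POSITION_COST = 1f;
    static final float BASIC_OP_COST = 1f;   // comparisons, basic arithmetic
    static final float ADVANCE_COST = 10f;

    /** Stand-in for a two-phase iterator that exposes a matching cost. */
    static final class CostedIterator {
        final String name;
        final float matchCost;

        CostedIterator(String name, float matchCost) {
            this.name = name;
            this.matchCost = matchCost;
        }
    }

    /** Assumed phrase cost: one nextPosition() and one comparison per expected position. */
    static float phraseCost(float[] avgPositionsPerTerm) {
        float cost = 0f;
        for (float avg : avgPositionsPerTerm) {
            cost += avg * (NEXT_POSITION_COST + BASIC_OP_COST);
        }
        return cost;
    }

    /** Assumed range cost: one lookup priced like advance(), plus two bound comparisons. */
    static float docValuesRangeCost() {
        return ADVANCE_COST + 2 * BASIC_OP_COST;
    }

    /** Returns a copy ordered so that cheaper iterators are checked first. */
    static List<CostedIterator> sortByMatchCost(List<CostedIterator> iterators) {
        List<CostedIterator> sorted = new ArrayList<>(iterators);
        sorted.sort(Comparator.comparingDouble((CostedIterator it) -> it.matchCost));
        return sorted;
    }
}
```

With a shared scale like this, a conjunction could order a phrase iterator and a doc values range iterator against each other even though they do unrelated work, which is the comparison the javadocs would need to make possible.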

> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
>                 Key: LUCENE-6276
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6276
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6276-ExactPhraseOnly.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost() defined as something like an estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array so that cheaper ones are called first. Today it has no idea if one scorer is a simple phrase scorer on a short field vs another that might do some geo calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g. totalTermFreq/maxDoc)


