Posted to dev@lucene.apache.org by Karl Wettin <ka...@gmail.com> on 2009/06/02 14:04:19 UTC

Re: HitCollector#collect(int,float,Collection)

So, I've been sleeping on this for a few weeks. Would it be possible
to solve this with a decorator? Perhaps a top-level decorator that
also decorates all subqueries at rewrite time and then keeps the
instantiated scorers bound to the top-level decorator, i.e. makes the
decorated query non-reusable.

Query realQuery = ...
DecoratedQuery dq = new DecoratedQuery(realQuery);
searcher.search(dq, ..);
Map<Query, Float> scoringQueries = dq.getScoringQueries();

Not quite sure if this is terrible or elegant.
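
To make the decorator idea concrete, here is a minimal, self-contained sketch. It does not use the real Lucene API: `SimpleQuery`, `SimpleScorer`, `TermLikeQuery` and `OrQuery` are hypothetical stand-ins, and "rewrite time" is approximated by wrapping the query tree once in the constructor. The map returned by `getScoringQueries()` reflects only the most recently scored document, which is why such a decorated query would not be reusable across searches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for Query and Scorer; NOT the Lucene API.
interface SimpleScorer {
    /** Returns a score > 0 if the doc matches, otherwise 0. */
    float score(int doc);
}

interface SimpleQuery {
    SimpleScorer scorer();
}

/** Matches a fixed set of doc ids; stands in for an "atomic" query. */
final class TermLikeQuery implements SimpleQuery {
    private final String term;
    private final Set<Integer> docs;

    TermLikeQuery(String term, Integer... docs) {
        this.term = term;
        this.docs = new HashSet<>(Arrays.asList(docs));
    }

    public SimpleScorer scorer() {
        return doc -> docs.contains(doc) ? 1.0f : 0.0f;
    }

    @Override public String toString() { return term; }
}

/** Sums the scores of its clauses; stands in for a boolean OR. */
final class OrQuery implements SimpleQuery {
    final List<SimpleQuery> clauses;

    OrQuery(List<SimpleQuery> clauses) { this.clauses = clauses; }

    public SimpleScorer scorer() {
        List<SimpleScorer> scorers = new ArrayList<>();
        for (SimpleQuery q : clauses) scorers.add(q.scorer());
        return doc -> {
            float sum = 0;
            for (SimpleScorer s : scorers) sum += s.score(doc);
            return sum;
        };
    }

    @Override public String toString() { return clauses.toString(); }
}

/** Wraps every query in the tree and records which ones scored the last doc. */
final class DecoratedQuery implements SimpleQuery {
    private final SimpleQuery wrapped;
    private final Map<SimpleQuery, Float> scoring = new LinkedHashMap<>();

    DecoratedQuery(SimpleQuery original) { this.wrapped = decorate(original); }

    private SimpleQuery decorate(SimpleQuery q) {
        SimpleQuery inner = q;
        if (q instanceof OrQuery) {
            List<SimpleQuery> decorated = new ArrayList<>();
            for (SimpleQuery c : ((OrQuery) q).clauses) decorated.add(decorate(c));
            inner = new OrQuery(decorated);
        }
        final SimpleQuery target = inner;
        // Results are keyed by the caller's original query object.
        return () -> {
            SimpleScorer s = target.scorer();
            return doc -> {
                float score = s.score(doc);
                if (score > 0) scoring.put(q, score); else scoring.remove(q);
                return score;
            };
        };
    }

    public SimpleScorer scorer() { return wrapped.scorer(); }

    /** Queries that contributed to the most recently scored doc. */
    Map<SimpleQuery, Float> getScoringQueries() { return scoring; }
}

final class DecoratorDemo {
    public static void main(String[] args) {
        SimpleQuery a = new TermLikeQuery("a", 1, 2);
        SimpleQuery b = new TermLikeQuery("b", 2, 3);
        DecoratedQuery dq = new DecoratedQuery(new OrQuery(Arrays.asList(a, b)));
        dq.scorer().score(1); // "search" a single doc
        System.out.println(dq.getScoringQueries());
    }
}
```

The sketch keeps all bookkeeping outside the wrapped queries, so an undecorated query pays nothing; the cost is one extra map update per query node per scored document when the decorator is used.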


     karl

On 7 Apr 2009, at 12:17, Michael McCandless wrote:

> On Tue, Apr 7, 2009 at 6:13 AM, Karl Wettin <ka...@gmail.com> wrote:
>>
>> On 7 Apr 2009, at 10:23, Michael McCandless wrote:
>>
>>> Do you mean tracking the "atomic queries" that caused a given hit to
>>> match (where "atomic query" is a query that actually uses
>>> TermDocs/Positions to check matching, vs other queries like
>>> BooleanQuery that "glomm together" sub-query matches)?
>>>
>>> EG for a boolean query w/ N clauses, which of those N clauses matched?
>>
>> This is exactly what I mean. I do however think it makes sense to get
>> information about non-atomic queries too, as it seems reasonable that
>> knowing the first clause (boolean query '+(a b)') in '+(a b) -(+c +d)'
>> matched is more interesting than only knowing that one of the clauses
>> of that boolean query matched.
>
> Ahh OK I agree.  So every query in the full tree should be able to
> state whether it matched the doc.
>
>>> A natural place to do this is Scorer API, ie extend it with a
>>> "getMatchingAtomicQueries" or some such.  Probably, for efficiency,
>>> each Query should be pre-assigned an int position, and then the
>>> matching is represented as a bit array, reused across matches.  Your
>>> collector could then ask the scorer for these bits if it wanted.
>>> There should be no performance cost for collectors that don't use
>>> this functionality.
>>
>> I'll look into it.
>>
>> Thanks for the feedback.
>>
>>
>>     karl
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>


Re: HitCollector#collect(int,float,Collection)

Posted by Michael McCandless <lu...@mikemccandless.com>.
My guess is such an approach could be made to work...

But I think I'd rather directly improve the *Scorer classes so that they
provide such details (and you pay no performance cost if you don't ask
for them).  Likewise for positional details of matching, which the
highlighter could use.  And then we could absorb the Span* queries back
into their primary counterparts.
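
As a rough illustration of the scorer-side alternative, using the bit-array idea from earlier in the thread: each clause is pre-assigned an int position, and a single reused `BitSet` records which positions matched the current doc. This is only a sketch under stated assumptions. `MatchTrackingScorer`, `addClause`, `matches` and `matchedClauses` are hypothetical names, not Lucene's Scorer API, and clauses are modelled as sorted arrays of doc ids with OR semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in; NOT Lucene's Scorer.
final class MatchTrackingScorer {
    // Index in this list is the clause's pre-assigned position.
    private final List<int[]> postings = new ArrayList<>();
    // One bit per clause, reused across docs to avoid per-hit allocation.
    private final BitSet matched = new BitSet();

    /** Registers a clause (sorted doc ids) and returns its position. */
    int addClause(int... sortedDocIds) {
        postings.add(sortedDocIds);
        return postings.size() - 1;
    }

    /**
     * Returns true if any clause matches the doc. Clause bits are only
     * maintained when trackClauses is true; otherwise the method returns
     * on the first matching clause and does no bookkeeping.
     */
    boolean matches(int doc, boolean trackClauses) {
        if (trackClauses) matched.clear();
        boolean any = false;
        for (int pos = 0; pos < postings.size(); pos++) {
            if (Arrays.binarySearch(postings.get(pos), doc) >= 0) {
                if (!trackClauses) return true; // cheap path, no tracking
                matched.set(pos);
                any = true;
            }
        }
        return any;
    }

    /** Valid only after a call to matches(doc, true); reused, so copy if kept. */
    BitSet matchedClauses() { return matched; }
}
```

A collector that wants the details would call `matches(doc, true)` and then read `matchedClauses()`; one that does not would pass `false` and skip the bookkeeping entirely.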

Mike
