Posted to java-user@lucene.apache.org by Paul Libbrecht <pa...@activemath.org> on 2010/05/11 11:52:14 UTC

Re: best way to intersect two queries?

Dear lucene experts,

Let me try to make this more precise, since there was no answer.

I have a query that is, roughly,
   a & b & c
and I have a good search result.
Now I want to know:

a) for the first page, which of the hits match a, b, or c;
b) for the remaining results (the "tail"), whether there are any matches of a, b, or c.

So far, the only approach I know is to use the highlighter to find the matching fields; that is not exactly the same thing, and it is slow.
I know I could use termDocs, or a separate search for each of a, b, and c, to annotate my initial result list; that could work well for a).

I still don't know what to do for b).

Thanks for any hints.

paul

On 31 March 2010 at 23:00, Paul Libbrecht wrote:
> I've been wandering around but I see no solution yet: I would like to
> intersect two query results: going through the list of one query and
> indicating which ones actually match the other query or, even better,
> indicating that "past this point, nothing matches that query anymore".
>
> What should be the strategy?
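A minimal sketch of the annotation idea for both a) and b), assuming the ranked doc ids of the main query are already available and that the doc ids matched by each sub-query have been collected into a java.util.BitSet beforehand (for example by running each sub-query separately). ClauseAnnotator and its methods are hypothetical names, not Lucene API. The annotation is only informative if a, b and c are optional clauses, since a strict conjunction would match every hit. lastMatchingRank gives, per sub-query, the "past this point, nothing matches that query anymore" position.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper, not Lucene API. Assumes:
//  - rankedDocs holds the main query's doc ids in rank (score) order;
//  - clauseMatches[c] holds the doc ids matched by sub-query c,
//    collected beforehand (e.g. by running each sub-query on its own).
final class ClauseAnnotator {

    /** For each ranked hit, a bitmask with bit c set if sub-query c matches it. */
    static int[] annotate(int[] rankedDocs, BitSet[] clauseMatches) {
        if (clauseMatches.length > Integer.SIZE) {
            throw new IllegalArgumentException("too many sub-queries for an int mask");
        }
        int[] masks = new int[rankedDocs.length];
        for (int rank = 0; rank < rankedDocs.length; rank++) {
            for (int c = 0; c < clauseMatches.length; c++) {
                if (clauseMatches[c].get(rankedDocs[rank])) {
                    masks[rank] |= 1 << c;
                }
            }
        }
        return masks;
    }

    /**
     * Last rank at which each sub-query still matches, or -1 if it never does.
     * Beyond that rank, nothing in the tail matches that sub-query.
     */
    static int[] lastMatchingRank(int[] rankedDocs, BitSet[] clauseMatches) {
        int[] masks = annotate(rankedDocs, clauseMatches);
        int[] last = new int[clauseMatches.length];
        Arrays.fill(last, -1);
        for (int rank = 0; rank < masks.length; rank++) {
            for (int c = 0; c < clauseMatches.length; c++) {
                if ((masks[rank] & (1 << c)) != 0) {
                    last[c] = rank;
                }
            }
        }
        return last;
    }
}
```

This walks the whole ranked list, so for a very long tail you would probably want to cap how many hits are retrieved.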



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: best way to intersect two queries?

Posted by Paul Libbrecht <pa...@activemath.org>.
On 12 May 2010 at 10:55, mark harwood wrote:

>
>
>>> Two terminology questions:
>
>>> - Is the "multiplier" mentioned in the mail there the same as boost?
>
> This factor controls how many decimal places of precision are retained
> in the adjusted scores. Pick too low a multiplier and scores that differ
> only by a very small value will appear equal. Pick too high a multiplier
> and you start to lose the most significant parts of the score. This
> trade-off is summarised here for various settings of "multiplier":

Sorry, I had overlooked that. Yes, that seems realistic to me.

>>> - I intended to use prefix and fuzzy queries. I suspect that is
>>> incompatible with this, or is it?
>
> You can wrap any query with this class. The only limitation is that it
> hides all match info in a single byte encoded into the score, which
> allows for only 8 bits, i.e. 8 match flags, so it can report on at most
> 8 clauses. You could try to encode more than 8 bits into the score, but
> then you lose even more score precision (see above).

I'm getting an NPE with the query whose toString() gives:

  QueryMatchMonitor (QueryMatchMonitor ((title-fr:segm~0.5 title-phonetic-fr:SKM~0.5^0.8)) QueryMatchMonitor (((title-en:segm~0.5 title-phonetic-en:SKM~0.5^0.8))^0.95) QueryMatchMonitor (((title-de:segm~0.5 title-phonetic-de:SKM~0.5^0.8))^0.9025) QueryMatchMonitor (((title-en:segm~0.5 title-phonetic-en:SKM~0.5^0.8))^0.85737497))

The stack trace is:

java.lang.NullPointerException
	at org.apache.lucene.search.FlagRecordingQuery$FlagRecordingQueryWeight.scorer(FlagRecordingQuery.java:104)
	at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
	at org.apache.lucene.search.FlagCombiningQuery$FlagCombiningQueryWeight.scorer(FlagCombiningQuery.java:100)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:246)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:173)
	at org.apache.lucene.search.Searcher.search(Searcher.java:181)
	at org.apache.lucene.search.Searcher.search(Searcher.java:191)

(Note that I removed explain() from my FlagCombiningQuery. The value that
is null is the delegateScorer.)
Any clue?

paul
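One plausible cause, not confirmed in this thread: in Lucene 2.9/3.0, Weight.scorer() may return null when no document in a segment can match, so a wrapping Weight that uses its delegate's scorer unconditionally would fail with exactly this kind of NPE. The sketch below uses hypothetical stand-in interfaces (DocWeight, DocScorer, FlagRecordingWeightSketch) rather than the real Lucene classes, and only shows the null check such a wrapper would need.

```java
// Hypothetical stand-ins for Lucene's Scorer/Weight; not the real API.
interface DocScorer {
    int nextDoc();
    float score();
}

interface DocWeight {
    /** May return null when no document in the segment can match. */
    DocScorer scorer(String segment); // "segment" stands in for an IndexReader
}

final class FlagRecordingWeightSketch implements DocWeight {
    private final DocWeight delegate;

    FlagRecordingWeightSketch(DocWeight delegate) {
        this.delegate = delegate;
    }

    @Override
    public DocScorer scorer(String segment) {
        final DocScorer delegateScorer = delegate.scorer(segment);
        if (delegateScorer == null) {
            // Nothing matches here: propagate null instead of wrapping it,
            // so the caller can skip this clause for this segment.
            return null;
        }
        return new DocScorer() {
            @Override public int nextDoc() { return delegateScorer.nextDoc(); }
            // Flag recording would happen here in a real wrapper.
            @Override public float score() { return delegateScorer.score(); }
        };
    }
}
```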


Re: best way to intersect two queries?

Posted by mark harwood <ma...@yahoo.co.uk>.

>> Two terminology questions:

>> - Is the "multiplier" mentioned in the mail there the same as boost?

This factor controls how many decimal places of precision are retained in the adjusted scores. Pick too low a multiplier and scores that differ only by a very small value will appear equal. Pick too high a multiplier and you start to lose the most significant parts of the score. This trade-off is summarised here for various settings of "multiplier":

multiplier   max score   fraction precision
==========   =========   ==================
10           838860      0.x
100          83886       0.xx
1000         8388        0.xxx
10000        838         0.xxxx

The default setting of 1000 seems like a safe setting for the typical scores generated by Lucene.
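The max-score column is consistent with reserving the 8 low-order bits of a 32-bit int for match flags: Integer.MAX_VALUE >> 8 is 8388607, and dividing by the multiplier gives 838860, 83886, 8388 and 838. A minimal sketch of that arithmetic, assuming this int layout; ScoreFlagPacking is a hypothetical helper, and the actual LUCENE-1999 patch may encode the score differently.

```java
// Hypothetical helper, NOT the LUCENE-1999 code. Assumed 32-bit int layout:
//   bits 8..30 : round(score * multiplier)   (the adjusted score)
//   bits 0..7  : one match flag per wrapped clause (so at most 8 clauses)
final class ScoreFlagPacking {
    static final int FLAG_BITS = 8;
    static final int FLAG_MASK = (1 << FLAG_BITS) - 1; // 0xFF

    /** Largest score that still fits once FLAG_BITS are reserved for flags. */
    static int maxScore(int multiplier) {
        return (Integer.MAX_VALUE >> FLAG_BITS) / multiplier;
    }

    /** Keeps roughly log10(multiplier) decimal places of the score. */
    static int pack(float score, int multiplier, int flags) {
        if (score < 0 || score > maxScore(multiplier)) {
            throw new IllegalArgumentException(
                "score out of range for multiplier " + multiplier + ": " + score);
        }
        int scaled = Math.round(score * multiplier);
        return (scaled << FLAG_BITS) | (flags & FLAG_MASK);
    }

    static float unpackScore(int packed, int multiplier) {
        return (packed >>> FLAG_BITS) / (float) multiplier;
    }

    static int unpackFlags(int packed) {
        return packed & FLAG_MASK;
    }
}
```

Under this layout, a score of 3.14159 comes back as 3.1 with multiplier 10 and as 3.142 with multiplier 1000, while the maximum representable score drops from 838860 to 8388.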

>> - I intended to use prefix and fuzzy queries. I suspect that is incompatible with this, or is it?

You can wrap any query with this class. The only limitation is that it hides all match info in a single byte encoded into the score, which allows for only 8 bits, i.e. 8 match flags, so it can report on at most 8 clauses. You could try to encode more than 8 bits into the score, but then you lose even more score precision (see above).
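Reading the match info back is then a matter of testing bits in that byte. A minimal sketch, assuming (hypothetically) that bit i is set when wrapped clause i matched; MatchFlags is an invented helper, not part of the LUCENE-1999 patch. It also shows where the 8-clause limit bites.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assumes bit i of the flag byte means "clause i matched".
final class MatchFlags {
    static final int MAX_CLAUSES = 8; // one byte of flags

    /** Flag bit for a clause; a ninth clause has no bit left in the byte. */
    static int flagFor(int clauseIndex) {
        if (clauseIndex < 0 || clauseIndex >= MAX_CLAUSES) {
            throw new IllegalArgumentException(
                "only " + MAX_CLAUSES + " clauses can be flagged, got index " + clauseIndex);
        }
        return 1 << clauseIndex;
    }

    /** Indices of the clauses whose flag bit is set. */
    static List<Integer> matchedClauses(int flags) {
        List<Integer> matched = new ArrayList<>();
        for (int i = 0; i < MAX_CLAUSES; i++) {
            if ((flags & (1 << i)) != 0) {
                matched.add(i);
            }
        }
        return matched;
    }
}
```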

Some thoughts on a less bit-twiddly, more robust approach:
Having recently played with the new Attribute API in the 2.9/3.0 Analyzers, I am intrigued by using a similar approach to capture low-level match metadata, i.e. clients decide which types of MatchAttributes are of interest, and Query objects record match metadata in singleton MatchAttribute objects as they stream their way through result sets.
Result-set streaming and tokenisation streams are similar problems, and the Attribute design seems like it could apply here.
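The MatchAttribute idea can be sketched in plain Java. This is a hypothetical illustration of the proposal, not an existing Lucene API: MatchAttribute and MatchRecordingStream are invented names, and the per-clause matches are simulated with BitSets of doc ids instead of real sub-scorers. As with TokenStream attributes, the consumer fetches the shared attribute once and re-reads it after each advance.

```java
import java.util.BitSet;

// Hypothetical: MatchAttribute is NOT an existing Lucene class.
final class MatchAttribute {
    private final BitSet matchedClauses = new BitSet();

    void clear() { matchedClauses.clear(); }
    void setMatched(int clause) { matchedClauses.set(clause); }
    boolean isMatched(int clause) { return matchedClauses.get(clause); }
}

// Hypothetical result stream that records match metadata into one shared attribute.
final class MatchRecordingStream {
    private final int[] docs;              // result doc ids in stream order
    private final BitSet[] clauseMatches;  // simulated: doc ids matched by each clause
    private final MatchAttribute attribute = new MatchAttribute(); // singleton, reused per doc
    private int pos = -1;

    MatchRecordingStream(int[] docs, BitSet[] clauseMatches) {
        this.docs = docs;
        this.clauseMatches = clauseMatches;
    }

    /** Consumers fetch the attribute once and read it after every nextDoc(). */
    MatchAttribute getMatchAttribute() {
        return attribute;
    }

    /** Advances to the next result, overwriting the attribute; returns -1 when exhausted. */
    int nextDoc() {
        if (pos + 1 >= docs.length) {
            pos = docs.length;
            return -1;
        }
        pos++;
        int doc = docs[pos];
        attribute.clear();
        for (int c = 0; c < clauseMatches.length; c++) {
            if (clauseMatches[c].get(doc)) {
                attribute.setMatched(c);
            }
        }
        return doc;
    }
}
```

As in the Analyzer Attribute design, no per-document objects are allocated, and the number of clauses that can be reported is no longer limited by bits taken from the score.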

Cheers
Mark


Re: best way to intersect two queries?

Posted by Paul Libbrecht <pa...@activemath.org>.
Very interesting; finding the field name is enough for me.
What's nice is being able to wrap queries in these Flag queries because, indeed, I don't
want to know the details of every matched query, just some of them.

Two terminology questions:

- Is the "multiplier" mentioned in the mail there the same as boost?

- I intended to use prefix and fuzzy queries. I suspect that is
incompatible with this, or is it?

paul


On 11 May 2010 at 12:02, mark harwood wrote:

> See https://issues.apache.org/jira/browse/LUCENE-1999


Re: best way to intersect two queries?

Posted by mark harwood <ma...@yahoo.co.uk>.
See https://issues.apache.org/jira/browse/LUCENE-1999