Posted to java-user@lucene.apache.org by Alberto Squassabia <al...@optimus-corp.com> on 2005/09/27 00:46:45 UTC

query behavior

Hi!

I learnt from a mailing list archive that the following applies:


<quote>
---------------------------------
Tue, 06 Jan 2004 
[...]
I have an index with documents that have only 2 fields. The first
(unique) is 'very unique', in that most documents have at least somewhat
varying terms; the second is a boolean that contains only (boolean)
'true' or 'false'. The index contains 100,000,000+ documents.
If I perform the search "+unique:somevalue +boolean:true",
Lucene will search on the first term, returning very few documents, but
then it will search the second term, returning possibly a million+
documents, and then it will intersect the lists and return 'hits' of only
a few documents.
[. . .]
This behavior has been observed with the 1.3 final code.
Robert Engels
---------------------------------
</quote>
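The strategy described in the quote (look up each required term on its own, read its whole posting list, then intersect the lists) can be sketched as follows. This is a minimal illustration, not Lucene's code: posting lists are assumed to be sorted `int` arrays of document numbers, and the class, method and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * The evaluation strategy described in the quoted message: read each
 * required term's full posting list, then intersect the lists.
 * Postings are assumed to be sorted int arrays of document numbers.
 */
public class NaiveIntersection {

    /** Total postings read across all terms, to make the cost visible. */
    static int postingsRead = 0;

    /** "Search" one term by copying every one of its postings. */
    static List<Integer> readAll(int[] postings) {
        List<Integer> out = new ArrayList<>();
        for (int doc : postings) {
            out.add(doc);
            postingsRead++;
        }
        return out;
    }

    /** Merge-intersect two sorted lists of document numbers. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int da = a.get(i), db = b.get(j);
            if (da == db) {
                hits.add(da);
                i++;
                j++;
            } else if (da < db) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] unique = {17, 500_000};            // rare term: two documents
        int[] bool = new int[1_000_000];         // common term: every even document
        for (int k = 0; k < bool.length; k++) bool[k] = 2 * k;

        List<Integer> hits = intersect(readAll(unique), readAll(bool));
        System.out.println("hits=" + hits + " postingsRead=" + postingsRead);
        // prints hits=[500000] postingsRead=1000002
    }
}
```

Every posting of the common term is read, even though only one of them ends up in the result.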

Can anyone tell me whether that is still true for 1.4? Or whether there
are any optimizations that could be hardcoded for such a case? (I have a
similar problem.)

Cheers,

Alberto S
albertos_at_optimus-corp_dot_com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: query behavior

Posted by Paul Elschot <pa...@xs4all.nl>.

On Tuesday 27 September 2005 01:13, Chris Hostetter wrote:
> 
> I *believe* that because of the ConjunctionScorer in 1.9, BooleanQueries
> consisting of all required terms are now optimized for situations like
> this: the Scorer for the common clause won't be asked to score things that
> the uncommon clause has already given a score of 0.0.

ConjunctionScorer is already used in 1.4 for boolean queries with only
required clauses.
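The idea behind a conjunction over required clauses can be illustrated with a minimal leapfrog intersection: each clause is only moved forward with `skipTo(target)` to the current candidate document, so the common clause is never walked posting by posting. This is a sketch loosely modeled on the `next()`, `doc()` and `skipTo(int)` methods that Lucene 1.4 scorers expose, not ConjunctionScorer's actual source. The `Postings` class, the `landed` counter, and the binary search standing in for the index's skip data are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Leapfrog intersection of required clauses, loosely modeled on the
 * next()/skipTo(int) iteration used by Lucene scorers. Not Lucene's source.
 */
public class LeapfrogConjunction {

    /** Hypothetical posting iterator over a sorted array of doc numbers. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;
        /** How many postings the iterator stopped on (docs it would hand to scoring). */
        int landed = 0;

        Postings(int[] docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        /** Move to the next posting; false when the list is exhausted. */
        boolean next() {
            pos++;
            if (pos < docs.length) { landed++; return true; }
            return false;
        }

        /** Move past the current posting to the first doc >= target.
         *  Binary search stands in for the index's skip data. */
        boolean skipTo(int target) {
            int lo = pos + 1, hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            if (pos < docs.length) { landed++; return true; }
            return false;
        }
    }

    /** Returns the docs matched by every clause. Assumes at least one clause. */
    static List<Integer> intersect(Postings[] clauses) {
        List<Integer> hits = new ArrayList<>();
        for (Postings p : clauses) if (!p.next()) return hits;
        while (true) {
            // Candidate: the largest doc any clause is currently on.
            int target = clauses[0].doc();
            for (Postings p : clauses) target = Math.max(target, p.doc());
            boolean allMatch = true;
            for (Postings p : clauses) {
                if (p.doc() < target) {
                    if (!p.skipTo(target)) return hits;                   // a clause ran out
                    if (p.doc() > target) { allMatch = false; break; }    // overshot: new candidate
                }
            }
            if (allMatch) {
                hits.add(target);
                for (Postings p : clauses) if (!p.next()) return hits;
            }
        }
    }

    public static void main(String[] args) {
        int[] even = new int[1_000_000];         // common term: every even document
        for (int k = 0; k < even.length; k++) even[k] = 2 * k;
        Postings rare = new Postings(new int[]{17, 500_000});
        Postings common = new Postings(even);
        List<Integer> hits = intersect(new Postings[]{rare, common});
        System.out.println("hits=" + hits + " rareLanded=" + rare.landed
                + " commonLanded=" + common.landed);
        // prints hits=[500000] rareLanded=2 commonLanded=3
    }
}
```

With the same data as the naive version, the common clause stops on only 3 of its 1,000,000 postings; the cost of each skip depends on how the index supports skipping, which the binary search only approximates.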

> : Can anyone tell me whether that is still true for 1.4? Or whether there
> : are any optimizations that could be hardcoded for such a case? (I have a
> : similar problem.)

The ConjunctionScorer in 1.4 will do this optimization in the case of
a query with only required clauses.
In what way is your problem similar? The development version
has "similar" facilities for required/optional and required/excluded
queries.
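The required/excluded and required/optional cases can use the same trick: the required clause drives the iteration, and the excluded or optional clause is only advanced to the required clause's current document. The sketch below is not the development version's code; it assumes sorted `int`-array postings and a toy score of 1.0 per matching clause, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReqExclOpt {

    /** Hypothetical sorted posting list of document numbers. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int[] docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        boolean next() {
            pos++;
            return pos < docs.length;
        }

        /** Position on the first doc >= target; stays put if already there. */
        boolean advanceTo(int target) {
            if (pos >= docs.length) return false;             // exhausted
            if (pos >= 0 && docs[pos] >= target) return true; // already far enough
            int lo = pos + 1, hi = docs.length;               // binary search the rest
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** Required docs that the excluded clause does not contain. */
    static List<Integer> reqExcl(Postings req, Postings excl) {
        List<Integer> hits = new ArrayList<>();
        while (req.next()) {
            int d = req.doc();
            // The excluded clause is only moved up to d, never scanned by itself.
            if (excl.advanceTo(d) && excl.doc() == d) continue;
            hits.add(d);
        }
        return hits;
    }

    /** Required docs with a toy score: 1.0, plus 1.0 if the optional clause matches. */
    static Map<Integer, Double> reqOpt(Postings req, Postings opt) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        while (req.next()) {
            int d = req.doc();
            double score = 1.0;
            // The optional clause can raise the score but never adds documents.
            if (opt.advanceTo(d) && opt.doc() == d) score += 1.0;
            scores.put(d, score);
        }
        return scores;
    }

    public static void main(String[] args) {
        int[] required = {3, 8, 15, 42};
        System.out.println(reqExcl(new Postings(required), new Postings(new int[]{1, 8, 9, 100})));
        // prints [3, 15, 42]
        System.out.println(reqOpt(new Postings(required), new Postings(new int[]{8, 42, 50})));
        // prints {3=1.0, 8=2.0, 15=1.0, 42=2.0}
    }
}
```

In both loops the second clause only moves when the required clause has produced a candidate, so a very common excluded or optional term does not force a full scan of its postings.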

Regards,
Paul Elschot.



Re: query behavior

Posted by Chris Hostetter <ho...@fucit.org>.

I *believe* that because of the ConjunctionScorer in 1.9, BooleanQueries
consisting of all required terms are now optimized for situations like
this: the Scorer for the common clause won't be asked to score things that
the uncommon clause has already given a score of 0.0.


-Hoss

