Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2018/07/02 10:16:36 UTC

Re: TermInSetQuery keep terms order in results

Hi Uwe,

as I said, the sort order is calculated elsewhere up front, and the terms
are provided to Lucene in that calculated order (although the query API
takes them as an unordered Set in any case).

I would like an API that keeps the input order; otherwise I end up with
the usual problem that I can't re-order afterwards, because accessing the
results page by page makes that operation impossible.
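
The pagination problem described above can be illustrated without any
Lucene code. The following is a minimal plain-Java sketch, not a Lucene
API: the class InputOrderPager and its methods are hypothetical names, and
hits are modelled simply as the matched term values in index order. It
shows that the re-ordering has to see every hit before a page can be cut
out, which is why re-sorting a single page afterwards does not work.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Lucene).
final class InputOrderPager {

    /** Maps each term to its position in the caller-supplied ordered list. */
    static Map<String, Integer> rankOf(List<String> orderedTerms) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < orderedTerms.size(); i++) {
            rank.putIfAbsent(orderedTerms.get(i), i); // first occurrence wins
        }
        return rank;
    }

    /** Re-orders the complete hit list by input rank, then cuts out one page. */
    static List<String> page(List<String> orderedTerms,
                             List<String> hitsInIndexOrder,
                             int pageIndex, int pageSize) {
        Map<String, Integer> rank = rankOf(orderedTerms);
        List<String> sorted = new ArrayList<>(hitsInIndexOrder);
        // Hits that are not in the input list (should not happen) sort last.
        sorted.sort(Comparator.comparingInt(
                (String term) -> rank.getOrDefault(term, Integer.MAX_VALUE)));
        int from = Math.min(pageIndex * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

With thousands of terms this means holding and sorting the full hit list
for every page request, which is the cost a sort applied inside the search
itself would avoid.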


Nicola

On Mon, 2018-06-25 at 21:49 +0200, Uwe Schindler wrote:
> Hi Nicola,
> 
> if you sort it elsewhere, why do you care about the sort order then?
> What you see as the result is simple: as there is nothing available for
> scoring, a constant-score query returns the results in index order.
> That's intended. There is no way to change this "default" order for a
> TermInSetQuery because the information needed to do so is missing.
> 
> Uwe
> 
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> > -----Original Message-----
> > From: Nicola Buso <nb...@ebi.ac.uk>
> > Sent: Monday, June 25, 2018 5:09 PM
> > To: Uwe Schindler <uw...@thetaphi.de>; java-user@lucene.apache.org
> > Subject: Re: TermInSetQuery keep terms order in results
> > 
> > Hi Uwe,
> > 
> > thanks for the reply. TermInSetQuery covers most of my use case:
> > - thousands of term values (even 100,000)
> > - no need for scoring, because it's calculated elsewhere
> > - intersect with a normal full-text query for further filtering
> > 
> > If I use TermQuery instead, do I risk hitting the
> > BooleanQuery.getMaxClauseCount() limit?
> > 
> > Cheers,
> > 
> > 
> > Nicola
> > 
> > 
> > 
> > On Mon, 2018-06-25 at 16:52 +0200, Uwe Schindler wrote:
> > > Hi,
> > > 
> > > the TermInSetQuery is a so-called constant-score query. It is meant
> > > more as a filter, so you would need some "real" full-text query in
> > > parallel. Think of the term-in-set query as being like the SQL "IN"
> > > operator. It can be used to pass lots of identifiers to filter
> > > results (e.g. when you apply access rights or group policies for
> > > filtering users to your main query as a filter).
> > > 
> > > As it is a "set", which is by default unordered, the order of terms
> > > in the set is undefined. Internally TermInSetQuery reorders the
> > > terms to improve processing speed.
> > > 
> > > If you need scoring, use TermQuery clauses wrapped in a BooleanQuery.
> > > Then you can apply boosts to some terms to influence the order (e.g.
> > > boost the term queries that come first), and run it on a field
> > > without norms.
> > > 
> > > TermInSetQuery is fast because it skips scoring and is simply good
> > > at intersecting the terms dictionary with the given term set.
> > > 
> > > Uwe
> > > 
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > > 
> > > > -----Original Message-----
> > > > From: Nicola Buso <nb...@ebi.ac.uk>
> > > > Sent: Monday, June 25, 2018 1:23 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: TermInSetQuery keep terms order in results
> > > > 
> > > > Hi,
> > > > 
> > > > I need to use the TermInSetQuery, but I would like the results
> > > > to be sorted according to the order of the term set provided.
> > > > Currently the results seem to come back in the order in which
> > > > the documents were inserted into the index.
> > > > 
> > > > Is this already implemented somewhere, or do I need to implement
> > > > a CustomScoreQuery to calculate this score?
> > > > 
> > > > Cheers,
> > > > 
> > > > 
> > > > Nicola
> > > > 
> > > > 
> > > > --
> > > > Nicola Buso <nb...@ebi.ac.uk>
> > > > EMBL-EBI
> > > > 
> > > > -----------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
-- 
Nicola Buso <nb...@ebi.ac.uk>
EMBL-EBI


Re: TermInSetQuery keep terms order in results

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Michael,

I have an index that contains the terms of the TermInSetQuery, but the
score provided at query time, represented by the order in a List of
terms, is not known at indexing time; it depends on other calculations
done at runtime. What do you mean by indexing the ordinals?

I was wondering whether I could wrap each TermQuery in a BoostQuery,
boosting based on the ordinals I have, and create a disjunction query over
all the terms; I am also wondering how much slower than TermInSetQuery
that would be.
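
As a rough illustration of the boost idea, here is a plain-Java sketch of
how a boost could be derived from each term's position in the input list.
It does not call Lucene; OrdinalBoosts and boostsFor are hypothetical
names, and the "n minus position" formula is only one possible choice.
Each entry of the resulting map would correspond to one boosted TermQuery
clause in the disjunction.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Lucene).
final class OrdinalBoosts {

    /**
     * Assigns the first term the highest boost (n) and the last term the
     * lowest (1), so that sorting by descending boost reproduces the
     * input order. Duplicate terms keep the boost of their first position.
     */
    static Map<String, Float> boostsFor(List<String> orderedTerms) {
        int n = orderedTerms.size();
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            boosts.putIfAbsent(orderedTerms.get(i), (float) (n - i));
        }
        return boosts;
    }
}
```

Two caveats apply to this sketch. The boost alone only decides the order
if the rest of the score is the same for every term and each document
matches exactly one term; the similarity in use is ignored here. And one
clause per term means that, with up to 100,000 terms, the
BooleanQuery.getMaxClauseCount() limit asked about in this thread would
come into play.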


Nicola



On Mon, 2018-07-02 at 06:41 -0400, Michael Sokolov wrote:
> Since you have the terms ordered, why not index their ordinals, and
> then sort by that?
-- 
Nicola Buso <nb...@ebi.ac.uk>
EMBL-EBI


Re: TermInSetQuery keep terms order in results

Posted by Michael Sokolov <ms...@gmail.com>.
Since you have the terms ordered, why not index their ordinals, and then
sort by that?
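
A minimal sketch of what this suggestion amounts to, in plain Java rather
than Lucene: every document carries an ordinal that was stored when it was
indexed, the term set acts only as a filter, and the matches are sorted by
the stored ordinal. OrdinalSort, Doc and search are hypothetical names. In
Lucene the ordinal would presumably be stored in a numeric field that can
be sorted on; that part is an assumption and is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical model, for illustration only (not part of Lucene).
final class OrdinalSort {

    /** Stand-in for an indexed document: a term value plus a stored ordinal. */
    static final class Doc {
        final String term;
        final long ordinal;

        Doc(String term, long ordinal) {
            this.term = term;
            this.ordinal = ordinal;
        }
    }

    /** Filters by set membership, then sorts the matches by stored ordinal. */
    static List<String> search(List<Doc> index, Set<String> terms) {
        List<Doc> matches = new ArrayList<>();
        for (Doc doc : index) {
            if (terms.contains(doc.term)) {
                matches.add(doc);
            }
        }
        matches.sort(Comparator.comparingLong((Doc d) -> d.ordinal));
        List<String> result = new ArrayList<>();
        for (Doc d : matches) {
            result.add(d.term);
        }
        return result;
    }
}
```

This only works when the ordinals are known at indexing time.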
