Posted to java-user@lucene.apache.org by Kumaran Ramasubramanian <ku...@gmail.com> on 2017/07/17 14:28:03 UTC

Filters Vs queries - for terms more than 1024

Hi All,

I am using Lucene 4.10.4.

In Lucene search, I know there is a limit of 1024 on the number of BooleanQuery
clauses. I know we can increase this limit, but I want to understand
queries vs. filters in Lucene 4.10.4.
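For context, here is a minimal sketch of where that limit bites in Lucene 4.10.x. It assumes lucene-core 4.10.x is on the classpath. `ClauseLimitDemo` and its methods are hypothetical names, and the key<i>/value<i> fields mirror the BooleanFilter test code posted later in this thread. In 4.x, BooleanQuery checks the limit as each clause is added, and `BooleanQuery.setMaxClauseCount` changes the limit for the whole JVM. Raising the limit only lets the query be built. At search time each TermQuery clause still opens its own postings enum, and that is where the OutOfMemoryError below is thrown.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Sketch assuming Lucene 4.10.x (lucene-core); class and method names are hypothetical.
class ClauseLimitDemo {

    /** OR of n single-term clauses; Lucene 4.x throws TooManyClauses from add() past the limit. */
    static BooleanQuery orOfTerms(int n) {
        BooleanQuery bq = new BooleanQuery();
        for (int i = 0; i < n; i++) {
            bq.add(new TermQuery(new Term("key" + i, "value" + i)), BooleanClause.Occur.SHOULD);
        }
        return bq;
    }

    /** True if building n clauses trips the current (JVM-wide) clause limit. */
    static boolean exceedsLimit(int n) {
        try {
            orOfTerms(n);
            return false;
        } catch (BooleanQuery.TooManyClauses e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(BooleanQuery.getMaxClauseCount()); // 1024 unless changed
        System.out.println(exceedsLimit(1025));               // true
        int previous = BooleanQuery.getMaxClauseCount();
        BooleanQuery.setMaxClauseCount(200000);              // static: affects every query in the JVM
        try {
            System.out.println(exceedsLimit(100000));         // false: the query can now be built
        } finally {
            BooleanQuery.setMaxClauseCount(previous);
        }
    }
}
```

Building the query is cheap because no index is touched. The cost appears only when a searcher asks every clause for a scorer.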

I want to build queries with more than 1024 clauses. Relevance is not needed for
me. What are the best options?

1. Using BooleanFilter works even with 1 lakh (100,000) FilterClauses in a
BooleanFilter. Is there any consequence of using filters in this case? Shall
I proceed with this?

2. Even when I give very little memory to filters, the search manages to
complete, after many GC cycles. Why can't the same be done for query
clauses? What is the actual technical reason for the 1024 limit in
BooleanQuery?

3. If I disable scoring using ConstantScoreQuery, is it possible to use
more than 1024 query clauses?
       I tried this, but I still get java.lang.OutOfMemoryError. Why?

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
    at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
    at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
    at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)



Any pointers would be much appreciated. Thank you.



--
Kumaran R

Re: Filters Vs queries - for terms more than 1024

Posted by Adrien Grand <jp...@gmail.com>.
BooleanQuery is subject to the 1024 limit on the number of clauses, so you
can't use it in that case. You should use TermsQuery/TermsFilter instead.
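For Lucene 4.10.x, a minimal sketch of the TermsFilter route is shown here. It assumes lucene-core and lucene-queries 4.10.x are on the classpath. `TermSetFilterDemo` and its methods are hypothetical names, and the key<i>/value<i> fields mirror the BooleanFilter test code posted in this thread. All terms go into one TermsFilter, which replaces 100,000 separate clauses. Wrapping that filter in ConstantScoreQuery gives a query with no relevance scoring.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;

// Sketch assuming Lucene 4.10.x (lucene-core + lucene-queries); names are hypothetical.
class TermSetFilterDemo {

    /** One TermsFilter holding all n terms (OR semantics), instead of n clauses. */
    static Filter termSetFilter(int n) {
        List<Term> terms = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            terms.add(new Term("key" + i, "value" + i));
        }
        // Roughly: TermsFilter sorts its terms and visits them one after another,
        // OR-ing each term's postings into a single bit set per segment.
        return new TermsFilter(terms);
    }

    /** The same filter usable where a Query is expected; every hit gets the same score. */
    static Query termSetQuery(int n) {
        return new ConstantScoreQuery(termSetFilter(n));
    }
}
```

Either form can be handed to an IndexSearcher: pass the Filter as the second argument of `searcher.search(query, filter, n)`, or pass the ConstantScoreQuery on its own. As noted elsewhere in the thread, this still reads every matching document of every term, so it is a poor fit for intersecting with a very selective query.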

On Wed, Jul 19, 2017 at 13:52, Kumaran Ramasubramanian <ku...@gmail.com>
wrote:


Re: Filters Vs queries - for terms more than 1024

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Hi Adrien


I have tried BooleanQuery with ConstantScoreQuery, based on the suggestion at this link:
http://lucene.472066.n3.nabble.com/BooleanFilter-vs-BooleanQuery-performance-td4106920.html

> If you want it fast, use BooleanQuery and wrap it with ConstantScoreQuery. Then there is also no
> scoring done (in most cases, older BooleanQuery sometimes still calculated
> the score).




> 3. If I disable scoring using ConstantScoreQuery, is it possible to use
> more than 1024 query clauses?
>        I tried this, but I still get java.lang.OutOfMemoryError. Why?


java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
    at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
    at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
    at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)




If I use BooleanQuery and wrap it with ConstantScoreQuery, can I use 1 lakh
(100,000) boolean clauses in the BooleanQuery?





--
Kumaran R

On Wed, Jul 19, 2017 at 8:26 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:


Re: Filters Vs queries - for terms more than 1024

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Thank you Adrien :-)



On 18-Jul-2017 3:21 PM, "Adrien Grand" <jp...@gmail.com> wrote:


Re: Filters Vs queries - for terms more than 1024

Posted by Adrien Grand <jp...@gmail.com>.
Sorry for the confusion, I keep saying query in all cases because queries
and filters got merged in Lucene 5.0. If you are using BooleanFilter rather
than BooleanQuery with Lucene 4, then things should be mostly OK if you have
many clauses. But like TermsQuery, BooleanFilter always consumes all
matching documents from all its clauses. So if you intersect it with a
selective query, it is wasteful.
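The cost of using a BooleanFilter with a selective query can be shown with a toy model. It is written in plain Java and is not Lucene code; `IntersectionModel` and its methods are hypothetical names. Each int[] stands for one term's sorted doc-id list. The eager strategy builds the whole OR into a bit set before intersecting, so it reads every entry of every term's list. The skipping strategy lets the selective query lead and only probes each term's list at the query's doc ids. Binary search stands in here for Lucene's skip data, which is a simplification.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model, not Lucene code: a selective query intersected with a large OR of terms. */
class IntersectionModel {

    /** Matching doc ids plus the number of postings entries a strategy looked at. */
    static final class Result {
        final int[] docs;
        final long entriesRead;

        Result(int[] docs, long entriesRead) {
            this.docs = docs;
            this.entriesRead = entriesRead;
        }
    }

    /** Eager: build the whole OR as a bit set first (reads every entry), then intersect. */
    static Result eager(int[] queryDocs, int[][] filterPostings, int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        long read = 0;
        for (int[] list : filterPostings) {
            for (int doc : list) {
                allowed.set(doc);
                read++;
            }
        }
        List<Integer> out = new ArrayList<>();
        for (int doc : queryDocs) {
            if (allowed.get(doc)) {
                out.add(doc);
            }
        }
        return new Result(toArray(out), read);
    }

    /** Skipping: the selective query leads; each term list is only probed at its doc ids. */
    static Result skipping(int[] queryDocs, int[][] filterPostings) {
        long[] probes = {0};
        List<Integer> out = new ArrayList<>();
        for (int doc : queryDocs) {
            boolean matched = false;
            for (int[] list : filterPostings) {
                // No early exit: roughly, a disjunction moves all of its sub-iterators.
                matched |= contains(list, doc, probes);
            }
            if (matched) {
                out.add(doc);
            }
        }
        return new Result(toArray(out), probes[0]);
    }

    /** Binary search standing in for skip data; counts each entry it inspects. */
    private static boolean contains(int[] list, int target, long[] probes) {
        int lo = 0;
        int hi = list.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes[0]++;
            if (list[mid] == target) {
                return true;
            } else if (list[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return false;
    }

    private static int[] toArray(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

Both strategies return the same documents. The eager count grows with the total size of the term lists. The probing count grows with the number of query hits times the number of terms.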

On Tue, Jul 18, 2017 at 11:42, Kumaran Ramasubramanian <ku...@gmail.com>
wrote:


Re: Filters Vs queries - for terms more than 1024

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Hi Adrien,

Thanks for your input...

> 1. Using BooleanFilter works even with 1 lakh (100,000) FilterClauses in a
> BooleanFilter. Is there any consequence of using filters in this case? Shall
> I proceed with this?


Here is the code snippet I used for statement 1:

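    import org.apache.lucene.index.Term;
    import org.apache.lucene.queries.BooleanFilter;
    import org.apache.lucene.queries.FilterClause;
    import org.apache.lucene.queries.TermsFilter;
    import org.apache.lucene.search.BooleanClause;

    BooleanFilter boolFilter = new BooleanFilter();
    // 1 lakh (100,000) SHOULD clauses, each a single-term TermsFilter (OR semantics)
    for (int i = 0; i < 100000; i++)
    {
        Term term = new Term("key" + i, "value" + i);
        TermsFilter filter = new TermsFilter(term);
        FilterClause filterClause = new FilterClause(filter, BooleanClause.Occur.SHOULD);
        boolFilter.add(filterClause);
    }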



Do you see any problem in using TermsFilter over TermsQuery?

BTW, I will test with TermsQuery and let you know.

--
Kumaran R




On Tue, Jul 18, 2017 at 1:59 AM, Adrien Grand <jp...@gmail.com> wrote:


Re: Filters Vs queries - for terms more than 1024

Posted by Adrien Grand <jp...@gmail.com>.
Could you use TermInSetQuery (TermsQuery in older Lucene versions)? It is
worse at skipping over matches than a BooleanQuery, but it keeps memory
usage low and disk access sequential, unlike large boolean
queries.

Otherwise you would probably need to rethink how you design your documents
in order to be able to run simpler queries.
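The memory difference between a term-set query and a large BooleanQuery can be shown with a toy model. It is written in plain Java and is not Lucene's actual code; `DisjunctionModel` and its methods are hypothetical names. Each int[] stands for one term's sorted doc-id list. The term-at-a-time union reads one list at a time into a single bit set, which is roughly how TermsFilter works in 4.x. The doc-at-a-time merge keeps a cursor open on every term at once, as a disjunction of TermQuery clauses does. The stack trace in the original post shows that per-clause cost: below ConstantScoreQuery$ConstantWeight.scorer, BooleanWeight.scorer calls TermWeight.scorer for the clauses, and each one allocates a Lucene41PostingsReader$BlockDocsEnum. So wrapping the BooleanQuery in ConstantScoreQuery removes scoring but does not remove those per-clause enums.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model, not Lucene code: each int[] is one term's sorted doc ids. */
class DisjunctionModel {

    /** Outcome of the doc-at-a-time merge. */
    static final class Merge {
        final int[] docs;
        final int peakOpenCursors;

        Merge(int[] docs, int peakOpenCursors) {
            this.docs = docs;
            this.peakOpenCursors = peakOpenCursors;
        }
    }

    /** Term-at-a-time: read one list, OR it into a bit set, move on (one cursor at a time). */
    static int[] termAtATime(int[][] postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] list : postings) {
            for (int doc : list) {
                bits.set(doc);
            }
        }
        return bits.stream().toArray(); // ascending doc ids
    }

    /** Doc-at-a-time: a live cursor per non-empty list, merged through a min-heap. */
    static Merge docAtATime(int[][] postings) {
        // Heap entry: {current doc, list index, position in that list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int t = 0; t < postings.length; t++) {
            if (postings[t].length > 0) {
                heap.add(new int[] {postings[t][0], t, 0});
            }
        }
        int peak = heap.size(); // every cursor is open before the first doc is emitted
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            if (out.isEmpty() || out.get(out.size() - 1) != top[0]) {
                out.add(top[0]); // emit each doc id once
            }
            int next = top[2] + 1;
            if (next < postings[top[1]].length) {
                heap.add(new int[] {postings[top[1]][next], top[1], next});
            }
            peak = Math.max(peak, heap.size());
        }
        return new Merge(out.stream().mapToInt(Integer::intValue).toArray(), peak);
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 5, 9}, {2, 5}, {0, 9, 12}};
        System.out.println(java.util.Arrays.toString(termAtATime(postings, 13))); // [0, 1, 2, 5, 9, 12]
        Merge m = docAtATime(postings);
        System.out.println(java.util.Arrays.toString(m.docs) + " with " + m.peakOpenCursors + " cursors open");
    }
}
```

Both strategies produce the same union. With 100,000 terms, the first strategy still has one list in flight at a time, while the second holds 100,000 cursors open together.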

On Mon, Jul 17, 2017 at 16:28, Kumaran Ramasubramanian <ku...@gmail.com>
wrote:
