Posted to solr-user@lucene.apache.org by manuj singh <s....@gmail.com> on 2018/05/07 15:51:47 UTC

Must clause with filter queries

Hi all,
I am confused about how the must clause (+) behaves with filter queries.
For example, I have the query below:
q=*:*&fq=+{!frange cost=200 l=NOW-179DAYS u=NOW/DAY+1DAY incl=true incu=false}date

So I am filtering for documents that are less than 179 days old.
For example, if now is May 7th, 2018, 10:23 CST, I should only see documents
that have date > Nov 9th, 2017, 10:23 CST.

However, with the above query I am also seeing documents dated
Nov 5th, 2017. It looks like some docs are being returned from the filter
cache, which is weird: the start of my date range uses NOW-179DAYS, and NOW
changes on every request, so the filter should not hit the filterCache,
because every new request has a different timestamp.

However, if I remove the + from the filter query, it seems to work fine.

I mostly think this is a filterCache issue, but I am not sure how to
prove that.

Our auto soft commit is 500 ms, so every 0.5 seconds a new searcher should
open and the cache should be flushed.

Something is not right and I am not able to figure out what. Has anyone
seen this kind of issue before?

If I move the query from fq to q, it also works fine.

One more thing: when I turn on debugQuery, I see the following in the parsed
query:

"QParser": "LuceneQParser",
"filter_queries": [
  "+{!frange cost=200 l=NOW-179DAYS u=NOW/DAY+1DAY incl=true incu=false}date",
  "-_parent_:F" ],
"parsed_filter_queries": [
  "+FunctionRangeQuery(ConstantScore(frange(date(date)):[NOW-179DAYS TO NOW/DAY+1DAY}))",
  "-_parent_:false" ]

So in the above, I do not see the date being resolved to an actual
timestamp.

However, if I change the query syntax to not use frange and local
params, I see the transaction date resolving to the correct timestamp.

So for the following query:
q=*:*&fq=+date:[NOW-179DAYS TO NOW/DAY+1DAY]

I see the following in the debug output, with the actual timestamps:
"QParser": "LuceneQParser",
"filter_queries": [
  "date:[NOW-179DAYS TO NOW/DAY+1DAY]",
  "-_parent_:F" ],
"parsed_filter_queries": [
  "date:[1510242067383 TO 1525737600000]",
  "-_parent_:false" ],


I am not sure whether this is just a red herring.
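
As a quick sanity check on the debug output above, the two epoch-millisecond
bounds can be converted back to UTC dates. This is a minimal Python sketch;
the helper name epoch_ms_to_utc is made up for illustration and is not part
of Solr.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (as shown in parsed_filter_queries) to UTC."""
    # Integer timedelta arithmetic avoids floating-point rounding.
    return EPOCH + timedelta(milliseconds=ms)

lower = epoch_ms_to_utc(1510242067383)  # resolved NOW-179DAYS
upper = epoch_ms_to_utc(1525737600000)  # resolved NOW/DAY+1DAY

print(lower.isoformat())
print(upper.isoformat())
```

The lower bound falls on 2017-11-09 and the upper bound on midnight UTC of
2018-05-08, which is consistent with a request made on May 7th, 2018.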

Re: Must clause with filter queries

Posted by root23 <s....@gmail.com>.
Hey Shawn, I tried debugging the actual Solr code locally with the following
two forms of frange, to see whether Solr is somehow parsing it wrong.
But I see that the parsed query that goes into the filter query is pretty
much the same.

query1 -> +_val_:{!frange cost=200 l=30 u=100 incl=true incu=false}price
The above gets parsed into the following:
+ConstantScore(frange(float(price)):[30 TO 100})

query2 -> {!frange cost=200 l=30 u=100 incl=true incu=false}price
This gets parsed into the following:
ConstantScore(frange(float(price)):[30 TO 100})

As you can see, the only difference is the leading + (must clause). But since
this is a single clause, I assume it doesn't make any difference.


I saw that in the QParser.getParser() method there is a check for localparams,
as shown below:
if (allowLocalParams && qstr != null && qstr.startsWith(QueryParsing.LOCALPARAM_START))

But it seems like Solr eventually figures out how to parse it correctly. I am
not sure if I am missing something.
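
To illustrate why a leading + matters for that check, here is a rough Python
sketch of the same prefix test. It assumes LOCALPARAM_START is the string
"{!", and the function name is hypothetical; it only mirrors the quoted
condition, not the rest of Solr's parsing.

```python
LOCALPARAM_START = "{!"  # assumed value of QueryParsing.LOCALPARAM_START

def starts_with_localparams(qstr, allow_local_params=True):
    """Mirror of the quoted check: local params are only detected at the start."""
    return bool(allow_local_params and qstr is not None
                and qstr.startswith(LOCALPARAM_START))

with_plus = "+{!frange cost=200 l=30 u=100 incl=true incu=false}price"
without_plus = "{!frange cost=200 l=30 u=100 incl=true incu=false}price"

print(starts_with_localparams(with_plus))
print(starts_with_localparams(without_plus))
```

The prefix test is false for the first string, which suggests that a query
with a leading + is not dispatched straight to the frange parser at this
point, even if a later parsing step still recognizes the embedded local
params.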




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Must clause with filter queries

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/9/2018 12:56 PM, root23 wrote:
> Thanks for the explanation shawn. I will look at our autowarming time. 
> Looking at your response i am thinking i might be doing few more things
> wrong
> 1. Does Must clause with any of the filter query makes any sense or is
> automatically implied.
>   e.g if i want all the docs with firstName:michael and lastname:jordan,
> which of the following queries makes sense or both are equivalent
>     a) q=*:*&fq=name:michael&fq=lastname:jordan
>     b) q=*:*&fq=+name:michael&fq=+lastname:jordan

Because both of those filters you're using are single-clause, query a
and query b should produce identical results.  In that situation, the +
is not necessary.

If you had this instead, which is a two-clause query, you probably want
to use + markers:

q=*:*&fq=name:michael lastname:jordan

Depending on what the default operator (q.op) is, of course.  If you use
q.op=AND, then adding the + markers would be handled automatically for
all clauses that do not explicitly use OR.
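
As a rough illustration of the two-clause case, the sketch below builds the
request parameters in Python. One practical detail it assumes: when the
query is sent over HTTP, a literal + in a URL query string decodes to a
space, so the + markers need to be percent-encoded as %2B. The helper name
build_select_query is hypothetical.

```python
from urllib.parse import quote, urlencode

def build_select_query(filters, q="*:*", q_op=None):
    """Build a URL query string with one fq parameter per filter."""
    params = [("q", q)]
    if q_op is not None:
        params.append(("q.op", q_op))
    params.extend(("fq", fq) for fq in filters)
    # quote (not quote_plus) encodes spaces as %20 and "+" as %2B.
    return urlencode(params, quote_via=quote)

# Explicit + markers on both clauses of a single filter.
explicit = build_select_query(["+name:michael +lastname:jordan"])
# No markers; rely on q.op=AND as described above.
implicit = build_select_query(["name:michael lastname:jordan"], q_op="AND")

print(explicit)
print(implicit)
```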

> 2.Does Must clause also implied with the join query. so in the following
> query i am joining between 2 cores, on field:id. It should filter first from
> the index "search" where title is full and then join on id and then only get
> the docs which also has status set to monitor.
>
>      a ) q=*:*&fq=+{!join from=id to=id fromIndex=search
> force=true}title:full&fq=+status:monitor
>     
>      b) q=*:*&fq={!join from=id to=id fromIndex=search
> force=true}title:full&fq=status:monitor
>
> so of the above which one is accurate a) or b)

I have absolutely zero experience with joins in Solr.  That said, query
a has something (a plus sign) before the "{!join" ... which might cause
Solr to *not* interpret the query as a join.  The localparams syntax
normally must be the very first thing in a query string.  The actual
query "title:full" is a single clause query, so it should not need the +. 

Thanks,
Shawn


Re: Must clause with filter queries

Posted by Susheel Kumar <su...@gmail.com>.
For 1, a) is accurate; for 2, b) is accurate.

If query 1 a) is just an example, then it's fine, but otherwise you usually
want to filter on fields that have low cardinality, like state, country, or
gender. Name is a high-cardinality field, so using a filter query on it
wouldn't be efficient and doesn't help with caching.

Thanks

On Wed, May 9, 2018 at 2:56 PM, root23 <s....@gmail.com> wrote:

> Thanks for the explanation shawn. I will look at our autowarming time.
> Looking at your response i am thinking i might be doing few more things
> wrong
> 1. Does Must clause with any of the filter query makes any sense or is
> automatically implied.
>   e.g if i want all the docs with firstName:michael and lastname:jordan,
> which of the following queries makes sense or both are equivalent
>     a) q=*:*&fq=name:michael&fq=lastname:jordan
>     b) q=*:*&fq=+name:michael&fq=+lastname:jordan
>
>
> 2.Does Must clause also implied with the join query. so in the following
> query i am joining between 2 cores, on field:id. It should filter first
> from
> the index "search" where title is full and then join on id and then only
> get
> the docs which also has status set to monitor.
>
>      a ) q=*:*&fq=+{!join from=id to=id fromIndex=search
> force=true}title:full&fq=+status:monitor
>
>      b) q=*:*&fq={!join from=id to=id fromIndex=search
> force=true}title:full&fq=status:monitor
>
> so of the above which one is accurate a) or b)

Re: Must clause with filter queries

Posted by root23 <s....@gmail.com>.
Thanks for the explanation, Shawn. I will look at our autowarming time.
Looking at your response, I think I might be doing a few more things
wrong.
1. Does a must clause on a filter query make any sense, or is it
automatically implied?
  E.g. if I want all the docs with firstName:michael and lastname:jordan,
which of the following queries makes sense, or are both equivalent?
    a) q=*:*&fq=name:michael&fq=lastname:jordan
    b) q=*:*&fq=+name:michael&fq=+lastname:jordan


2. Is the must clause also implied with a join query? In the following
query I am joining between 2 cores on the field id. It should first filter
the index "search" where title is full, then join on id, and then only return
the docs that also have status set to monitor.

     a) q=*:*&fq=+{!join from=id to=id fromIndex=search force=true}title:full&fq=+status:monitor

     b) q=*:*&fq={!join from=id to=id fromIndex=search force=true}title:full&fq=status:monitor

So of the above, which one is accurate, a) or b)?






Re: Must clause with filter queries

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/8/2018 9:58 AM, root23 wrote:
> In case of frange query how do we specify the Must clause ?

Looking at how frange works, I'm pretty sure that all queries with
frange are going to be effectively single-clause.  So you don't need to
specify MUST -- it's implied.

> the reason we are using frange instead of the normal syntax is that we need
> to add a cost to this clause. Since this will return a lot of documents, we
> want to calculate at the end of all the clauses. That is why we are using
> frange with a cost of 200.

Ah, you want it to be a postFilter, which frange supports, but the
standard lucene parser doesn't.  FYI, to actually achieve a postFilter,
you need to set cache=false in addition to a cost of 100 or higher. 
It's not possible to cache postFilters because of how they work, so they
must be uncached.  Which also means you don't need to worry about using
NOW/DAY date rounding.
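
A minimal sketch of what such a post-filter fq string could look like,
assuming the price example from elsewhere in this thread. The helper
frange_post_filter is hypothetical; it only assembles the local params
described above (cache=false plus a cost of 100 or higher).

```python
def frange_post_filter(func, low, high, cost=200):
    """Assemble an frange filter string intended to run as a post filter."""
    if cost < 100:
        # Per the explanation above, post filtering needs cost >= 100.
        raise ValueError("cost must be 100 or higher for a post filter")
    return (f"{{!frange cache=false cost={cost} l={low} u={high} "
            f"incl=true incu=false}}{func}")

fq = frange_post_filter("price", 30, 100)
print(fq)
```

Note that the local params come first in the string, with no leading +.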

See the "Expensive Filters" section on this blog post for an example
with frange that includes cache=false and cost=200:

https://lucidworks.com/2012/02/10/advanced-filter-caching-in-solr/

The requirement for cache=false is not mentioned on the blog post
above.  It was this post that alerted me to that requirement:

https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/

> We have near real time requirements and that is the reason we are using 500
> ms in the autosoft commit.
> We have autowarmCount="60%" for filter cache.

What is the size of the filterCache?  Chances are very good that this
translates to a fairly high autowarmCount, and that it is making your
automatic soft commits take far longer than 500 milliseconds.  If the
warming is slow, then you're not getting the half-second latency anyway,
so configuring it is at best a waste of resources, and at worst a big
performance problem.

Achieving NRT indexing requires turning off all warming.  To see how
long it took to warm the searcher on the last commit, go to the admin
UI.  Choose your index from the dropdown, click on Plugins/Stats, click
on CORE, then open the "searcher" entry.  In the displayed information
will be "warmupTime", with a value in milliseconds.  I'm betting that
this number will be larger than 500.  If I'm wrong about that, then you
might not have anything to worry about.

You can also see warmup times for the individual caches with the CACHE
entry in Plugins/Stats.  Typically it's filterCache that takes the longest.

https://www.dropbox.com/s/izwad4h2vl1z752/solr-filtercache-stats.png?dl=0

A long time ago, I was having issues on my servers with commits taking a
minute or more.  I discovered that it was autowarming on the filterCache
that caused it.  So I reduced autowarmCount on that cache. Eventually I
got to an autowarmCount of *four*.  Not 4 percent, I am literally doing
warming from the top 4 cache entries.  Even with the count that low,
commits still sometimes take 10 seconds or more, and the vast majority
of that time is spent executing those four warming queries from the
filterCache.

Thanks,
Shawn


Re: Must clause with filter queries

Posted by root23 <s....@gmail.com>.
Hi Shawn,
Thanks for the response. We have multiple clauses; I was just giving a
bare-bones example. Usually all our queries will have more than one clause.

In the case of an frange query, how do we specify the must clause?

The reason we are using frange instead of the normal syntax is that we need
to add a cost to this clause. Since it will match a lot of documents, we
want it to be evaluated after all the other clauses. That is why we are using
frange with a cost of 200.

We have near-real-time requirements, and that is the reason we are using 500
ms for the auto soft commit.
We have autowarmCount="60%" for the filter cache.

We are using Solr 6.




Re: Must clause with filter queries

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/7/2018 9:51 AM, manuj singh wrote:
> I am kind of confused how must clause(+) behaves with the filter queries.
> e.g i have below query:
> q=*:*&fq=+{!frange cost=200 l=NOW-179DAYS u=NOW/DAY+1DAY incl=true
> incu=false}date
>
> So i am filtering documents which are less then 179 old days.
> So e.g if now is May 7th, 10.23 cst,2018, i should only see documents which
> have date > Nov 9th, 10.23 cst, 2017.
>
> However with the above query i am also seeing documents which are done on
> Nov 5th,2017 (which seems like it is returning some docs from filter cache.
> which is wired because in my date range for the start date  i am using
> NOW-179DAYS and
> Now is changing every time, so it shouldn't go to filtercache as every new
> request will have  a different time stamp. )
>
> However if i remove the + from the filter query it seems to work fine.

I'm not sure that trying to use the + with the frange query makes any
sense.  For one thing, putting anything before the localparams (which is
the {!stuff otherstuff} syntax) probably causes Solr to not correctly
interpret the localparams syntax.  Typically localparams must be at the
very beginning of the query.  Adding a plus to a single-clause query
like that is not necessary.  Queries with one clause will effectively be
interpreted as having the +/MUST on that clause.

> I am mostly thinking it seems to be a filtercache issue but not sure how i
> prove that.
>
> Our auto soft commit is 500 ms , so every 0.5 second we should have a new
> searcher open and cache should be flushed.

A commit interval that low could result in some big problems.  I hope
the autowarmCount setting on all your caches is zero.  If it's not,
you're going to want to have a much longer interval than 500 milliseconds.

> Something is not right and i am not able to figure out what. Has some one
> seen this kind of issue before ?
>
> If i move the query from fq to q then also it works fine.
>
> One more thing when i put debug query i see the following in the parse query
>
> *"QParser": "LuceneQParser", "filter_queries": [ "+{!frange cost=200
> l=NOW-179DAYS u=NOW/DAY+1DAY incl=true incu=false}date", "-_parent_:F" ],
> "parsed_filter_queries": [
> "+FunctionRangeQuery(ConstantScore(frange(date(date)):[NOW-179DAYS TO
> NOW/DAY+1DAY}))", "-_parent_:false" ]*
>
> So in the above i do not see the date getting resolved to an actual time
> stamp.
>
> However if i change the syntax of the query to not use frange and local
> params i see the transaction date resolving into correct timestamp.
>
> So for the following query
> q=*:*&fq=+date:[NOW-179DAYS TO NOW/DAY+1DAY]
>
> i see the following in the debug query, and see the actualy timestamp:
> "QParser": "LuceneQParser", "filter_queries": [ "date:[NOW-179DAYS TO
> NOW/DAY+1DAY]", "-_parent_:F" ], "parsed_filter_queries": [
> "date:[1510242067383
> TO 1525737600000]", "-_parent_:false" ],

If the filter you're trying to use is this kind of simple date range, I
would stick with lucene and not use localparams to switch to another
parser.  I would also set the low value of the range to NOW/DAY-179DAYS
so there's at least a chance that caching will be effective.  Also, as
mentioned, because this example only has one query clause, adding + is
unnecessary.  It might become necessary if you have multiple query
clauses ... but in that case, you're not likely to be using something
like frange.
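
The effect of rounding on caching can be sketched with plain date
arithmetic. This Python example only imitates the two date-math expressions
(it assumes rounding to the start of the day in UTC, and the function names
are made up); it is not how Solr evaluates them internally.

```python
from datetime import datetime, timedelta, timezone

def now_minus_179_days(now):
    """Imitates NOW-179DAYS: changes with every request."""
    return now - timedelta(days=179)

def now_day_minus_179_days(now):
    """Imitates NOW/DAY-179DAYS: round down to the day, then subtract."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start - timedelta(days=179)

# Two requests made half a second apart on the same day.
first = datetime(2018, 5, 7, 15, 23, 0, tzinfo=timezone.utc)
second = first + timedelta(milliseconds=500)

# Unrounded bounds differ, so each request is a different filter.
print(now_minus_179_days(first) == now_minus_179_days(second))
# Rounded bounds are identical, so a cached filter could be reused.
print(now_day_minus_179_days(first) == now_day_minus_179_days(second))
print(now_day_minus_179_days(first).isoformat())
```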

Thanks,
Shawn