Posted to users@solr.apache.org by Changcheng Shao <ch...@gmail.com> on 2023/01/06 08:00:30 UTC

Some questions about Solr NOT query

Hi, Solr team,
I am using Solr 8.11 and have a question about NOT queries.

My original query is:
network_id:379619 AND (object_type:("ssp_deal" OR "ssp_buyer_group" OR
"ssp_buyer")) AND (network_id:("379619"))  AND  (*:* NOT user_id: 0)
With debugQuery enabled, the debug output is:
    "rawquerystring":"network_id:379619 AND (object_type:(\"ssp_deal\" OR
\"ssp_buyer_group\" OR \"ssp_buyer\")) AND (network_id:(\"379619\"))  AND
 (*:* NOT user_id: 0)",
    "querystring":"network_id:379619 AND (object_type:(\"ssp_deal\" OR
\"ssp_buyer_group\" OR \"ssp_buyer\")) AND (network_id:(\"379619\"))  AND
 (*:* NOT user_id: 0)",
    "parsedquery":"+network_id:379619 +(object_type:ssp_deal
object_type:ssp_buyer_group object_type:ssp_buyer) +network_id:379619
+(MatchAllDocsQuery(*:*) -user_id:0)",


Then I changed the query to:
network_id:379619 AND (object_type:("ssp_deal" OR "ssp_buyer_group" OR
"ssp_buyer")) AND (network_id:("379619"))  AND NOT user_id: 0
The debug output for this query is:
    "rawquerystring":"network_id:379619 AND (object_type:(\"ssp_deal\" OR
\"ssp_buyer_group\" OR \"ssp_buyer\")) AND (network_id:(\"379619\"))  AND
 NOT user_id: 0",
    "querystring":"network_id:379619 AND (object_type:(\"ssp_deal\" OR
\"ssp_buyer_group\" OR \"ssp_buyer\")) AND (network_id:(\"379619\"))  AND
 NOT user_id: 0",
    "parsedquery":"+network_id:379619 +(object_type:ssp_deal
object_type:ssp_buyer_group object_type:ssp_buyer) +network_id:379619
-user_id:0",

The only difference between the two queries is (*:* NOT user_id: 0) versus
NOT user_id: 0, and the corresponding parsed clauses are
+(MatchAllDocsQuery(*:*) -user_id:0) and -user_id:0.
Both queries return the same results.

In my tests, the query using NOT user_id: 0 took less time than the one
using (*:* NOT user_id: 0).
Could you explain, in theory, which form should be faster and why?

I look forward to your response. Thanks!

Re: Some questions about Solr NOT query

Posted by Thomas Corthals <th...@klascement.net>.
Hi

A negative query is a subtraction from a set of matched documents. With
(*:* NOT user_id:0) you are subtracting from the set of all documents in
the index first, then intersecting with the documents that match the other
clauses. With (NOT user_id:0) you are subtracting directly from the smaller
set of documents that match the other query clauses.
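
The set arithmetic described above can be sketched with Python sets. This is
only a conceptual model under made-up data: the document IDs are invented, and
Lucene iterates over postings rather than building sets like this.

```python
# Conceptual model only: Lucene does not materialise Python sets.
# All document IDs below are made up for illustration.
all_docs = set(range(10))        # every document in the index (*:*)
user_id_zero = {1, 4, 7}         # documents where user_id is 0
other_clauses = {2, 3, 4, 5}     # documents matching the positive clauses

# (*:* NOT user_id:0): subtract from the whole index, then intersect
# with the documents matched by the other clauses.
nested = other_clauses & (all_docs - user_id_zero)

# NOT user_id:0 as a top-level clause: subtract directly from the
# smaller set matched by the other clauses.
flat = other_clauses - user_id_zero

print(nested == flat)   # both forms select the same documents
print(sorted(flat))
```

Both expressions select the same documents; the nested form simply starts from
a much larger set before narrowing it down.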

It might be interesting to compare your benchmarks with a filter query for
-user_id:0 instead, especially if that's a clause that's used repeatedly in
different queries.
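
A minimal sketch of what such a request could look like, with the exclusion
moved from q into an fq parameter. The base URL and the core name "mycore" are
assumptions for illustration, not values from this thread, and the q string
is a simplified form of the original query.

```python
from urllib.parse import urlencode

# Assumption: the host, port and core name "mycore" are hypothetical.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_select_url(base=SOLR_SELECT):
    """Build a select URL with the negative clause as a filter query."""
    params = [
        # Positive clauses stay in the main query.
        ("q", 'network_id:379619 AND object_type:'
              '("ssp_deal" OR "ssp_buyer_group" OR "ssp_buyer")'),
        # The exclusion becomes a separate filter query.
        ("fq", "-user_id:0"),
        # Keep debug output on so the parsed query can be compared.
        ("debugQuery", "true"),
    ]
    return base + "?" + urlencode(params)


url = build_select_url()
print(url)
```

The resulting URL can be fetched with any HTTP client (for example
urllib.request.urlopen) and the timing compared against the two query
forms above.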

Thomas


Re: Some questions about Solr NOT query

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello,
I think these are relevant:
https://cwiki.apache.org/confluence/display/solr/NegativeQueryProblems
https://lucidworks.com/post/solr-boolean-operators/
I think
...)) -user_id:0
is perfect syntax.



-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!