Posted to solr-user@lucene.apache.org by Bram Biesbrouck <br...@reinvention.be> on 2019/06/17 10:46:16 UTC

Question regarding negated block join queries

Dear all,

I'm new to this list, so let me introduce myself. I'm Bram, author of a
linked data framework called Stralo. We're working toward version 1.0, in
which we're integrating Solr indexing and querying of RDF triples (
https://github.com/republic-of-reinvention/com.stralo.framework/milestone/3)

I'm running into inconsistent results with block join queries and I
wondered if any of you could help me out. We're indexing our parent-child
relationships using a field called "parentUri". The field contains the URI
(the id of the document) of the parent document, and is simply omitted when
the document itself is a parent.

Here's an example of a child document:

{
        "language":"en",
        "resource":"/resource/1130494009577889453",
        "parentUri":"/en/blah",
        "uri":"/resource/1130494009577889453",
        "label":"Label of the object",
        "description":"Example of some sub text",
        "typeOf":"ror:Page",
        "rdf:type":["ror:Page"],
        "rdfs:label":["Label of the object"],
        "ror:text":["Example of some sub text"],
        "ror:testNumber":[4],
        "ror:testDate":["2019-05-10T00:00:00Z"],
        "_version_":1636582287436939264
}

(Please ignore the CURIE syntax we're using as field names. We know it's
slightly illegal in Solr, but it works just fine and it makes our lives
indexing triples so much more convenient.)

Here's its parent document:

{
        "language":"en",
        "resource":"/resource/1106177060466942658",
        "uri":"/en/blah",
        "label":"rdfs label test 3",
        "description":"Hi, we are the Republic \n        we do video technology",
        "typeOf":"ror:BlogPost",
        "rdf:type":["ror:BlogPost"],
        "rdfs:label":["rdfs label test 3"],
        "meta:created":["2019-04-04T09:08:35.736Z"],
        "meta:creator":["/users/2"],
        "meta:modified":["2019-06-17T10:14:54.134Z"],
        "meta:contributor":["/users/2",
          "/users/1"],
        "ror:testEditor":["Blah, dit is inhoud van test editor"],
        "ror:testEnum":["af"],
        "ror:testDate":["2019-05-31T00:00:00Z"],
        "ror:testResource":["/resource/Page/800895161299715471"],
        "ror:testObject":["/resource/1130494009577889453"],
        "ror:text":["Hi, we are the Republic we do video technology"],
        "_version_":1636582287436939264
}
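
The convention shown in these two documents (a child carries "parentUri", a parent omits it) can be sketched in a few lines of Python. This is only an illustration of the data model, not part of Stralo or Solr; the helper names is_parent and children_by_parent are made up for this example, and the documents are trimmed to the fields that matter here.

```python
from collections import defaultdict


def is_parent(doc):
    # Assumption taken from the post: a parent is any document
    # that has no "parentUri" field at all.
    return "parentUri" not in doc


def children_by_parent(docs):
    # Map each parent URI to the URIs of the children that point at it.
    grouped = defaultdict(list)
    for doc in docs:
        if not is_parent(doc):
            grouped[doc["parentUri"]].append(doc["uri"])
    return dict(grouped)


# Trimmed copies of the two example documents above.
child = {
    "uri": "/resource/1130494009577889453",
    "parentUri": "/en/blah",
    "typeOf": "ror:Page",
}
parent = {
    "uri": "/en/blah",
    "typeOf": "ror:BlogPost",
}

docs = [child, parent]
parent_uris = [d["uri"] for d in docs if is_parent(d)]
grouped = children_by_parent(docs)

print(parent_uris)
print(grouped)
```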

As said, we're struggling with block joins, because we don't have a clear
field that contains "this" for parent documents and "that" for child
documents. Instead, it's omitted for parent documents. So, to fire a block
join child query, we use this approach (just an example):

q={!parent which=-(parentUri:*)}*:*

What we expect is that the allParents filter selects all those documents
where the "parentUri" field doesn't exist using a negated wildcard query
(which works just fine when used alone). The someParents filter just
selects everything since this is an example. Alas, this doesn't yield any
results.

Since the docs say:
When subordinate clause (<someParents>) is omitted, it’s parsed as a
segmented and cached filter for children documents. More precisely,
q={!child of=<allParents>} is equivalent to q=*:* -<allParents>.

I tried to run this query (assuming a double negation becomes a plus):

*:* +(parentUri:*)

And this yields correct results, so I'm assuming it's possible, but I'm
overlooking something in my block join children query syntax.

Could anyone put me in the right direction to use block join queries with
non-existent or existent fields?

all the best,

b.

Re: Question regarding negated block join queries

Posted by Erick Erickson <er...@gmail.com>.
Bram:

Here’s a fuller explanation that you might be interested in:

https://lucidworks.com/2011/12/28/why-not-and-or-and-not/

Best,
Erick



Re: Question regarding negated block join queries

Posted by Bram Biesbrouck <br...@reinvention.be>.
On Mon, Jun 17, 2019 at 7:11 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 6/17/2019 4:46 AM, Bram Biesbrouck wrote:
> > q={!parent which=-(parentUri:*)}*:*
>
> Pure negative queries do not work in Lucene.  Sometimes, when you do a
> single-clause negative query, Solr is able to detect the problem and
> automatically make an adjustment so the query works.  This happens
> transparently so you never notice.
>
> In essence, what your negative query tells Lucene is "start with
> nothing, and then subtract docs that match this query."  Since you
> started with nothing and then subtracted, you get nothing.
>
> Also, that's a wildcard query, which could be very slow if the possible
> number of values in parentUri is more than a few.  If that field can
> only contain a very small number of values, then a wildcard query might
> be fast.
>
> The following query solves both problems -- starting with all docs and
> then subtracting things that match the query clause after that:
>
> *:* -parentUri:[* TO *]
>
> This will return all documents that do not have the parentUri field
> defined.  The [* TO *] syntax is an all-inclusive range query.
>

Hi Shawn,

Awesome elaborate explanation, thank you. Also thanks for the optimization
hint. I found both approaches online, but didn't realize there was a
performance difference.
Digging deeper, I've found this SO post, basically explaining why it worked
some of the time, but not in all cases:
https://stackoverflow.com/questions/10651548/negation-in-solr-query

best,

b.

Re: Question regarding negated block join queries

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/17/2019 4:46 AM, Bram Biesbrouck wrote:
> q={!parent which=-(parentUri:*)}*:*

Pure negative queries do not work in Lucene.  Sometimes, when you do a 
single-clause negative query, Solr is able to detect the problem and 
automatically make an adjustment so the query works.  This happens 
transparently so you never notice.

In essence, what your negative query tells Lucene is "start with 
nothing, and then subtract docs that match this query."  Since you 
started with nothing and then subtracted, you get nothing.
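
The "start with nothing, then subtract" behaviour can be pictured with plain Python sets. This is a toy model only, not how Lucene is implemented; the function name boolean_match and the document ids are invented for the illustration.

```python
def boolean_match(must=None, must_not=None):
    # Toy model: the positive clause supplies the starting set.
    # Without one, the starting set is empty.
    candidates = set(must) if must is not None else set()
    # Negative clauses can only remove documents, never add them.
    if must_not is not None:
        candidates -= set(must_not)
    return candidates


all_docs = {"parent-1", "child-1", "child-2"}   # what *:* would match
has_parent_uri = {"child-1", "child-2"}         # what parentUri:[* TO *] would match

# Pure negative query: -parentUri:[* TO *]
pure_negative = boolean_match(must_not=has_parent_uri)

# With a positive clause in front: *:* -parentUri:[* TO *]
with_match_all = boolean_match(must=all_docs, must_not=has_parent_uri)

print(pure_negative)
print(with_match_all)
```

In this model the pure negative form returns an empty set, while adding the match-all clause leaves exactly the document without a parentUri.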

Also, that's a wildcard query, which could be very slow if the possible
number of values in parentUri is more than a few.  If that field can 
only contain a very small number of values, then a wildcard query might 
be fast.

The following query solves both problems -- starting with all docs and 
then subtracting things that match the query clause after that:

*:* -parentUri:[* TO *]

This will return all documents that do not have the parentUri field 
defined.  The [* TO *] syntax is an all-inclusive range query.
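
To use this inside the block join from the original question, the filter has to go into the "which" local parameter, and because it contains spaces the value needs to be quoted. Below is a minimal Python sketch that only builds the request parameter; it does not contact Solr. The function name parent_block_join_q is made up, and the child clause is a placeholder chosen for the example. Whether the query returns the expected parents also depends on how the documents were indexed, which this thread does not show.

```python
from urllib.parse import urlencode


def parent_block_join_q(all_parents, some_children):
    # The "which" value contains spaces, so it is wrapped in double quotes.
    # To keep the sketch simple, filters that themselves contain a double
    # quote are rejected rather than escaped.
    if '"' in all_parents:
        raise ValueError("filter must not contain double quotes in this sketch")
    return '{!parent which="%s"}%s' % (all_parents, some_children)


# Match-all minus every document that has a parentUri value.
ALL_PARENTS = "*:* -parentUri:[* TO *]"

# Placeholder child clause (an assumption for this example): every child.
SOME_CHILDREN = "parentUri:[* TO *]"

q = parent_block_join_q(ALL_PARENTS, SOME_CHILDREN)

# URL-encode the parameter so it can be appended to a /select request.
query_string = urlencode({"q": q})

print(q)
print(query_string)
```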

Thanks,
Shawn