Posted to solr-user@lucene.apache.org by Dragos Bogdan <dr...@yahoo.com.INVALID> on 2016/11/10 14:17:38 UTC

Filter nested index - remove empty parents

Hello,
I am new to SOLR and, at first glance, I can say this is a very good service: very helpful and fast.
I am trying to filter docs based on some criteria, but I have a few issues obtaining the final results. The main objective is to have one query that returns a list of Persons with specific Profiles that have specific Experiences.

I think I managed to obtain such a list, but the results still contain Persons with no Profiles, or Profiles with no Experiences. I need a clean list with optimal execution time.


What I have - types of docs:

Parents - Persons:
{ "FIRSTNAME": "Ruth",
  "CONTENT_TYPE_ID": "parentDocument",
  "id": "-3631097568311640064" }

Children - Profiles:
{ "PROFILEID": "548",
  "CONTENT_TYPE_ID": "firstChildDocument",
  "id": "-3631097568311640064",
  "PROFILECOMPETENCYID": "553" }

Children of Profiles are Experiences:
{ "EXPERIENCEID": "8158200356237475840",
  "CONTENT_TYPE_ID": "secondChildDocument",
  "id": "-3631097568311640064",
  "PROFILE_PROFILEID": "548" }

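The {!parent} query used in Variant 1 is a block join, and block joins only work when each Person, its Profiles and their Experiences are indexed together as one nested block (inside a block, Solr stores children before their parent). A minimal sketch of such a block, assuming Solr's JSON update format with the `_childDocuments_` key; the ids are made up, because the sample docs above all share one id value:

```python
import json

# Hypothetical ids: the sample docs above all share one id value, so
# made-up, distinct ids are used here to keep the nesting readable.
person = {
    "id": "person-1",
    "CONTENT_TYPE_ID": "parentDocument",
    "FIRSTNAME": "Ruth",
    # "_childDocuments_" holds nested children in Solr's JSON update
    # format; a child can carry its own "_childDocuments_" list.
    "_childDocuments_": [
        {
            "id": "profile-548",
            "CONTENT_TYPE_ID": "firstChildDocument",
            "PROFILEID": "548",
            "PROFILECOMPETENCYID": "553",
            "_childDocuments_": [
                {
                    "id": "experience-8158200356237475840",
                    "CONTENT_TYPE_ID": "secondChildDocument",
                    "EXPERIENCEID": "8158200356237475840",
                    "PROFILE_PROFILEID": "548",
                }
            ],
        }
    ],
}

# JSON body that could be POSTed to the collection's /update handler;
# it is only built and printed here, not sent.
payload = json.dumps([person], indent=2)
print(payload)
```

Indexing the whole tree in one request is what keeps the three levels in a single block, so the parent/child relationship is known at query time without any join on PROFILE_PROFILEID.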

Variant 1:

q=id:"-3631097568311640064" AND +{!parent which=CONTENT_TYPE_ID:parentDocument v=CONTENT_TYPE_ID:firstChildDocument}&
fl=*,experiences:[subquery]&
experiences.q=(CONTENT_TYPE_ID:secondChildDocument AND EXPERIENCEID:"-3884425047351230464")&
experiences.fq={!terms f=PROFILE_PROFILEID v=$row.PROFILEID}&
expand.field=_root_&expand=true&expand.q=CONTENT_TYPE_ID:firstChildDocument

This approach groups and filters the Profiles for every Person and creates a subquery of the desired Experiences for each Profile. The issue is that the results contain "empty" Profiles with no Experiences and, implicitly, Persons without any Experience.

Example result attached: Example1.json
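In Variant 1 the {!parent} clause only requires a Person to have some Profile; the Experience condition lives in the subquery, so it cannot remove anything from the main result. One possible server-side alternative, sketched here under the assumption that the data is indexed as nested blocks, is to chain two block joins so that only Persons with at least one Profile containing the requested Experience match. The parameter names `profq` and `expq` are made up and are dereferenced with Solr's `v=$param` local-params syntax; `build_params` is a hypothetical helper:

```python
from urllib.parse import urlencode


def build_params(person_id: str, experience_id: str) -> dict:
    """Hypothetical helper: request parameters for a two-level block join.

    Matches only Persons that have at least one Profile which in turn
    has the requested Experience."""
    return {
        # Outer join: map matching Profiles up to their Person.
        "q": "{!parent which=CONTENT_TYPE_ID:parentDocument v=$profq}",
        "fq": f'id:"{person_id}"',
        # Middle level: Profiles with at least one matching Experience.
        # which=firstChildDocument works because, inside a block, each
        # Profile is stored right after its own Experiences.
        "profq": ("+CONTENT_TYPE_ID:firstChildDocument "
                  "+{!parent which=CONTENT_TYPE_ID:firstChildDocument v=$expq}"),
        # Inner level: the Experience documents being searched for.
        "expq": ("+CONTENT_TYPE_ID:secondChildDocument "
                 f'+EXPERIENCEID:"{experience_id}"'),
        "fl": "*",
    }


query_string = urlencode(build_params("-3631097568311640064",
                                      "-3884425047351230464"))
print(query_string)
```

This only decides which Persons come back. Returning just the matching Profiles and Experiences alongside them would still need a [subquery] or a [child] transformer with a childFilter, which is worth checking against the Solr version in use.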
Variant 2:

q=CONTENT_TYPE_ID:"parentDocument" AND id:"-3631097568311640064"&
fl=*,profiles:[subquery]&
profiles.q=*:*&
profiles.fq=(CONTENT_TYPE_ID:"firstChildDocument" AND {!terms f=id v=$row.id})&
profiles.fl=*,experiences:[subquery]&
profiles.experiences.q=*:*&
profiles.experiences.fq=((CONTENT_TYPE_ID:"secondChildDocument" AND EXPERIENCEID:"-3884425047351230464") AND {!terms f=PROFILE_PROFILEID v=$row.PROFILEID})

This approach simply creates subqueries for the desired Experiences, but I have two issues:
- The subqueries are executed for documents where they are not needed; for example, they try to find Experiences for Persons, but Experiences exist only for Profiles.
- As before, the results contain Persons with no Experiences or Profiles with no Experiences. The "empty" Persons and "empty" Profiles should be removed. (Somehow filter out all results that have numFound: 0?)

Example result attached: Example2.json
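If the query itself cannot drop the empty Profiles and Persons, they can be pruned on the client. A minimal sketch, assuming the JSON response writer renders each [subquery] field as a nested result object with `numFound` and `docs` (field names `profiles` and `experiences` as in Variant 2); `prune_empty` and the sample response are hypothetical:

```python
def prune_empty(response: dict) -> list:
    """Drop Profiles whose 'experiences' subquery found nothing, then
    drop Persons left without any Profile."""
    kept = []
    for person in response["response"]["docs"]:
        profiles = person.get("profiles", {}).get("docs", [])
        non_empty = [
            profile for profile in profiles
            if profile.get("experiences", {}).get("numFound", 0) > 0
        ]
        if non_empty:
            # Shallow copy so the original response dict is not modified.
            pruned = dict(person)
            pruned["profiles"] = {"numFound": len(non_empty), "start": 0, "docs": non_empty}
            kept.append(pruned)
    return kept


# Made-up sample shaped like a Variant 2 response.
sample = {
    "response": {
        "numFound": 2,
        "docs": [
            {
                "id": "person-1",
                "profiles": {
                    "numFound": 2,
                    "start": 0,
                    "docs": [
                        {"PROFILEID": "548",
                         "experiences": {"numFound": 1, "start": 0,
                                         "docs": [{"EXPERIENCEID": "-3884425047351230464"}]}},
                        {"PROFILEID": "549",
                         "experiences": {"numFound": 0, "start": 0, "docs": []}},
                    ],
                },
            },
            {
                "id": "person-2",
                "profiles": {"numFound": 0, "start": 0, "docs": []},
            },
        ],
    }
}

result = prune_empty(sample)
print([p["id"] for p in result])  # ['person-1']
```

This gives a clean list, but it does not reduce the work Solr does: every subquery still runs, and paging on the main query no longer matches the pruned list.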


Questions:
1. Is there any way to fix the issues in either of the above queries so that we get the desired results? Can any optimization be done to get the best timings?
Or
2. Is there any other approach to obtain the desired results? Other types of joins?

kind regards,
Dragos

Re: Filter nested index - remove empty parents

Posted by Dragos Bogdan <dr...@yahoo.com.INVALID>.
This seems to be a good approach. I will try! Thank you!
Dragos


      From: Erick Erickson <er...@gmail.com>
 To: solr-user <so...@lucene.apache.org>; Dragos Bogdan <dr...@yahoo.com> 
 Sent: Thursday, November 10, 2016 6:02 PM
 Subject: Re: Filter nested index - remove empty parents
   


   

Re: Filter nested index - remove empty parents

Posted by Erick Erickson <er...@gmail.com>.
It looks like you're trying to just index tables from some DB and then
search them in Solr as you would the DB.

Solr join queries aren't like DB joins; in particular, you can't return
_fields_ from the "from" table.

The usual recommendation, if at all possible, is to flatten your data.
This runs counter to the RDBMS reflex to normalize, normalize,
normalize... However, Solr specializes in searching and handles lots
and lots of data, so de-normalizing is often a viable solution.
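Flattening, as suggested here, can be pictured as one flat document per Person/Profile/Experience combination, so a plain filter on EXPERIENCEID can only match Persons that actually have that Experience. A minimal sketch; the `Profile` dataclass, the `flatten` helper, the `PERSON_ID` field and the composite id scheme are all hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Profile:
    profile_id: str
    competency_id: str
    # Experience ids attached to this Profile.
    experience_ids: List[str] = field(default_factory=list)


def flatten(person_id: str, first_name: str, profiles: List[Profile]) -> List[dict]:
    """Build one flat document per (Person, Profile, Experience).

    Profiles without Experiences produce no documents, so they can
    never show up as "empty" results."""
    docs = []
    for profile in profiles:
        for experience_id in profile.experience_ids:
            docs.append({
                # Hypothetical composite key, unique per combination.
                "id": f"{person_id}_{profile.profile_id}_{experience_id}",
                "PERSON_ID": person_id,
                "FIRSTNAME": first_name,
                "PROFILEID": profile.profile_id,
                "PROFILECOMPETENCYID": profile.competency_id,
                "EXPERIENCEID": experience_id,
            })
    return docs


docs = flatten("-3631097568311640064", "Ruth", [
    Profile("548", "553", ["8158200356237475840", "-3884425047351230464"]),
    Profile("549", "553"),  # no Experiences, so no flat documents
])
print(len(docs))  # 2
```

A search could then use fq=EXPERIENCEID:"-3884425047351230464" and group on PERSON_ID (group=true&group.field=PERSON_ID, or fq={!collapse field=PERSON_ID}) to return each matching Person once. The trade-off is the usual one for de-normalizing: Person and Profile fields are repeated in every flat document, which costs index size.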

Best,
Erick
