Posted to solr-user@lucene.apache.org by sambasivarao giddaluri <sa...@gmail.com> on 2020/02/07 19:23:31 UTC

Stream InnerJoin to merge hierarchical data

Hi All,

Our dataset has 50M records. We are using a complex graph query and are now
trying to do an innerJoin on the records, but we are facing the issue below.
This is a critical issue.

Parent
{
  "parentId": "1",
  "parent.name": "foo",
  "type": "parent"
}
Child
{
  "childId": "2",
  "parentId": "1",
  "child.name": "bar",
  "type": "child"
}
GrandChild
{
  "grandId": "3",
  "childId": "2",
  "parentId": "1",
  "grandchild.name": "too",
  "type": "grandchild"
}

innerJoin(
  search(collection_name, q="type:grandchild", qt="/export",
         fl="grandchild.name,grandId,childId,parentId", sort="childId asc"),
  search(collection_name, q="type:child", qt="/export",
         fl="child.name,childId,parentId", sort="childId asc"),
  on="childId")

This works and gives the following result:
{
  "parentId": "1",
  "childId": "2",
  "grandId": "3",
  "grandchild.name": "too",
  "child.name": "bar"
}

But if I try to join the parent as well with another innerJoin, I get an
error:

innerJoin(
  innerJoin(
    search(collection_name, q="type:grandchild", qt="/export",
           fl="grandchild.name,grandId,childId,parentId", sort="childId asc"),
    search(collection_name, q="type:child", qt="/export",
           fl="child.name,childId,parentId", sort="childId asc"),
    on="childId"),
  search(collection_name, q="type:parent", qt="/export",
         fl="parent.name,parentId", sort="parentId asc"),
  on="parentId")

ERROR
{
  "result-set": {
    "docs": [
      {
        "EXCEPTION": "Invalid JoinStream - all incoming stream comparators (sort) must be a superset of this stream's equalitor.",
        "EOF": true
      }
    ]
  }
}


If we rename parentId in the child docs to childParentId, and similarly rename
childId and parentId in the grandchild docs to grandchildId and grandParentId,
the query works, but that is a big schema change.
I also looked at this issue: https://issues.apache.org/jira/browse/SOLR-10512

Thanks
sam

Re: Stream InnerJoin to merge hierarchical data

Posted by Joel Bernstein <jo...@gmail.com>.
I believe this is working as designed. The issue is that innerJoin relies on
the sort order of the streams to perform a streaming merge join. The first
join works because both sorts line up on childId:

  innerJoin(search(collection_name,
                   q="type:grandchild",
                   qt="/export",
                   fl="grandchild.name,grandId,childId,parentId",
                   sort="childId asc"),
            search(collection_name,
                   q="type:child",
                   qt="/export",
                   fl="child.name,childId,parentId",
                   sort="childId asc"),
            on="childId")
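To make the sort dependency concrete, here is a minimal in-memory Python sketch of a streaming merge join, assuming both inputs are lists of dicts already sorted ascending on the join key. merge_inner_join is a hypothetical name; this is not Solr's InnerJoinStream code. Each list is walked once from front to back, so if a side is not sorted on the key, matches are silently skipped. That is presumably why Solr checks the sorts up front instead of returning incomplete results.

```python
from typing import Dict, Iterator, List

Doc = Dict[str, str]


def merge_inner_join(left: List[Doc], right: List[Doc], key: str) -> Iterator[Doc]:
    """Inner-join two lists that are BOTH already sorted ascending on `key`.

    Each side is walked once, front to back, the way a merge join consumes
    two sorted streams; only the current run of equal keys on the right is
    revisited. Hypothetical sketch, not Solr's implementation.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left is behind: advance it
        elif lk > rk:
            j += 1  # right is behind: advance it
        else:
            # Find the run of right-side docs sharing this key ...
            run_end = j
            while run_end < len(right) and right[run_end][key] == lk:
                run_end += 1
            # ... and pair every left doc with the same key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:run_end]:
                    yield {**left[i], **r}  # right-side values win on overlap
                i += 1
            j = run_end


# The first join from the thread, using the sample docs (both sorted on childId).
grandchildren = [{"grandId": "3", "childId": "2", "parentId": "1", "grandchild.name": "too"}]
children = [{"childId": "2", "parentId": "1", "child.name": "bar"}]
print(list(merge_inner_join(grandchildren, children, "childId")))
```

For example, joining left keys "2", "1" (unsorted) against right keys "1", "2" yields only the "2" match: once the walk moves past "1" on the right, it never comes back to it.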

The second join, though, attempts to join on parentId, but the sorts do not
allow that, because one of its inputs (the nested join) is sorted on childId.
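Here is a rough way to see why the outer join is rejected. A merge join on parentId needs every input ordered by parentId, but the nested join emits tuples in childId order. The Python below is a simplified, assumed version of that check. can_merge_join and is_sorted_on are hypothetical helpers, and Solr's actual validation may differ in detail. It also uses a hypothetical second family to show that childId order does not imply parentId order. Re-sorting the intermediate tuples on parentId restores the property. In Solr, the sort decorator (sort(stream, by="parentId asc")) re-orders a stream, but the reference guide notes that it reads all tuples into memory, which matters at 50M records.

```python
from typing import Dict, List, Sequence

Doc = Dict[str, str]


def can_merge_join(sort_fields: Sequence[str], join_keys: Sequence[str]) -> bool:
    """Simplified rule (an assumption, not Solr's source code): a stream can feed
    a merge join only if its sort starts with the join key(s), in order."""
    return list(sort_fields[: len(join_keys)]) == list(join_keys)


def is_sorted_on(tuples: List[Doc], key: str) -> bool:
    """True if the tuples are in ascending order of `key` (string comparison)."""
    values = [t[key] for t in tuples]
    return values == sorted(values)


# Sort order of each input to the outer join in the failing expression.
# Both inputs of the nested join are sorted on childId, so its output is too.
nested_join_sort = ["childId"]
parent_search_sort = ["parentId"]

print(can_merge_join(parent_search_sort, ["parentId"]))  # True
print(can_merge_join(nested_join_sort, ["parentId"]))    # False: join rejected

# Hypothetical nested-join output: the first tuple matches the sample docs,
# the second family is invented. Ordered by childId, but not by parentId.
nested = [
    {"childId": "2", "parentId": "1"},
    {"childId": "4", "parentId": "0"},
]
print(is_sorted_on(nested, "childId"), is_sorted_on(nested, "parentId"))  # True False

# Re-sorting on the join key makes the stream usable for a merge join on parentId.
resorted = sorted(nested, key=lambda t: t["parentId"])
print(is_sorted_on(resorted, "parentId"))  # True
```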

One possible solution is to use fetch to retrieve the parent for the child:
https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html#fetch
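The fetch suggestion avoids the sort problem because fetch looks parents up by key in batches instead of merging two sorted streams. Per the reference guide linked above, fetch takes a collection, an incoming stream, fl, on (for example on="parentId=parentId") and batchSize. The guide describes it as supporting one-to-one and many-to-one fetches. The Python below only simulates that idea in memory. fetch_like and _lookup_batch are hypothetical names, not Solr code. In this schema, parentId appears on all three document types, so the sketch looks up keys in parent docs only. It is worth checking how fetch behaves when the key also matches child and grandchild docs.

```python
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, str]


def _lookup_batch(batch: List[Doc], index: List[Doc], left_key: str,
                  right_key: str, fl: List[str]) -> Iterator[Doc]:
    keys = {t[left_key] for t in batch if left_key in t}
    # One pass over `index` per batch stands in for one query back to Solr.
    found = {d[right_key]: d for d in index if d.get(right_key) in keys}
    for t in batch:
        match = found.get(t.get(left_key))
        extra = {f: match[f] for f in fl if f in match} if match else {}
        yield {**t, **extra}  # unmatched tuples pass through unchanged here


def fetch_like(stream: Iterable[Doc], index: List[Doc], left_key: str,
               right_key: str, fl: List[str], batch_size: int = 50) -> Iterator[Doc]:
    """Hypothetical fetch-style decorator: add `fl` fields from `index` to each
    tuple where stream[left_key] == index[right_key]. No input sort is needed."""
    batch: List[Doc] = []
    for t in stream:
        batch.append(t)
        if len(batch) == batch_size:
            yield from _lookup_batch(batch, index, left_key, right_key, fl)
            batch = []
    if batch:
        yield from _lookup_batch(batch, index, left_key, right_key, fl)


# Output of the working nested join (sample data) and the parent docs only.
joined = [{"parentId": "1", "childId": "2", "grandId": "3",
           "grandchild.name": "too", "child.name": "bar"}]
parents = [{"parentId": "1", "parent.name": "foo", "type": "parent"}]
print(list(fetch_like(joined, parents, "parentId", "parentId", ["parent.name"])))
```

Because each batch is resolved by key lookup rather than by walking two sorted streams, the incoming tuples can stay in childId order.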


Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 7, 2020 at 2:23 PM sambasivarao giddaluri <
sambasiva.giddaluri@gmail.com> wrote:
