Posted to solr-user@lucene.apache.org by Marc Röttig <ma...@vico-research.com> on 2017/09/06 08:08:08 UTC

Streaming expression API innerJoin on multi-valued field

Dear SOLR users,

I want to use the streaming expression innerJoin with a multi-valued field as the join key. That is, given any number of child documents (of type "child") and
one parent document (of type "parent"), I want to join them wherever id_s equals one of the values in children_ids_ss.

Parent
* id_s = "p123"
* type_s = "parent"
* children_ids_ss = ["c1", "c2"]

Child
* id_s = "c1"
* type_s = "child"

Child
* id_s = "c2"
* type_s = "child"

innerJoin(
   search(collection,q="type_s:child",fl="id_s",sort="id_s ASC"),
   search(collection,q="type_s:parent",fl="id_s,children_ids_ss",sort="id_s ASC"),
   on="id_s=children_ids_ss"
)

This seems to be impossible: I get the exception "java.util.ArrayList cannot be cast to java.lang.Comparable". With a GraphQuery using from and to,
traversing this relationship along multi-valued fields worked (though not across shards, which is why I switched to streaming expressions).

Is there any mechanism to flatten the tuples with the multi-valued field into new tuples with single-valued fields so that the join works? Or any other tweak?

Note: The relationship between Parent and Child is many-to-many, so moving the foreign keys to the children as single-valued fields is not possible.

This is related to the following thread: http://lucene.472066.n3.nabble.com/Using-multi-valued-field-in-solr-cloud-Graph-Traversal-Query-td4324379.html

Thanks a lot in advance for any assistance,
Marc


Dr. Marc Röttig
Software Developer
Email: marc.roettig@vico-research.com
Phone: +49(0)711. 78 78 29-290
Fax +49(0)711. 78 78 29-10

VICO Research & Consulting GmbH
Friedrich-List-Strasse 46 / 70771 Leinfelden-Echterdingen

Homepage: www.vico-research.com/
Blog: www.vico-research.com/expert-talk
Twitter: www.twitter.com/vico_news
Facebook: www.facebook.com/vico.friend
Registered office: Leinfelden-Echterdingen
Stuttgart District Court, HRB 720896
Managing Director: Marc Trömel


Re: Streaming expression API innerJoin on multi-valued field

Posted by Marc Röttig <ma...@vico-research.com>.
Dear Mr. Bernstein, SOLR-users, 

thanks a lot for your valuable hint regarding the cartesianProduct operator.

The following streaming expression gives me the desired result tuples:

hashJoin(
   search(collection,q="type_s:child",fl="id_s",sort="id_s ASC",rows=1000000),
   hashed=cartesianProduct(
      search(collection,q="type_s:parent AND id_s:p123",fl="children_ids_ss,id_s",sort="id_s ASC"),
      children_ids_ss
   ),
   on="id_s=children_ids_ss"
)

where the inner

search(collection,q="type_s:parent AND id_s:p123",fl="children_ids_ss,id_s",sort="id_s ASC")

ideally delivers between one and a few (say 1000) tuples, which hopefully will not make the hashJoin
slow or even impossible. The outer search will yield quite a lot of tuples, which
should be fine though.
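
To make the reasoning about sizes concrete, here is a minimal Python sketch of the hash-join idea, not Solr's actual implementation. It assumes the small "hashed" side (the already flattened parent tuples) is loaded into an in-memory table and the large child side is only streamed past it. The function name hash_join and the unmatched child "c3" are hypothetical, added for illustration. The sketch returns (child, parent) pairs instead of merged tuples, to sidestep the question of which side's id_s would win in a merge.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Doc = Dict[str, Any]


def hash_join(stream: Iterable[Doc], hashed: Iterable[Doc],
              left_key: str, right_key: str) -> Iterator[Tuple[Doc, Doc]]:
    """Join two tuple streams by equality without requiring any sort order."""
    # Build phase: only the (small) hashed side is held in memory.
    table: Dict[Any, List[Doc]] = defaultdict(list)
    for h in hashed:
        table[h[right_key]].append(h)
    # Probe phase: the (large) streamed side is read once, tuple by tuple.
    for t in stream:
        for h in table.get(t[left_key], []):
            yield (t, h)


# Parent p123 after flattening: one tuple per value of children_ids_ss.
flattened_parents = [
    {"id_s": "p123", "children_ids_ss": "c1"},
    {"id_s": "p123", "children_ids_ss": "c2"},
]
# Child side; "c3" is a hypothetical child that no parent references.
children = [{"id_s": "c1"}, {"id_s": "c2"}, {"id_s": "c3"}]

pairs = list(hash_join(children, flattened_parents, "id_s", "children_ids_ss"))
for child, parent in pairs:
    print(child["id_s"], "->", parent["id_s"])
```

In this sketch, memory use grows with the hashed side only, which is why keeping the parent query narrow matters more than the size of the child search.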

Cheers,
Marc



Re: Streaming expression API innerJoin on multi-valued field

Posted by Joel Bernstein <jo...@gmail.com>.
The cartesianProduct stream can be wrapped around the stream with the
multi-valued field. The cartesianProduct function is available in Solr 6.6,
but since it was a late addition, the documentation does not appear until
Solr 7.0.

Here is a link to the docs in github:
https://github.com/apache/lucene-solr/blob/branch_7_0/solr/solr-ref-guide/src/stream-decorators.adoc

The first stream decorator in the docs is cartesianProduct.

Since you can't sort on the multi-valued field, though, you'll have to use a
hashJoin to do the join.
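
As a rough illustration of the flattening step, here is a small Python sketch of what wrapping a stream in cartesianProduct amounts to for a single multi-valued field. It is not Solr code: the helper name cartesian_product is hypothetical, and its handling of single-valued, missing, or empty fields is an assumption of the sketch rather than documented Solr behavior.

```python
from typing import Any, Dict, Iterable, Iterator

Doc = Dict[str, Any]


def cartesian_product(tuples: Iterable[Doc], field: str) -> Iterator[Doc]:
    """Emit one output tuple per value of a multi-valued field."""
    for t in tuples:
        values = t.get(field)
        if not isinstance(values, list):
            # Assumption: single-valued or missing fields pass through unchanged.
            yield dict(t)
            continue
        for v in values:
            flat = dict(t)   # copy all other fields as-is
            flat[field] = v  # replace the list with one single value
            yield flat


# The parent document from the question, with its multi-valued field as a list.
parents = [{"id_s": "p123", "children_ids_ss": ["c1", "c2"]}]
flattened = list(cartesian_product(parents, "children_ids_ss"))
for doc in flattened:
    print(doc)
```

Each flattened tuple carries a plain string in children_ids_ss, so an equality join on that field no longer has to compare a list against a string.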


Joel Bernstein
http://joelsolr.blogspot.com/
