Posted to dev@lucene.apache.org by "Markus Kalkbrenner (JIRA)" <ji...@apache.org> on 2018/06/20 08:05:00 UTC

[jira] [Commented] (SOLR-10512) Innerjoin streaming expressions - Invalid JoinStream error

    [ https://issues.apache.org/jira/browse/SOLR-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517895#comment-16517895 ] 

Markus Kalkbrenner commented on SOLR-10512:
-------------------------------------------

We have since built a lot of complex streaming expressions. Whenever you "combine" two streams (innerJoin, intersect, merge, ...) you have to ensure that both streams are sorted the same way, namely by the fields they are combined on!
If one stream isn't sorted the required way, you have to wrap it in a sort() function, just like in my last comment.

In the Solr PHP libraries I maintain we now provide convenience methods that ensure that the two streams are sorted correctly by the fields used in the "on" clause.
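
To illustrate the idea, here is a minimal sketch of such a convenience method, written in Python rather than PHP. It is not the API of the PHP libraries mentioned above: the function name sorted_inner_join and its parameters are hypothetical, and it only handles a single "left=right" pair (or one bare field name) in the "on" clause. It simply wraps both incoming streams in sort() on their side of the join field, so the join no longer depends on how each search() happened to be sorted.

```python
def sorted_inner_join(left_stream: str, right_stream: str, on: str) -> str:
    """Build an innerJoin expression whose inputs are both re-sorted on the join field.

    Hypothetical helper for illustration only. `on` is assumed to be either
    "leftField=rightField" or a single field name shared by both streams.
    """
    left_field, sep, right_field = on.partition("=")
    left_field = left_field.strip()
    # A bare field name means the same field is used on both sides.
    right_field = right_field.strip() if sep else left_field

    return (
        "innerJoin("
        f'sort({left_stream}, by="{left_field} asc"), '
        f'sort({right_stream}, by="{right_field} asc"), '
        f'on="{on}")'
    )


# Collection and field names are taken from the issue description below;
# neither search() is sorted on its join field, so both get wrapped.
books = 'search(books, q="*:*", fl="id,title_s,pubyear_i", sort="pubyear_i asc", qt="/export")'
reviews = 'search(reviews, q="*:*", fl="id_book_s,stars_i,review_dt", sort="review_dt asc", qt="/export")'

print(sorted_inner_join(books, reviews, on="id=id_book_s"))
```

The resulting string can be sent as the expr parameter to the /stream handler. Wrapping unconditionally keeps the helper simple, but as far as I know sort() re-orders the tuples in memory, so a real implementation would probably skip the wrapper when a stream is already sorted on the join field.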

I think we can convert this bug report into a documentation task. I can't find anything in the handbook that describes the importance of the sorts and of the left/right fields.

> Innerjoin streaming expressions - Invalid JoinStream error
> ----------------------------------------------------------
>
>                 Key: SOLR-10512
>                 URL: https://issues.apache.org/jira/browse/SOLR-10512
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 6.4.2, 6.5
>         Environment: Debian Jessie
>            Reporter: Dominique Béjean
>            Priority: Major
>
> It looks like the innerJoin streaming expression does not work as explained in the documentation. An invalid JoinStream error occurs.
> {noformat}
> curl --data-urlencode 'expr=innerJoin(
>                                 search(books, 
>                                    q="*:*", 
>                                    fl="id", 
>                                    sort="id asc"),
>                                 search(reviews, 
>                                    q="*:*", 
>                                    fl="id_book_s", 
>                                    sort="id_book_s asc"),     
>                                 on="id=id_books_s"
>                             )' http://localhost:8983/solr/books/stream
>
> {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming stream comparators (sort) must be a superset of this stream's equalitor.","EOF":true}]}}
> {noformat}
> It is essentially the same as the documentation example:
> 
> {noformat}
> innerJoin(
>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
>   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
>   on="personId=ownerId"
> )
> {noformat}
> Queries on each collection give:
> {noformat}
> $ curl --data-urlencode 'expr=search(books, 
>                                    q="*:*", 
>                                    fl="id, title_s, pubyear_i", 
>                                    sort="pubyear_i asc", 
>                                    qt="/export")' http://localhost:8983/solr/books/stream
> {
>   "result-set": {
>     "docs": [
>       {
>         "title_s": "Friends",
>         "pubyear_i": 1994,
>         "id": "book2"
>       },
>       {
>         "title_s": "The Way of Kings",
>         "pubyear_i": 2010,
>         "id": "book1"
>       },
>       {
>         "EOF": true,
>         "RESPONSE_TIME": 16
>       }
>     ]
>   }
> }
> $ curl --data-urlencode 'expr=search(reviews, 
>                                    q="author_s:d*", 
>                                    fl="id, id_book_s, stars_i, review_dt", 
>                                    sort="id_book_s asc", 
>                                    qt="/export")' http://localhost:8983/solr/reviews/stream
> 								   
> {
>   "result-set": {
>     "docs": [
>       {
>         "stars_i": 3,
>         "id": "book1_c2",
>         "id_book_s": "book1",
>         "review_dt": "2014-03-15T12:00:00Z"
>       },
>       {
>         "stars_i": 4,
>         "id": "book1_c3",
>         "id_book_s": "book1",
>         "review_dt": "2014-12-15T12:00:00Z"
>       },
>       {
>         "stars_i": 3,
>         "id": "book2_c2",
>         "id_book_s": "book2",
>         "review_dt": "1994-03-15T12:00:00Z"
>       },
>       {
>         "stars_i": 4,
>         "id": "book2_c3",
>         "id_book_s": "book2",
>         "review_dt": "1994-12-15T12:00:00Z"
>       },
>       {
>         "EOF": true,
>         "RESPONSE_TIME": 47
>       }
>     ]
>   }
> }
> {noformat}
> After more tests, I found that I just had to invert the "on" clause to make it work:
> {noformat}
> curl --data-urlencode 'expr=innerJoin(
>                                 search(books, 
>                                    q="*:*", 
>                                    fl="id", 
>                                    sort="id asc"),
>                                 search(reviews, 
>                                    q="*:*", 
>                                    fl="id_book_s", 
>                                    sort="id_book_s asc"),     
>                                 on="id_books_s=id"
>                             )' http://localhost:8983/solr/books/stream
> 
> {
>   "result-set": {
>     "docs": [
>       {
>         "title_s": "The Way of Kings",
>         "pubyear_i": 2010,
>         "stars_i": 5,
>         "id": "book1",
>         "id_book_s": "book1",
>         "review_dt": "2015-01-03T14:30:00Z"
>       },
>       {
>         "title_s": "The Way of Kings",
>         "pubyear_i": 2010,
>         "stars_i": 3,
>         "id": "book1",
>         "id_book_s": "book1",
>         "review_dt": "2014-03-15T12:00:00Z"
>       },
>       {
>         "title_s": "The Way of Kings",
>         "pubyear_i": 2010,
>         "stars_i": 4,
>         "id": "book1",
>         "id_book_s": "book1",
>         "review_dt": "2014-12-15T12:00:00Z"
>       },
>       {
>         "title_s": "Friends",
>         "pubyear_i": 1994,
>         "stars_i": 5,
>         "id": "book2",
>         "id_book_s": "book2",
>         "review_dt": "1995-01-03T14:30:00Z"
>       },
>       {
>         "title_s": "Friends",
>         "pubyear_i": 1994,
>         "stars_i": 3,
>         "id": "book2",
>         "id_book_s": "book2",
>         "review_dt": "1994-03-15T12:00:00Z"
>       },
>       {
>         "title_s": "Friends",
>         "pubyear_i": 1994,
>         "stars_i": 4,
>         "id": "book2",
>         "id_book_s": "book2",
>         "review_dt": "1994-12-15T12:00:00Z"
>       },
>       {
>         "EOF": true,
>         "RESPONSE_TIME": 35
>       }
>     ]
>   }
> }
> {noformat}
> However, I don't understand why, because in debug mode I see that the isValidTupleOrder method should return true in both cases.


