Posted to dev@lucene.apache.org by "Dennis Gove (JIRA)" <ji...@apache.org> on 2015/09/16 17:24:46 UTC

[jira] [Comment Edited] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

    [ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790546#comment-14790546 ] 

Dennis Gove edited comment on SOLR-7584 at 9/16/15 3:23 PM:
------------------------------------------------------------

This supports joining any incoming set of streams. If you have a FacetStream instance (SOLR-7903) then you could absolutely join it with some other stream instance. 

Because the current implementation uses a merge-join, the incoming streams must be sorted in the same order on the fields being joined. That said, a hash-join could be added relatively easily, in which case the ordering requirement would go away. I think a hash-join would make a lot of sense for a FacetStream (or really any kind of aggregation stream).
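
To make the ordering point concrete, here is a minimal sketch of the two join styles over plain in-memory lists. It is not code from the SOLR-7584 patch: JoinSketch, Tuple, mergeJoin and hashJoin are hypothetical names chosen for illustration, and the lists stand in for streams. The merge join only works if both inputs arrive sorted on the key, while the hash join accepts the build side in any order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: in-memory lists stand in for streams, and none of
// these names are taken from the Solr Streaming API.
public class JoinSketch {

    // A minimal "tuple": an int join key plus a payload string.
    static final class Tuple {
        final int key;
        final String value;
        Tuple(int key, String value) { this.key = key; this.value = value; }
    }

    // Merge join: both inputs MUST already be sorted ascending by key.
    static List<String> mergeJoin(List<Tuple> left, List<Tuple> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = Integer.compare(left.get(i).key, right.get(j).key);
            if (cmp < 0) {
                i++;                        // left key is behind, advance left
            } else if (cmp > 0) {
                j++;                        // right key is behind, advance right
            } else {
                // Pair every left tuple in this key run with every right tuple in it.
                int key = left.get(i).key;
                int runStart = j;
                while (i < left.size() && left.get(i).key == key) {
                    for (j = runStart; j < right.size() && right.get(j).key == key; j++) {
                        out.add(left.get(i).value + "+" + right.get(j).value);
                    }
                    i++;
                }
                // j now points just past the run of equal keys on the right.
            }
        }
        return out;
    }

    // Hash join: the build side is loaded into a map, so its order is irrelevant.
    static List<String> hashJoin(List<Tuple> probe, List<Tuple> build) {
        Map<Integer, List<Tuple>> table = new HashMap<>();
        for (Tuple t : build) {
            table.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        }
        List<String> out = new ArrayList<>();
        for (Tuple p : probe) {
            List<Tuple> matches = table.getOrDefault(p.key, Collections.<Tuple>emptyList());
            for (Tuple b : matches) {
                out.add(p.value + "+" + b.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tuple> left = Arrays.asList(
            new Tuple(1, "a"), new Tuple(2, "b"), new Tuple(4, "c"));
        List<Tuple> sortedRight = Arrays.asList(
            new Tuple(2, "x"), new Tuple(2, "y"), new Tuple(3, "z"), new Tuple(4, "w"));
        List<Tuple> unsortedRight = Arrays.asList(
            new Tuple(4, "w"), new Tuple(2, "x"), new Tuple(3, "z"), new Tuple(2, "y"));

        // Both calls print [b+x, b+y, c+w]
        System.out.println(mergeJoin(left, sortedRight));
        System.out.println(hashJoin(left, unsortedRight));
    }
}
```

The trade-off in this sketch is memory: the hash join holds the whole build side in a map, whereas the merge join only ever looks at the current run of equal keys.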

The result of the join is just another stream so you can then feed that into any other stream for further processing (including aggregation for functions like sum and avg). 
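
As a rough illustration of that composition, the sketch below treats the join output as nothing more than an iterator of tuples and lets a downstream step compute a sum and an average over one field. DownstreamSketch, sumAndAvg and tuple are hypothetical helpers, not Streaming API classes; the field names are borrowed from the examples in the issue description.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative only: an Iterator of maps stands in for a tuple stream.
public class DownstreamSketch {

    // Consumes any tuple iterator and returns {sum, avg} of one numeric field.
    // Tuples where the field is missing or non-numeric are skipped.
    static double[] sumAndAvg(Iterator<Map<String, Object>> tuples, String field) {
        double sum = 0;
        long count = 0;
        while (tuples.hasNext()) {
            Object v = tuples.next().get(field);
            if (v instanceof Number) {
                sum += ((Number) v).doubleValue();
                count++;
            }
        }
        return new double[] { sum, count == 0 ? 0.0 : sum / count };
    }

    // Builds a small tuple as it might look after a join on fieldA.
    static Map<String, Object> tuple(String fieldA, Number fieldD) {
        Map<String, Object> t = new HashMap<>();
        t.put("fieldA", fieldA);
        t.put("fieldD", fieldD);
        return t;
    }

    public static void main(String[] args) {
        // Stand-in for the output of a join; the aggregator does not care
        // where the tuples came from.
        List<Map<String, Object>> joined = Arrays.asList(
            tuple("k1", 10), tuple("k2", 20), tuple("k3", 30));
        double[] result = sumAndAvg(joined.iterator(), "fieldD");
        System.out.println("sum=" + result[0] + " avg=" + result[1]); // sum=60.0 avg=20.0
    }
}
```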


was (Author: dpgove):
This supports joining any incoming set of streams. If you have a FacetStream instance (SOLR-7903) then you could absolutely join it with some other stream instance. 

Due to current use of merge-join style it is a requirement that the incoming streams be sorted in a similar order. That said, a hash-join style can relatively easily be added in which case the ordering requirement will go away. I think a hash-join would make a lot of sense for a FacetStream (or really any kind of aggregation stream).

Using the feature added in SOLR-7669 (Add SelectStream to Streaming API) you will be able to apply functions (called operations in that ticket) on the joined data. Currently the only included operation

> Add Joins to the Streaming API and Streaming Expressions
> --------------------------------------------------------
>
>                 Key: SOLR-7584
>                 URL: https://issues.apache.org/jira/browse/SOLR-7584
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>            Reporter: Dennis Gove
>            Priority: Minor
>              Labels: Streaming
>         Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this:
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or with multi-field on clauses
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the default merge join, but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one).
> Also, I've been thinking about field aliasing and might want to add a SelectStream that lets us limit the fields coming out and rename them.
> Depends on SOLR-7554



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
