Posted to solr-user@lucene.apache.org by Mugeesh Husain <mu...@gmail.com> on 2015/12/07 13:42:02 UTC

Joins with SolrCloud

I have created 3 cores on the same machine using SolrCloud.
Cores: Restaurant, User, Review
Each core has only 1 shard and 2 replicas.

Questions
1.) Is it possible to join across the 3 cores on the same machine (or
on different machines)?
2.) I am struggling with how to join across the 3 cores in SolrCloud mode.

The client is not interested in denormalizing the data.

Please suggest how to solve this problem.

Thanks
Mugeesh



--
View this message in context: http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Joins with SolrCloud

Posted by Joel Bernstein <jo...@gmail.com>.

You can also do the innerJoin in parallel across worker nodes using the
parallel function:

hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown",
             sort="userId asc", zkHost="zk2:2345", qt="/export",
             partitionKeys="userId"),
      search(reviews, q="*:*", fl="userId, restaurantId, review, score",
             sort="userId asc", zkHost="zk1:2345", qt="/export",
             partitionKeys="userId"),
      on="userId"
    ),
    workers="20",
    zkHost="zk1:2345",
    sort="userId asc"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
                sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

The parallel function will return the tuples from the innerJoin, which is
performed on 20 workers in this example. The worker nodes will be selected
from "workerCollection", which can be any SolrCloud collection with enough
nodes. The "partitionKeys" parameter has been added to the searches so that
results with the same userId are shuffled to the same worker node.
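
To illustrate the idea behind partitionKeys, here is a rough Python sketch of
hash partitioning. It is not Solr's implementation: the CRC32 hash, the
function names worker_for and partition, and the sample tuples are all
assumptions made for illustration. The only point it tries to show is that
hashing the join key sends matching users and reviews to the same worker, so
each worker can join its own slice independently.

```python
import zlib


def worker_for(key, num_workers):
    """Pick a worker index from a stable hash of the partition key.

    CRC32 is an assumption for this sketch; Solr uses its own hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_workers


def partition(tuples, key_field, num_workers):
    """Group tuples by the worker that would receive them."""
    buckets = {w: [] for w in range(num_workers)}
    for t in tuples:
        buckets[worker_for(t[key_field], num_workers)].append(t)
    return buckets


# Hypothetical sample data, not taken from the thread.
users = [
    {"userId": "u1", "hometown": "Boston"},
    {"userId": "u2", "hometown": "Austin"},
]
reviews = [
    {"userId": "u1", "score": 4},
    {"userId": "u2", "score": 5},
    {"userId": "u1", "score": 2},
]

NUM_WORKERS = 20
user_buckets = partition(users, "userId", NUM_WORKERS)
review_buckets = partition(reviews, "userId", NUM_WORKERS)
```

Because both streams are partitioned on the same field with the same hash, a
review for a given userId always lands in the bucket that holds that user.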



Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove <dp...@gmail.com> wrote:

> Mugeesh,
>
> You can use Streaming Aggregation to provide various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
>
> To follow with your example, let's assume the following setup:
> Restaurants: avail on machine1:8983 with 3 shards, zk at zk1:2345
> Users: avail on machine2:8983 with 2 shards, zk at zk2:2345
> Reviews: avail on machine1:8983 with 10 shards, zk at zk1:2345
>
> You could send a streaming query to Solr that would return all reviews for
> restaurants in NYC and include the user's hometown:
>
> hashJoin(
>   innerJoin(
>     search(users, q="*:*", fl="userId, full_name, hometown",
>            sort="userId asc", zkHost="zk2:2345", qt="/export"),
>     search(reviews, q="*:*", fl="userId, restaurantId, review, score",
>            sort="userId asc", zkHost="zk1:2345", qt="/export"),
>     on="userId"
>   ),
>   hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
>                 sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
>   on="restaurantId"
> )
>
> Note that the number of shards doesn't matter and doesn't need to be
> considered as part of your query. If you were to send this to a URL to get
> the results, it would look like this:
>
> http://machine1:8983/solr/users/stream?stream=[the expression above]
>
> Additional information about Streaming API, Streaming Aggregation, and
> Streaming Expressions can be found at
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
> though this is currently incomplete as a lot of the new features have yet
> to be added to the documentation.
>
> For those interested, joins were added under tickets
> https://issues.apache.org/jira/browse/SOLR-7584 and
> https://issues.apache.org/jira/browse/SOLR-8188.
>
> - Dennis

Re: Joins with SolrCloud

Posted by Dennis Gove <dp...@gmail.com>.

Something I forgot to mention: the collection shards can live on any
number of machines, anywhere in the world. As long as the cluster state in
ZooKeeper knows where each shard can be found (i.e., the basis of SolrCloud),
everything will work. The example I gave had the shards living on the same
machine, but that is not a requirement.


Re: Joins with SolrCloud

Posted by Dennis Gove <dp...@gmail.com>.

Mugeesh,

You can use Streaming Aggregation to provide various types of
cross-collection joins. This is currently available in trunk and will be a
part of Solr 6.

To follow with your example, let's assume the following setup:
Restaurants: avail on machine1:8983 with 3 shards, zk at zk1:2345
Users: avail on machine2:8983 with 2 shards, zk at zk2:2345
Reviews: avail on machine1:8983 with 10 shards, zk at zk1:2345

You could send a streaming query to Solr that would return all reviews for
restaurants in NYC and include the user's hometown:

hashJoin(
  innerJoin(
    search(users, q="*:*", fl="userId, full_name, hometown",
           sort="userId asc", zkHost="zk2:2345", qt="/export"),
    search(reviews, q="*:*", fl="userId, restaurantId, review, score",
           sort="userId asc", zkHost="zk1:2345", qt="/export"),
    on="userId"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
                sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)
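
For readers who want to see what this expression computes, here is a small
in-memory Python sketch of the two join styles. It is only an illustration
under stated assumptions, not how Solr implements them: inner_join is a merge
join over inputs already sorted on the join key (which is why both searches
sort by userId), and hash_join loads the "hashed" side into a dictionary and
streams the other side past it. The sample tuples are made up, and the sketch
assumes each review carries a restaurantId, since the outer join on
restaurantId needs that field.

```python
def inner_join(left, right, on):
    """Merge join over two lists already sorted ascending by `on`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][on], right[j][on]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side tuples sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][on] == lk:
                j_end += 1
            # Pair every left tuple with this key against that run.
            while i < len(left) and left[i][on] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def hash_join(stream, hashed, on):
    """Hash join: build a lookup table from `hashed`, then probe it."""
    table = {}
    for h in hashed:
        table.setdefault(h[on], []).append(h)
    out = []
    for t in stream:
        for h in table.get(t[on], []):
            out.append({**t, **h})
    return out


# Hypothetical data; both lists are sorted by userId, as the searches are.
users = [
    {"userId": "u1", "full_name": "Ann", "hometown": "Boston"},
    {"userId": "u2", "full_name": "Bob", "hometown": "Austin"},
]
reviews = [
    {"userId": "u1", "restaurantId": "r1", "review": "good", "score": 4},
    {"userId": "u1", "restaurantId": "r2", "review": "meh", "score": 2},
    {"userId": "u2", "restaurantId": "r1", "review": "great", "score": 5},
    {"userId": "u3", "restaurantId": "r2", "review": "ok", "score": 3},
]
# Stands in for the restaurants search already filtered by city:nyc.
nyc_restaurants = [{"restaurantId": "r1", "restaurantName": "Joe's Pizza"}]

result = hash_join(inner_join(users, reviews, "userId"),
                   nyc_restaurants, "restaurantId")
```

The hashed side is held entirely in memory in this sketch, so it is the
natural place for the smaller, filtered set (here, restaurants in NYC).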

Note that the number of shards doesn't matter and doesn't need to be
considered as part of your query. If you were to send this to a URL to get
the results, it would look like this:

http://machine1:8983/solr/users/stream?stream=[the expression above]
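
Since the expression contains spaces, quotes, and parentheses, it has to be
URL-encoded before it goes into the query string. A minimal Python sketch of
building such a URL follows. The helper name build_stream_url is made up for
this example, the host and collection are the ones from the setup above, and
the parameter name "stream" is taken from this message; to my knowledge
released Solr versions document it as "expr", so check it against your
version. The expression is shortened to a single search to keep the example
small.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_stream_url(base, expression, param="stream"):
    """Return `base` with the streaming expression URL-encoded as `param`."""
    return base + "?" + urlencode({param: expression})


# Shortened expression for illustration; the full hashJoin works the same way.
expression = (
    'search(users, q="*:*", fl="userId, full_name, hometown", '
    'sort="userId asc", zkHost="zk2:2345", qt="/export")'
)

base = "http://machine1:8983/solr/users/stream"
url = build_stream_url(base, expression)

# Decoding the query string should give back the original expression.
decoded = parse_qs(urlparse(url).query)["stream"][0]
```

The resulting URL can then be fetched with any HTTP client.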

Additional information about Streaming API, Streaming Aggregation, and
Streaming Expressions can be found at
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
though this is currently incomplete as a lot of the new features have yet
to be added to the documentation.

For those interested, joins were added under tickets
https://issues.apache.org/jira/browse/SOLR-7584 and
https://issues.apache.org/jira/browse/SOLR-8188.

- Dennis
