Posted to solr-user@lucene.apache.org by Steve Davids <sd...@gmail.com> on 2014/11/06 02:08:21 UTC

Solr Cloud Cross-Core Joins

I have a use case where I would like to capture click events for individual
users so I can answer questions like "show me everything with x text that
I have clicked before" and the inverse, "show me everything with x text
that I have *not* clicked". I am currently doing this by storing the click
event in the main index, alongside the rest of the document.

We have recently made some modifications to make a smaller "sub collection"
of the main document index but still would like to ask the same questions,
so I thought a cross-core join with a "click" metadata collection could be
a decent trick so that we can consolidate this quickly changing data in a
separate collection without needing to worry about merging this information
into multiple document collections.
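To make the two questions concrete, here is a minimal sketch of the request
parameters such a join might use. The field names (doc_id, id, user) and the
helper names are assumptions for illustration only; they are not taken from
an actual schema. The inverse case wraps the join in Solr's nested-query
syntax and negates it.

```python
# Hypothetical helpers: build Solr request parameters for "clicked" and
# "not clicked" questions. Field names doc_id, id and user are assumed.

def click_join(user, from_field="doc_id", to_field="id", from_index="clicks"):
    """Join query selecting main-index documents the given user has clicked.

    Assumes `user` is a simple token that needs no query escaping.
    """
    return "{!join from=%s to=%s fromIndex=%s}user:%s" % (
        from_field, to_field, from_index, user)


def clicked_params(text, user):
    """Everything matching `text` that the user has clicked before."""
    return {"q": text, "fq": click_join(user)}


def not_clicked_params(text, user):
    """Everything matching `text` that the user has *not* clicked."""
    # Escape backslashes and quotes so the join can sit inside _query_:"..."
    nested = click_join(user).replace("\\", "\\\\").replace('"', '\\"')
    return {"q": text, "fq": '*:* -_query_:"%s"' % nested}


if __name__ == "__main__":
    print(clicked_params("x", "foo"))
    print(not_clicked_params("x", "foo"))
```

Note that fromIndex has to resolve to a real core name on the node handling
the request, which is exactly where the core naming problem comes in.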

I am trying to write some unit tests that simply stand up multiple
collections (doc + click) in SolrCloud via the collections CREATE API.
However, CREATE assigns unique core names, so a query of "{!join ...
fromIndex=clicks} user=foo" doesn't work: no core is actually named
"clicks"; they are named "clicks_shard1_replica1",
"clicks_shard2_replica1", etc. The CREATE request doesn't allow you to
specify core names, so I attempted to rename the cores via the cores
RENAME API, but various rename operations failed with "Leader Election -
Fatal Error, SolrCore not found: clicks_shard1_replica1 in [clicks]".
Since that didn't work, I tried the collections DELETEREPLICA followed by
ADDREPLICA. Unfortunately, when I set properties for the core
("property.name=value") on the request, they don't appear to actually get
set, since the core refuses to load due to a solrconfig.xml property
substitution failure.
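For reference, a rough sketch of the CREATE request and the core names I end
up with. The base URL and the helper names are assumptions; the naming
pattern is simply the one observed above, not something I have confirmed as
guaranteed behavior.

```python
# Hypothetical helpers: build a Collections API CREATE URL and list the
# core names that follow the <collection>_shardN_replicaM pattern seen above.
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=1):
    """URL for a Collections API CREATE call (base_url is an assumption)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


def generated_core_names(name, num_shards, replication_factor=1):
    """Core names matching the observed pattern, one per shard replica."""
    return [
        "%s_shard%d_replica%d" % (name, shard, replica)
        for shard in range(1, num_shards + 1)
        for replica in range(1, replication_factor + 1)
    ]


if __name__ == "__main__":
    print(create_collection_url("http://localhost:8983/solr", "clicks", 2))
    print(generated_core_names("clicks", 2))
```

None of the generated names is plain "clicks", which is why
fromIndex=clicks finds nothing to join against.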

Long story short, I have been banging my head against the wall trying to
get a consistent core name via API calls and keep running into gotchas. I
would like to take a step back and ask whether this approach (a cross-core
metadata join) is even reasonable in the SolrCloud architecture. If it is,
does anyone have ideas on how a common core name can be achieved via API
calls? If it isn't an advised approach, are there suggestions for an
optimal indexing strategy for this particular scenario?

Thanks for the help,

-Steve

Re: Solr Cloud Cross-Core Joins

Posted by Walter Underwood <wu...@wunderwood.org>.
I am curious why you are trying to do this with Solr. This is straightforward with other systems. I would use HBase for this. This could be really hard with Solr.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/
