Posted to solr-user@lucene.apache.org by David Santamauro <da...@gmail.com> on 2014/01/02 13:07:33 UTC

combining cores into a collection

Hi,

I have a few cores on the same machine that share the schema.xml and 
solrconfig.xml from an earlier setup. Basically from the older 
distribution method of using
   shards=localhost:1234/core1,localhost:1234/core2[,etc]
for searching.

They are unique sets of documents, i.e., no overlap of uniqueId between 
cores and they were indexed with SOLR 4.1.

Is there a way to combine those cores into a collection, maybe through 
the collections API? They are loaded with a lot of data so avoiding a 
reload is of the utmost importance.

thanks,

David

Re: combining cores into a collection

Posted by Chris Hostetter <ho...@fucit.org>.
: I managed to assign the individual cores to a collection using the collection
: API to create the collection and then the solr.xml to define the core(s) and
: its collection. This *seemed* to work. I even test-indexed a set of documents,
: checking totals before and after as well as content. Again, this *seemed* to
: work.

You've basically created a collection alias but done it in a way that solr 
doesn't realize it's a collection alias, and thinks it's a real collection 
that uses normal routing...

: Did I get lucky that all 5k documents were coincidentally found in the
: appropriate core(s)? Have I possibly corrupted one or more cores? They are a
: working copy so nothing would be lost.

you've gotten lucky in the sense that you haven't attempted any update 
operations yet -- basic queries don't care about the doc routing, so no 
code paths have been run that will freak out yet.  As soon as you start 
trying to update the index (adding new docs, replacing docs, etc...) you 
are going to start getting inconsistencies ... docs with the same 
uniqueKey on different shards, being unable to delete docs by uniqueKey 
(because the delete request only gets forwarded to one shard, and it's not 
the one where you have the doc), etc...
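
for illustration (a rough sketch -- the collection name and the id here are 
made up), a delete-by-id like this:

   curl 'http://localhost:8983/solr/collection1/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '{"delete":{"id":"doc42"}}'

gets routed to whichever shard the hash of "doc42" maps to, which in your 
setup may not be the shard that actually holds the document, so the delete 
can silently do nothing.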

: Yes this works but isn't this really just a convenient way to avoid the shard
: parameter on /select?

basically - that's pretty much the whole point of collection aliases.  
it's a server-side control to save the client from needing to specify a 
list of collections at query time.  you can centrally decide what an 
alias means, and when to redefine it w/o your clients knowing anything 
changed.
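
for example (a sketch -- the per-core collection names are just placeholders), 
the difference for the client is roughly:

   # without an alias, every request has to spell out the collections:
   /solr/core1_collection/select?q=*:*&collection=core1_collection,core2_collection,core3_collection

   # with an alias, the client only ever knows about "collection1":
   /solr/collection1/select?q=*:*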



-Hoss
http://www.lucidworks.com/

Re: combining cores into a collection

Posted by David Santamauro <da...@gmail.com>.
On 01/02/2014 12:44 PM, Chris Hostetter wrote:
>
> : Not really ... uptime is irrelevant because they aren't in production. I just
> : don't want to spend the time reloading 1TB of documents.
>
> terminology confusion: you mean you don't want to *reindex* all of the
> documents ... in solr "reloading" a core means something specific &
> different from what you are talking about, and is what michael.boom was
> referring to.

quite correct, sorry. reindex the core(s), not reload the core(s).

> : I want to bring them all into a cloud collection. Assume I have 3 cores/shards
> :
> :   core1
> :   core2
> :   core3
>
> You can't convert arbitrary cores into shards of a new collection, because
> the document routing logic (which dictates what shard a doc lives in based
> on its uniqueKey) won't make sense.

I guess this is the heart of the issue.

I managed to assign the individual cores to a collection using the 
collection API to create the collection and then the solr.xml to define 
the core(s) and its collection. This *seemed* to work. I even test-indexed 
a set of documents, checking totals before and after as well as content. 
Again, this *seemed* to work.
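
For reference, the solr.xml entries were roughly of this form (legacy 4.x 
style; names paraphrased, not the exact file):

   <solr persistent="true">
     <cores adminPath="/admin/cores">
       <core name="core1" instanceDir="core1" collection="collection1" shard="shard1"/>
       <core name="core2" instanceDir="core2" collection="collection1" shard="shard2"/>
       <core name="core3" instanceDir="core3" collection="collection1" shard="shard3"/>
     </cores>
   </solr>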

Did I get lucky that all 5k documents were coincidentally found in the 
appropriate core(s)? Have I possibly corrupted one or more cores? They 
are a working copy so nothing would be lost.

> : I want to be able to address all three as if they were shards of a collection,
> : something like.
>
> w/o reindexing, one thing you could do is create a single collection for
> each of your cores, and then create a collection alias over all three of
> these collections...
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection

Yes this works but isn't this really just a convenient way to avoid the 
shard parameter on /select?

> if you want to just be able to shove docs to a single "collection" in solr
> cloud and have them replace docs with the same uniqueKey then you're

yes, this is what I was hoping I could do.

> going to need to either re-index using SolrCloud so the default document
> routing is done properly up front, or implement a custom doc router that
> knows about whatever rules you used to decide what would be in core1,
> core2, core3.

I was afraid of that, but see question above about what I've done and 
index consistency.

Thanks for the insight.

David


Re: combining cores into a collection

Posted by Chris Hostetter <ho...@fucit.org>.
: Not really ... uptime is irrelevant because they aren't in production. I just
: don't want to spend the time reloading 1TB of documents.

terminology confusion: you mean you don't want to *reindex* all of the 
documents ... in solr "reloading" a core means something specific & 
different from what you are talking about, and is what michael.boom was 
referring to.


: I want to bring them all into a cloud collection. Assume I have 3 cores/shards
: 
:   core1
:   core2
:   core3

You can't convert arbitrary cores into shards of a new collection, because 
the document routing logic (which dictates what shard a doc lives in based 
on its uniqueKey) won't make sense.

you could theoretically implement a custom router class that knows about 
whatever rules you've used in the past to decide docs go in core1, core2, 
core3, etc... but that would probably be fairly complicated.

: I want to be able to address all three as if they were shards of a collection,
: something like.

w/o reindexing, one thing you could do is create a single collection for 
each of your cores, and then create a collection alias over all three of 
these collections...

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection
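
concretely, something like this (a sketch -- it assumes each core has already 
been wrapped in its own single-shard collection, core1_collection etc.):

   /admin/collections?action=CREATEALIAS&name=collection1&collections=core1_collection,core2_collection,core3_collection

issuing CREATEALIAS again with the same name just redefines the alias, so you 
can repoint it later without the clients changing anything.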

: I want to be able to load to collection1. search collection1 etc.

can you elaborate on what you mean by "load to collection1" ... being 
able to "search collection1" (where collection1 is an alias for 
core1_collection, core2_collection, core3_collection) would be easy ... 
but understanding what your goal is moving forward with "loading" is 
important.

if you want to just be able to shove docs to a single "collection" in solr 
cloud and have them replace docs with the same uniqueKey then you're 
going to need to either re-index using SolrCloud so the default document 
routing is done properly up front, or implement a custom doc router that 
knows about whatever rules you used to decide what would be in core1, 
core2, core3.
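
the reindex route is the simpler of the two -- roughly (a sketch; the config 
name and data file are placeholders):

   # create the target collection so the default compositeId routing applies
   /admin/collections?action=CREATE&name=collection1&numShards=3&replicationFactor=1&collection.configName=myconf

   # then re-send the documents; SolrCloud hashes each uniqueKey to a shard
   curl 'http://localhost:8983/solr/collection1/update?commit=true' \
     -H 'Content-Type: application/json' \
     --data-binary @docs.json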


-Hoss
http://www.lucidworks.com/

Re: combining cores into a collection

Posted by David Santamauro <da...@gmail.com>.
On 01/02/2014 08:29 AM, michael.boom wrote:
> Hi David,
>
> "They are loaded with a lot of data so avoiding a reload is of the utmost
> importance."
> Well, reloading a core won't cause any data loss. Is 100% availability
> during the process what you need?

Not really ... uptime is irrelevant because they aren't in production. I 
just don't want to spend the time reloading 1TB of documents.

Basically, I have a bunch of (previously known as ... ) shards on one 
machine (I'd like them to stay on one machine) that aren't associated 
with a SolrCloud. I query them using

   shards=localhost:1234/core1,localhost:1234/core2[,etc...]

My current loading logic doesn't matter, but rest assured, there are no 
duplicate uniqueIds across the shards.

I want to bring them all into a cloud collection. Assume I have 3 
cores/shards

   core1
   core2
   core3

as above, I currently query them as:

   /core1?q=*:*&shards=localhost:1234/core2,localhost:1234/core3

I want to be able to address all three as if they were shards of a 
collection, something like:

collection1
  => shard1 (was core1)
  => shard2 (was core2)
  => shard3 (was core3)

I want to be able to load to collection1, search collection1, etc.

I've tried

/collections?action=CREATE&name=collection1&shards=core1,core2,core3

.. but it doesn't actually recognize the existing cores.

thanks



Re: combining cores into a collection

Posted by "michael.boom" <my...@yahoo.com>.
Hi David,

"They are loaded with a lot of data so avoiding a reload is of the utmost
importance."
Well, reloading a core won't cause any data loss. Is 100% availability
during the process what you need?



-----
Thanks,
Michael