Posted to solr-user@lucene.apache.org by David Santamauro <da...@gmail.com> on 2014/01/02 13:07:33 UTC
combining cores into a collection
Hi,
I have a few cores on the same machine that share the schema.xml and
solrconfig.xml from an earlier setup. Basically from the older
distribution method of using
shards=localhost:1234/core1,localhost:1234/core2[,etc]
for searching.
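The query shape described above can be sketched as follows (host, port, and core names are the illustrative ones from the post, not a real deployment):

```shell
# Legacy (pre-SolrCloud) distributed search: one core receives the request
# and fans it out to every core listed in the shards parameter.
HOST="localhost:1234"
SHARDS="${HOST}/core1,${HOST}/core2,${HOST}/core3"
echo "http://${HOST}/core1/select?q=*:*&shards=${SHARDS}"
```

In practice the printed URL would be fetched with curl; the point is that the client must know and enumerate every shard on each request.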
They are unique sets of documents, i.e., no overlap of uniqueId between
cores and they were indexed with SOLR 4.1.
Is there a way to combine those cores into a collection, maybe through
the collections API? They are loaded with a lot of data so avoiding a
reload is of the utmost importance.
thanks,
David
Re: combining cores into a collection
Posted by Chris Hostetter <ho...@fucit.org>.
: I managed to assign the individual cores to a collection using the collection
: API to create the collection and then the solr.xml to define the core(s) and
: its collection. This *seemed* to work. I even test indexed a set of documents
: checking totals before and after as well as content. Again, this *seemed* to
: work.
You've basically created a collection alias but done it in a way that solr
doesn't realize it's a collection alias, and thinks it's a real collection
that uses normal routing...
: Did I get lucky that all 5k documents were coincidentally found in the
: appropriate core(s)? Have I possibly corrupted one or more cores? They are a
: working copy so nothing would be lost.
you've gotten lucky in the sense that you haven't attempted any update
operations yet -- basic queries don't care about the doc routing, so no
code paths have been run that will freak out yet. As soon as you start
trying to update the index (adding new docs, replacing docs, etc...) you
are going to start getting inconsistencies ... docs with the same
uniqueKey on different shards, being unable to delete docs by uniqueKey
(because the delete request only gets forwarded to one shard, and it's not
the one where you have the doc), etc...
: Yes this works but isn't this really just a convenient way to avoid the shard
: parameter on /select?
basically - that's pretty much the whole point of collection aliases.
it's a server side control to save the client from needing to specify a
list of collections at query time. you can centrally decide what an
alias means, and when to redefine it w/o your clients knowing anything
changed.
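That redefinition can be sketched like so (host, port, and collection names are assumptions for illustration; the URLs are just assembled and printed here, where normally they would be fetched with curl):

```shell
# Illustrative only: endpoint and collection names are assumed.
ADMIN="http://localhost:8983/solr/admin/collections"
# Define the alias over the wrapper collections...
echo "${ADMIN}?action=CREATEALIAS&name=collection1&collections=core1_collection,core2_collection,core3_collection"
# ...later, re-issue CREATEALIAS with a new list; clients querying
# "collection1" never need to know the membership changed.
echo "${ADMIN}?action=CREATEALIAS&name=collection1&collections=core1_collection,core2_collection,core3_collection,core4_collection"
```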
-Hoss
http://www.lucidworks.com/
Re: combining cores into a collection
Posted by David Santamauro <da...@gmail.com>.
On 01/02/2014 12:44 PM, Chris Hostetter wrote:
>
> : Not really ... uptime is irrelevant because they aren't in production. I just
> : don't want to spend the time reloading 1TB of documents.
>
> terminology confusion: you mean you don't want to *reindex* all of the
> documents ... in solr "reloading" a core means something specific &
> different from what you are talking about, and is what michael.boom was
> referring to.
quite correct, sorry. reindex the core(s), not reload the core(s).
> : I want to bring them all into a cloud collection. Assume I have 3 cores/shards
> :
> : core1
> : core2
> : core3
>
> You can't convert arbitrary cores into shards of a new collection, because
> the document routing logic (which dictates what shard a doc lives in based
> on its uniqueKey) won't make sense.
I guess this is the heart of the issue.
I managed to assign the individual cores to a collection using the
collection API to create the collection and then the solr.xml to define
the core(s) and its collection. This *seemed* to work. I even test
indexed a set of documents checking totals before and after as well as
content. Again, this *seemed* to work.
Did I get lucky that all 5k documents were coincidentally found in the
appropriate core(s)? Have I possibly corrupted one or more cores? They
are a working copy so nothing would be lost.
> : I want to be able to address all three as if they were shards of a collection,
> : something like.
>
> w/o reindexing, one thing you could do is create a single collection for
> each of your cores, and then create a collection alias over all three of
> these collections...
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection
Yes this works but isn't this really just a convenient way to avoid the
shard parameter on /select?
> if you want to just be able to shove docs to a single "collection" in solr
> cloud and have them replace docs with the same uniqueKey then you're
yes, this is what I was hoping I could do.
> going to need to either re-index using SolrCloud so the default document
> routing is done properly up front, or implement a custom doc router that
> knows about whatever rules you used to decide what would be in core1,
> core2, core3.
I was afraid of that, but see question above about what I've done and
index consistency.
Thanks for the insight.
David
Re: combining cores into a collection
Posted by Chris Hostetter <ho...@fucit.org>.
: Not really ... uptime is irrelevant because they aren't in production. I just
: don't want to spend the time reloading 1TB of documents.
terminology confusion: you mean you don't want to *reindex* all of the
documents ... in solr "reloading" a core means something specific &
different from what you are talking about, and is what michael.boom was
referring to.
: I want to bring them all into a cloud collection. Assume I have 3 cores/shards
:
: core1
: core2
: core3
You can't convert arbitrary cores into shards of a new collection, because
the document routing logic (which dictates what shard a doc lives in based
on its uniqueKey) won't make sense.
you could theoretically implement a custom router class that knows about
whatever rules you've used in the past to decide docs go in core1, core2,
core3, etc... but that would probably be fairly complicated.
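To see why the routing won't line up, here is a toy sketch of hash-based document routing. This is NOT Solr's actual hash function (Solr's compositeId router hashes the uniqueKey into per-shard hash ranges); the point is only that each id deterministically maps to exactly one shard, so documents placed in cores by any other rule will mostly sit on the "wrong" shard from the router's point of view:

```shell
# Toy illustration only -- not Solr's real router. We fake hash routing
# with cksum modulo the shard count.
route() {
  local id="$1" nshards="$2"
  local h
  h=$(printf '%s' "$id" | cksum | cut -d' ' -f1)
  echo "shard$(( h % nshards + 1 ))"
}
route doc42 3   # prints the one shard this id is allowed to live on
```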
: I want to be able to address all three as if they were shards of a collection,
: something like.
w/o reindexing, one thing you could do is create a single collection for
each of your cores, and then create a collection alias over all three of
these collections...
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection
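A minimal sketch of that suggestion, assuming an illustrative host, port, and naming scheme (the URLs are only printed here; normally they would be fetched with curl, and attaching each existing core's index data to its wrapper collection is a separate step not shown):

```shell
# Illustrative only: wrap each legacy core in its own single-shard
# collection, then alias them together under one name.
SOLR="http://localhost:8983/solr/admin/collections"
for c in core1 core2 core3; do
  echo "${SOLR}?action=CREATE&name=${c}_collection&numShards=1"
done
echo "${SOLR}?action=CREATEALIAS&name=collection1&collections=core1_collection,core2_collection,core3_collection"
```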
: I want to be able to load to collection1. search collection1 etc.
can you elaborate on what you mean by "load to collection1" ... being
able to "search collection1" (where collection1 is an alias for
core1_collection, core2_collection, core3_collection) would be easy ...
but understanding what your goal is moving forward with "loading" is
important.
if you want to just be able to shove docs to a single "collection" in solr
cloud and have them replace docs with the same uniqueKey then you're
going to need to either re-index using SolrCloud so the default document
routing is done properly up front, or implement a custom doc router that
knows about whatever rules you used to decide what would be in core1,
core2, core3.
-Hoss
http://www.lucidworks.com/
Re: combining cores into a collection
Posted by David Santamauro <da...@gmail.com>.
On 01/02/2014 08:29 AM, michael.boom wrote:
> Hi David,
>
> "They are loaded with a lot of data so avoiding a reload is of the utmost
> importance."
> Well, reloading a core won't cause any data loss. Is 100% availability
> during the process what you need?
Not really ... uptime is irrelevant because they aren't in production. I
just don't want to spend the time reloading 1TB of documents.
Basically, I have a bunch of (previously known as ... ) shards on one
machine (I'd like them to stay on one machine) that aren't associated
with a SolrCloud. I query them using
shards=localhost:1234/core1,localhost:1234/core2[,etc...]
My current loading logic doesn't matter but rest assured, there are no
duplicate uniqueIds across each shard.
I want to bring them all into a cloud collection. Assume I have 3
cores/shards
core1
core2
core3
as above, I currently query them as:
/core1?q=*:*&shards=localhost:1234/core2,localhost:1234/core3
I want to be able to address all three as if they were shards of a
collection, something like:
collection1
=> shard1 (was core1)
=> shard2 (was core2)
=> shard3 (was core3)
I want to be able to load to collection1, search collection1, etc.
I've tried
/collections?action=CREATE&name=collection1&shards=core1,core2,core3
... but it doesn't actually recognize the existing cores.
thanks
Re: combining cores into a collection
Posted by "michael.boom" <my...@yahoo.com>.
Hi David,
"They are loaded with a lot of data so avoiding a reload is of the utmost
importance."
Well, reloading a core won't cause any data loss. Is 100% availability
during the process what you need?
-----
Thanks,
Michael