Posted to solr-user@lucene.apache.org by Michal Krajňanský <mi...@gmail.com> on 2014/11/10 14:03:39 UTC

Lucene to Solrcloud migration

Hi All,

I have been working on a project that has long employed Lucene indexer.

Currently, the system implements a proprietary document routing and index
plugging/unplugging on top of Lucene and of course contains a great
body of indexes. Recently an idea came up to migrate from Lucene to
SolrCloud, which appears to be more powerful than our proprietary system.

Could you suggest the best way to seamlessly migrate the system to use
SolrCloud, when reindexing is not an option?

- all the existing indexes represent a single collection in terms of
SolrCloud
- the documents are organized in "shards" according to date (integer) and
language (a possibly extensible discrete set)
- the indexes are disjunct

I have been able to convert the existing indexes to the newest Lucene
version and plug them individually into SolrCloud. However, there is still
the question of routing, sharding, etc.

Any insight appreciated.

Best,


Michal Krajnansky

Re: Lucene to Solrcloud migration

Posted by Erick Erickson <er...@gmail.com>.
bq:  So I guess with compositeId router I am out of luck.

No, not at all. Atomic updates are exactly about updating
a doc and NOT changing the id. A different uniqueKey is
a different doc by definition.

So you can easily use atomic updates with composite IDs, since you
are changing a field of an existing doc and the router bits stay
the same.
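To make that concrete, here is a rough sketch of an atomic-update request body. The collection, field name, and id prefix are all hypothetical; this just assumes a composite uniqueKey is already in place:

```python
import json

# Hypothetical composite id: the "en_201410!" route prefix stays untouched;
# only an ordinary (non-uniqueKey) field is changed atomically.
doc_id = "en_201410!12345"

# "set" is the atomic-update modifier that replaces a field's value in place.
payload = [{"id": doc_id, "status_s": {"set": "archived"}}]

body = json.dumps(payload)
print(body)
# This body would be POSTed to /solr/<collection>/update?commit=true
# with Content-Type: application/json.
```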

But that may be irrelevant....

Take a look at LotsOfCores (WARNING! this is NOT
verified in SolrCloud!). The design there is exactly to
limit the number of simultaneous cores in memory, having
them load/unload themselves based on the limits you set up.
So you can just fire queries blindly at your server where the
URL includes the core name and be confident that you'll stay
within your hardware limits.

http://wiki.apache.org/solr/LotsOfCores
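For reference, the LotsOfCores behavior is driven by per-core flags plus a global cap on loaded transient cores; a sketch under the assumption of one standalone core per [language x timespan] (core names are made up):

```
# cores/en_201410/core.properties -- one core per [language x timespan]
name=en_201410
transient=true        # eligible for automatic unloading
loadOnStartup=false   # loaded lazily on first request

<!-- solr.xml: cap on how many transient cores stay loaded at once -->
<solr>
  <int name="transientCacheSize">8</int>
</solr>
```

With something like this in place, querying a cold core loads it, and the least recently used transient core is unloaded once the cap is hit.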

If you're using SolrCloud, though, there's really no concept
of unloading specific cores/indexes on demand; it pre-supposes
that you've scaled your system such that you can have them all
active at once. So I don't really see how routing to specific cores
is going to help you.

Then again I don't know your problem space.

Best,
Erick

On Tue, Nov 11, 2014 at 11:33 AM, Michal Krajňanský
<mi...@gmail.com> wrote:
> Hm. So I found that one can update stored fields with the "atomic update"
> operation; however, according to
> http://stackoverflow.com/questions/19058795/it-is-possible-to-update-uniquekey-in-solr-4
> this will not work for the uniqueKey. So I guess with the compositeId router I am
> out of luck.
>
> I have also been searching for a way to implement my own routing mechanism.
> This seems to be a cleaner solution -- I would not need to modify the
> existing indexes, just compute the hash from other (stored) fields rather
> than the document id alone. Can you confirm that this is possible? The
> documentation is rather sparse, however (I only found that it is possible
> to specify a custom hash function).
>
> Best,
>
> Michal
>
> 2014-11-11 16:48 GMT+01:00 Michael Della Bitta <
> michael.della.bitta@appinions.com>:
>
>> Yeah, Erick confused me a bit too, but I think what he's talking about
>> takes for granted that you'd have your various indexes directly set up as
>> individual collections.
>>
>> If instead you're considering one big collection, or a few collections
>> based on aggregations of your individual indexes, having big, multisharded
>> collections using compositeId should work, unless there's a use case we're
>> not discussing.
>>
>> Michael
>>
>>
>> On 11/11/14 10:27, Michal Krajňanský wrote:
>>
>>> Hi Erick, Michael,
>>>
>>> thank you both for your comments.
>>>
>>> 2014-11-11 5:05 GMT+01:00 Erick Erickson <er...@gmail.com>:
>>>
>>>> bq: - the documents are organized in "shards" according to date (integer) and
>>>> language (a possibly extensible discrete set)
>>>>
>>>> bq: - the indexes are disjunct
>>>>
>>>> OK, I'm having a hard time getting my head around these two statements.
>>>>
>>>> If the indexes are disjunct in the sense that you only search one at a
>>>> time,
>>>> then they are different "collections" in SolrCloud jargon.
>>>>
>>>>
>>> I just meant that every document is contained in a single one of the
>>> indexes. I have a lot of Lucene indexes for various [language X timespan],
>>> but logically we are speaking about a single huge index. That is why I
>>> thought it would be natural to represent it as a single SolrCloud
>>> collection.
>>>
>>>> If, on the other hand, these are a big collection and you want to search
>>>> them all with a single query, I suggest that in SolrCloud land you don't
>>>> want them to be discrete shards. My reasoning here is that let's say you
>>>> have a bunch of documents for October, 2014 in Spanish. By putting these
>>>> all on a single shard, your queries all have to be serviced by that one
>>>> shard. You don't get any parallelism.
>>>>
>>>>
>>> That is right. Actually the parallelization is not the main issue right
>>> now. The queries are very sparse, currently our system does not support
>>> load balancing at all. I imagined that in the future it could be
>>> achievable via SolrCloud replication.
>>>
>>> The main consideration is to be able to plug the indexes in and out on
>>> demand. The total size of the data is in terabytes. We usually want to
>>> search only the latest indexes but occasionally it is needed to plug in
>>> one of the older ones.
>>>
>>> Maybe (probably) I still have some misconceptions about the uses of
>>> SolrCloud...
>>>
>>>> If it really does make sense in your case to route all the docs to a
>>>> single shard,
>>>> then Michael's comment is spot-on: use the compositeId router.
>>>>
>>>>
>>> You confuse me here. I was not thinking about a single shard; on the
>>> contrary, any [language X timespan] index would itself be a shard. I agree
>>> that the compositeId router seems to be natural for what I need. I am
>>> currently searching for a way to convert my indexes in such a way that my
>>> document IDs have the composite format. Currently these are just unique
>>> integers, so I would like to prefix all the document IDs of an index with
>>> its language and timespan. I do not know how, but I believe this should be
>>> possible, as it is a constant operation that would not change the
>>> structure of the index.
>>>
>>> Best,
>>>
>>> Michal
>>>
>>>
>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
>>>> <mi...@appinions.com> wrote:
>>>>
>>>>> Hi Michal,
>>>>>
>>>>> Is there a particular reason to shard your collections like that? If it was
>>>>> mainly for ease of operations, I'd consider just using CompositeId to
>>>>> prevent specific types of queries hotspotting particular nodes.
>>>>>
>>>>> If your ingest rate is fast, you might also consider making each
>>>>> "collection" an alias that points to many actual collections, and
>>>>> periodically closing off a collection and starting a new one. This prevents
>>>>> cache churn and the impact of large merges.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>>
>>>>> On 11/10/14 08:03, Michal Krajňanský wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have been working on a project that has long employed Lucene indexer.
>>>>>>
>>>>>> Currently, the system implements a proprietary document routing and index
>>>>>> plugging/unplugging on top of Lucene and of course contains a great
>>>>>> body of indexes. Recently an idea came up to migrate from Lucene to
>>>>>> SolrCloud, which appears to be more powerful than our proprietary system.
>>>>>>
>>>>>> Could you suggest the best way to seamlessly migrate the system to use
>>>>>> SolrCloud, when reindexing is not an option?
>>>>>>
>>>>>> - all the existing indexes represent a single collection in terms of
>>>>>> SolrCloud
>>>>>> - the documents are organized in "shards" according to date (integer) and
>>>>>> language (a possibly extensible discrete set)
>>>>>> - the indexes are disjunct
>>>>>>
>>>>>> I have been able to convert the existing indexes to the newest Lucene
>>>>>> version and plug them individually into SolrCloud. However, there is
>>>>>> the question of routing, sharding etc.
>>>>>>
>>>>>> Any insight appreciated.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>
>>>>>> Michal Krajnansky
>>>>>>
>>>>>>
>>

Re: Lucene to Solrcloud migration

Posted by Michal Krajňanský <mi...@gmail.com>.
Hm. So I found that one can update stored fields with the "atomic update"
operation; however, according to
http://stackoverflow.com/questions/19058795/it-is-possible-to-update-uniquekey-in-solr-4
this will not work for the uniqueKey. So I guess with the compositeId router I am
out of luck.

I have also been searching for a way to implement my own routing mechanism.
This seems to be a cleaner solution -- I would not need to modify the
existing indexes, just compute the hash from other (stored) fields rather
than the document id alone. Can you confirm that this is possible? The
documentation is rather sparse, however (I only found that it is possible
to specify a custom hash function).
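For what it's worth, the documented alternative to a custom hash function is the "implicit" router, which routes each document by a designated field instead of hashing the uniqueKey. A sketch of the Collections API call that would set this up; the collection, shard, and field names here are hypothetical:

```python
from urllib.parse import urlencode

# Sketch: create a collection whose shards are addressed by name, with each
# document naming its target shard in a routing field (no hashing involved).
params = {
    "action": "CREATE",
    "name": "archive",
    "router.name": "implicit",        # explicit shard names, no id hashing
    "shards": "en_201410,es_201410",  # one shard per [language x timespan]
    "router.field": "shard_label",    # docs carry their target shard here
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

One appeal of this route is that the implicit router also allows adding shards later (CREATESHARD), which fits an extensible set of languages.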

Best,

Michal

2014-11-11 16:48 GMT+01:00 Michael Della Bitta <
michael.della.bitta@appinions.com>:

> Yeah, Erick confused me a bit too, but I think what he's talking about
> takes for granted that you'd have your various indexes directly set up as
> individual collections.
>
> If instead you're considering one big collection, or a few collections
> based on aggregations of your individual indexes, having big, multisharded
> collections using compositeId should work, unless there's a use case we're
> not discussing.
>
> Michael
>
>
> On 11/11/14 10:27, Michal Krajňanský wrote:
>
>> Hi Erick, Michael,
>>
>> thank you both for your comments.
>>
>> 2014-11-11 5:05 GMT+01:00 Erick Erickson <er...@gmail.com>:
>>
>>> bq: - the documents are organized in "shards" according to date (integer) and
>>> language (a possibly extensible discrete set)
>>>
>>> bq: - the indexes are disjunct
>>>
>>> OK, I'm having a hard time getting my head around these two statements.
>>>
>>> If the indexes are disjunct in the sense that you only search one at a
>>> time,
>>> then they are different "collections" in SolrCloud jargon.
>>>
>>>
>> I just meant that every document is contained in a single one of the
>> indexes. I have a lot of Lucene indexes for various [language X timespan],
>> but logically we are speaking about a single huge index. That is why I
>> thought it would be natural to represent it as a single SolrCloud
>> collection.
>>
>>> If, on the other hand, these are a big collection and you want to search
>>> them all with a single query, I suggest that in SolrCloud land you don't
>>> want them to be discrete shards. My reasoning here is that let's say you
>>> have a bunch of documents for October, 2014 in Spanish. By putting these
>>> all on a single shard, your queries all have to be serviced by that one
>>> shard. You don't get any parallelism.
>>>
>>>
>> That is right. Actually the parallelization is not the main issue right
>> now. The queries are very sparse, currently our system does not support
>> load balancing at all. I imagined that in the future it could be
>> achievable via SolrCloud replication.
>>
>> The main consideration is to be able to plug the indexes in and out on
>> demand. The total size of the data is in terabytes. We usually want to
>> search only the latest indexes but occasionally it is needed to plug in
>> one of the older ones.
>>
>> Maybe (probably) I still have some misconceptions about the uses of
>> SolrCloud...
>>
>>> If it really does make sense in your case to route all the docs to a
>>> single shard,
>>> then Michael's comment is spot-on: use the compositeId router.
>>>
>>>
>> You confuse me here. I was not thinking about a single shard; on the
>> contrary, any [language X timespan] index would itself be a shard. I agree
>> that the compositeId router seems to be natural for what I need. I am
>> currently searching for a way to convert my indexes in such a way that my
>> document IDs have the composite format. Currently these are just unique
>> integers, so I would like to prefix all the document IDs of an index with
>> its language and timespan. I do not know how, but I believe this should be
>> possible, as it is a constant operation that would not change the
>> structure of the index.
>>
>> Best,
>>
>> Michal
>>
>>
>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
>>> <mi...@appinions.com> wrote:
>>>
>>>> Hi Michal,
>>>>
>>>> Is there a particular reason to shard your collections like that? If it was
>>>> mainly for ease of operations, I'd consider just using CompositeId to
>>>> prevent specific types of queries hotspotting particular nodes.
>>>>
>>>> If your ingest rate is fast, you might also consider making each
>>>> "collection" an alias that points to many actual collections, and
>>>> periodically closing off a collection and starting a new one. This prevents
>>>> cache churn and the impact of large merges.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> On 11/10/14 08:03, Michal Krajňanský wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have been working on a project that has long employed Lucene indexer.
>>>>>
>>>>> Currently, the system implements a proprietary document routing and index
>>>>> plugging/unplugging on top of Lucene and of course contains a great
>>>>> body of indexes. Recently an idea came up to migrate from Lucene to
>>>>> SolrCloud, which appears to be more powerful than our proprietary system.
>>>>>
>>>>> Could you suggest the best way to seamlessly migrate the system to use
>>>>> SolrCloud, when reindexing is not an option?
>>>>>
>>>>> - all the existing indexes represent a single collection in terms of
>>>>> SolrCloud
>>>>> - the documents are organized in "shards" according to date (integer) and
>>>>> language (a possibly extensible discrete set)
>>>>> - the indexes are disjunct
>>>>>
>>>>> I have been able to convert the existing indexes to the newest Lucene
>>>>> version and plug them individually into SolrCloud. However, there is
>>>>> the question of routing, sharding etc.
>>>>>
>>>>> Any insight appreciated.
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>> Michal Krajnansky
>>>>>
>

Re: Lucene to Solrcloud migration

Posted by Michael Della Bitta <mi...@appinions.com>.
Yeah, Erick confused me a bit too, but I think what he's talking about 
takes for granted that you'd have your various indexes directly set up 
as individual collections.

If instead you're considering one big collection, or a few collections 
based on aggregations of your individual indexes, having big, 
multisharded collections using compositeId should work, unless there's a 
use case we're not discussing.

Michael

On 11/11/14 10:27, Michal Krajňanský wrote:
> Hi Erick, Michael,
>
> thank you both for your comments.
>
> 2014-11-11 5:05 GMT+01:00 Erick Erickson <er...@gmail.com>:
>
>> bq: - the documents are organized in "shards" according to date (integer)
>> and
>> language (a possibly extensible discrete set)
>>
>> bq: - the indexes are disjunct
>>
>> OK, I'm having a hard time getting my head around these two statements.
>>
>> If the indexes are disjunct in the sense that you only search one at a
>> time,
>> then they are different "collections" in SolrCloud jargon.
>>
>>
> I just meant that every document is contained in a single one of the
> indexes. I have a lot of Lucene indexes for various [language X timespan],
> but logically we are speaking about a single huge index. That is why I
> thought it would be natural to represent it as a single SolrCloud
> collection.
>
>> If, on the other hand, these are a big collection and you want to search
>> them all with a single query, I suggest that in SolrCloud land you don't
>> want them to be discrete shards. My reasoning here is that let's say you
>> have a bunch of documents for October, 2014 in Spanish. By putting these
>> all on a single shard, your queries all have to be serviced by that one
>> shard. You don't get any parallelism.
>>
>>
> That is right. Actually the parallelization is not the main issue right
> now. The queries are very sparse, currently our system does not support
> load balancing at all. I imagined that in the future it could be achievable
> via SolrCloud replication.
>
> The main consideration is to be able to plug the indexes in and out on
> demand. The total size of the data is in terabytes. We usually want to
> search only the latest indexes but occasionally it is needed to plug in
> one of the older ones.
>
> Maybe (probably) I still have some misconceptions about the uses of
> SolrCloud...
>
>> If it really does make sense in your case to route all the docs to a
>> single shard,
>> then Michael's comment is spot-on: use the compositeId router.
>>
>>
> You confuse me here. I was not thinking about a single shard; on the
> contrary, any [language X timespan] index would itself be a shard. I agree
> that the compositeId router seems to be natural for what I need. I am
> currently searching for a way to convert my indexes in such a way that my
> document IDs have the composite format. Currently these are just unique
> integers, so I would like to prefix all the document IDs of an index with
> its language and timespan. I do not know how, but I believe this should be
> possible, as it is a constant operation that would not change the structure
> of the index.
>
> Best,
>
> Michal
>
>
>
>> Best,
>> Erick
>>
>> On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
>> <mi...@appinions.com> wrote:
>>> Hi Michal,
>>>
>>> Is there a particular reason to shard your collections like that? If it was
>>> mainly for ease of operations, I'd consider just using CompositeId to
>>> prevent specific types of queries hotspotting particular nodes.
>>>
>>> If your ingest rate is fast, you might also consider making each
>>> "collection" an alias that points to many actual collections, and
>>> periodically closing off a collection and starting a new one. This prevents
>>> cache churn and the impact of large merges.
>>>
>>> Michael
>>>
>>>
>>>
>>> On 11/10/14 08:03, Michal Krajňanský wrote:
>>>> Hi All,
>>>>
>>>> I have been working on a project that has long employed Lucene indexer.
>>>>
>>>> Currently, the system implements a proprietary document routing and index
>>>> plugging/unplugging on top of Lucene and of course contains a great
>>>> body of indexes. Recently an idea came up to migrate from Lucene to
>>>> SolrCloud, which appears to be more powerful than our proprietary system.
>>>>
>>>> Could you suggest the best way to seamlessly migrate the system to use
>>>> SolrCloud, when reindexing is not an option?
>>>>
>>>> - all the existing indexes represent a single collection in terms of
>>>> SolrCloud
>>>> - the documents are organized in "shards" according to date (integer) and
>>>> language (a possibly extensible discrete set)
>>>> - the indexes are disjunct
>>>>
>>>> I have been able to convert the existing indexes to the newest Lucene
>>>> version and plug them individually into SolrCloud. However, there is
>>>> the question of routing, sharding etc.
>>>>
>>>> Any insight appreciated.
>>>>
>>>> Best,
>>>>
>>>>
>>>> Michal Krajnansky
>>>>


Re: Lucene to Solrcloud migration

Posted by Michal Krajňanský <mi...@gmail.com>.
Hi Erick, Michael,

thank you both for your comments.

2014-11-11 5:05 GMT+01:00 Erick Erickson <er...@gmail.com>:

> bq: - the documents are organized in "shards" according to date (integer)
> and
> language (a possibly extensible discrete set)
>
> bq: - the indexes are disjunct
>
> OK, I'm having a hard time getting my head around these two statements.
>
> If the indexes are disjunct in the sense that you only search one at a
> time,
> then they are different "collections" in SolrCloud jargon.
>
>
I just meant that every document is contained in a single one of the
indexes. I have a lot of Lucene indexes for various [language X timespan],
but logically we are speaking about a single huge index. That is why I
thought it would be natural to represent it as a single SolrCloud
collection.

> If, on the other hand, these are a big collection and you want to search
> them all with a single query, I suggest that in SolrCloud land you don't
> want them to be discrete shards. My reasoning here is that let's say you
> have a bunch of documents for October, 2014 in Spanish. By putting these
> all on a single shard, your queries all have to be serviced by that one
> shard. You don't get any parallelism.
>
>
That is right. Actually, parallelization is not the main issue right
now. The queries are very sparse; currently our system does not support
load balancing at all. I imagined that in the future it could be achievable
via SolrCloud replication.

The main consideration is being able to plug the indexes in and out on
demand. The total size of the data is in terabytes. We usually want to
search only the latest indexes, but occasionally we need to plug in
one of the older ones.

Maybe (probably) I still have some misconceptions about the uses of
SolrCloud...

> If it really does make sense in your case to route all the docs to a
> single shard,
> then Michael's comment is spot-on: use the compositeId router.
>
>
You confuse me here. I was not thinking about a single shard; on the
contrary, any [language X timespan] index would itself be a shard. I agree
that the compositeId router seems to be natural for what I need. I am
currently searching for a way to convert my indexes in such a way that my
document IDs have the composite format. Currently these are just unique
integers, so I would like to prefix all the document IDs of an index with
its language and timespan. I do not know how, but I believe this should be
possible, as it is a constant operation that would not change the structure
of the index.
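The id rewrite described above can be sketched as follows. The prefix format is an assumption; the only thing the compositeId router actually cares about is the "!" separator, since the part before it is hashed to pick the shard, so documents sharing a prefix land together:

```python
# Hypothetical composite-id scheme: language + timespan as the route prefix.
def composite_id(language: str, timespan: int, doc_id: int) -> str:
    # Everything before "!" is the routing prefix; the rest keeps the old id.
    return f"{language}_{timespan}!{doc_id}"

print(composite_id("es", 201410, 12345))  # -> es_201410!12345
```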

Best,

Michal



> Best,
> Erick
>
> On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
> <mi...@appinions.com> wrote:
> > Hi Michal,
> >
> > Is there a particular reason to shard your collections like that? If it was
> > mainly for ease of operations, I'd consider just using CompositeId to
> > prevent specific types of queries hotspotting particular nodes.
> >
> > If your ingest rate is fast, you might also consider making each
> > "collection" an alias that points to many actual collections, and
> > periodically closing off a collection and starting a new one. This prevents
> > cache churn and the impact of large merges.
> >
> > Michael
> >
> >
> >
> > On 11/10/14 08:03, Michal Krajňanský wrote:
> >>
> >> Hi All,
> >>
> >> I have been working on a project that has long employed Lucene indexer.
> >>
> >> Currently, the system implements a proprietary document routing and index
> >> plugging/unplugging on top of Lucene and of course contains a great
> >> body of indexes. Recently an idea came up to migrate from Lucene to
> >> SolrCloud, which appears to be more powerful than our proprietary system.
> >>
> >> Could you suggest the best way to seamlessly migrate the system to use
> >> SolrCloud, when reindexing is not an option?
> >>
> >> - all the existing indexes represent a single collection in terms of
> >> SolrCloud
> >> - the documents are organized in "shards" according to date (integer) and
> >> language (a possibly extensible discrete set)
> >> - the indexes are disjunct
> >>
> >> I have been able to convert the existing indexes to the newest Lucene
> >> version and plug them individually into SolrCloud. However, there is
> >> the question of routing, sharding etc.
> >>
> >> Any insight appreciated.
> >>
> >> Best,
> >>
> >>
> >> Michal Krajnansky
> >>
> >
>

Re: Lucene to Solrcloud migration

Posted by Erick Erickson <er...@gmail.com>.
bq: - the documents are organized in "shards" according to date (integer) and
language (a possibly extensible discrete set)

bq: - the indexes are disjunct

OK, I'm having a hard time getting my head around these two statements.

If the indexes are disjunct in the sense that you only search one at a time,
then they are different "collections" in SolrCloud jargon.

If, on the other hand, these are a big collection and you want to search
them all with a single query, I suggest that in SolrCloud land you don't
want them to be discrete shards. My reasoning here is that let's say you
have a bunch of documents for October, 2014 in Spanish. By putting these
all on a single shard, your queries all have to be serviced by that one
shard. You don't get any parallelism.

If it really does make sense in your case to route all the docs to a
single shard,
then Michael's comment is spot-on: use the compositeId router.
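If cohorts of documents are routed to one shard this way, queries can be pinned to that shard too. A sketch using the _route_ request parameter; the collection name and prefix are hypothetical:

```python
from urllib.parse import urlencode

# Restrict a query to the shard owning the "es_201410" prefix; the trailing
# "!" marks this as a compositeId route prefix rather than a full id.
params = {"q": "body:solr", "_route_": "es_201410!"}
url = "http://localhost:8983/solr/archive/select?" + urlencode(params)
print(url)
```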

Best,
Erick

On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
<mi...@appinions.com> wrote:
> Hi Michal,
>
> Is there a particular reason to shard your collections like that? If it was
> mainly for ease of operations, I'd consider just using CompositeId to
> prevent specific types of queries hotspotting particular nodes.
>
> If your ingest rate is fast, you might also consider making each
> "collection" an alias that points to many actual collections, and
> periodically closing off a collection and starting a new one. This prevents
> cache churn and the impact of large merges.
>
> Michael
>
>
>
> On 11/10/14 08:03, Michal Krajňanský wrote:
>>
>> Hi All,
>>
>> I have been working on a project that has long employed Lucene indexer.
>>
>> Currently, the system implements a proprietary document routing and index
>> plugging/unplugging on top of the Lucene and of course contains a great
>> body of indexes. Recently an idea came up to migrate from Lucene to
>> SolrCloud, which appears to be more powerful than our proprietary system.
>>
>> Could you suggest the best way to seamlessly migrate the system to use
>> Solrcloud, when the reindexing is not an option?
>>
>> - all the existing indexes represent a single collection in terms of
>> Solrcloud
>> - the documents are organized in "shards" according to date (integer) and
>> language (a possibly extensible discrete set)
>> - the indexes are disjunct
>>
>> I have been able to convert the existing indexes to the newest Lucene
>> version and plug them individually into the Solrcloud. However, there is
>> the question of routing, sharding etc.
>>
>> Any insight appreciated.
>>
>> Best,
>>
>>
>> Michal Krajnansky
>>
>

Re: Lucene to Solrcloud migration

Posted by Michael Della Bitta <mi...@appinions.com>.
Hi Michal,

Is there a particular reason to shard your collections like that? If it 
was mainly for ease of operations, I'd consider just using CompositeId 
to prevent specific types of queries hotspotting particular nodes.

If your ingest rate is fast, you might also consider making each 
"collection" an alias that points to many actual collections, and 
periodically closing off a collection and starting a new one. This 
prevents cache churn and the impact of large merges.
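The alias approach above can be sketched with the Collections API; all names here are hypothetical:

```python
from urllib.parse import urlencode

# "docs" stays the name clients query, while the underlying time-boxed
# collections rotate behind it.
params = {
    "action": "CREATEALIAS",
    "name": "docs",                            # what clients see
    "collections": "docs_201410,docs_201411",  # what actually serves them
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
# Re-issuing CREATEALIAS with a new collection list repoints "docs" atomically.
```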

Michael


On 11/10/14 08:03, Michal Krajňanský wrote:
> Hi All,
>
> I have been working on a project that has long employed Lucene indexer.
>
> Currently, the system implements a proprietary document routing and index
> plugging/unplugging on top of the Lucene and of course contains a great
> body of indexes. Recently an idea came up to migrate from Lucene to
> SolrCloud, which appears to be more powerful than our proprietary system.
>
> Could you suggest the best way to seamlessly migrate the system to use
> Solrcloud, when the reindexing is not an option?
>
> - all the existing indexes represent a single collection in terms of
> Solrcloud
> - the documents are organized in "shards" according to date (integer) and
> language (a possibly extensible discrete set)
> - the indexes are disjunct
>
> I have been able to convert the existing indexes to the newest Lucene
> version and plug them individually into the Solrcloud. However, there is
> the question of routing, sharding etc.
>
> Any insight appreciated.
>
> Best,
>
>
> Michal Krajnansky
>