Posted to solr-user@lucene.apache.org by Sebastian Riemer <s....@littera.eu> on 2016/06/20 16:49:08 UTC

How many cores is too many cores?

Hi,

Currently I have a single solr server handling 5 cores which differ in the content they provide.

However, each of them might hold data for many different clients/customers. Let's say for example one day there might be 300 different clients each storing their data in those 5 cores.

Every client can make backups of his data and import that data back into our system. That, however, makes it necessary to re-index all of his documents in the cores, which A) is very slow at the moment, since fetching the data from the MySQL DB is slow, and B) would slow down searches for all other clients while the re-indexing is taking place, right?

Now my idea would be:

What if each client gets his own 5 cores? Then instead of re-indexing I could simply copy back the solr-index files (which I copied while making the backup) into his core-directories, right?

That would lead to about 5 x 300 = 1,500 cores.
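[Editor's sketch of the copy-back restore idea above, assuming the per-client core keeps its index under <coreDir>/data and that the core is unloaded (or Solr stopped) while the files are swapped. All paths and the core name are hypothetical.]

```shell
#!/bin/sh
# Restore a client's core by copying back previously backed-up index files.
# Paths and core name (books_client42) are illustrative only.
set -eu

BACKUP=/tmp/demo_backup/books_client42/data   # snapshot taken at backup time
CORE=/tmp/demo_solr/books_client42            # this client's core directory

# Simulate a backup taken earlier (stand-in for real Lucene index files)
mkdir -p "$BACKUP"
printf 'segments' > "$BACKUP/segments_1"

# Restore: replace the core's data directory with the backed-up copy.
# With Solr, the core should be UNLOADed first and reloaded afterwards.
mkdir -p "$CORE"
rm -rf "$CORE/data"
cp -r "$BACKUP" "$CORE/data"

ls "$CORE/data"
```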

Am I insane by thinking that way?

Best regards,
Sebastian


Re: AW: How many cores is too many cores?

Posted by Erick Erickson <er...@gmail.com>.
Good luck! This really requires that you can configure your Solr to cache N
cores and that you don't expect more than N-M users in parallel. Or you don't
mind if some users see periodic slow responses. You can sometimes make a hidden
call when a user signs on to pre-load her core, but again your usage
pattern may not tolerate that.

Finally, note that the lazy-load parameter does NOT require a transient
cache. That avoids the "load all cores at startup" delay.
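[Editor's note: the lazy-load-without-transient combination lives in each core's core.properties; a minimal sketch, with a hypothetical core name:]

```properties
# core.properties for one client's core (name is illustrative)
name=books_client42
# Load the core only on first request, avoiding the load-all-at-startup delay
loadOnStartup=false
# Not transient: once loaded, the core stays loaded and is never aged out
transient=false
```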

Best,
Erick
On Jun 21, 2016 4:54 AM, "Sebastian Riemer" <s....@littera.eu> wrote:

AW: How many cores is too many cores?

Posted by Sebastian Riemer <s....@littera.eu>.
Thanks for your response, Erick!

Currently we are trying to keep things simple so we don't use SolrCloud.

I'll give it a look. The configuration seems easy; testing with many clients in parallel, however, does not seem so easy.

Thanks again,
Sebastian

Re: How many cores is too many cores?

Posted by Erick Erickson <er...@gmail.com>.
Sebastian:

It Depends (tm). Solr can handle this, but there are caveats. Is this
SolrCloud or not? Each core will consume some resources, and there are
some JIRAs out there specifically about having that many cores in SolrCloud.
If your problem space works with the LotsOfCores approach, start here:
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
and
https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
The idea is that if your access pattern is
> sign on
> ask some questions
> go away
you can configure that only N cores are loaded at any one time.
Theoretically you can have a huge number of cores (I've tested with
15,000) defined, but only say 100 active at a time.
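[Editor's sketch of the "only N cores loaded at a time" setup Erick describes, i.e. the transient core cache configured in solr.xml; the value is illustrative:]

```xml
<!-- solr.xml: keep at most 100 transient cores loaded at any one time -->
<solr>
  <int name="transientCacheSize">100</int>
</solr>
```

Each per-client core would additionally set transient=true and loadOnStartup=false in its core.properties; once more than 100 such cores are in use, the least recently used ones are closed and reopened on demand.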

There are also options you can specify that cause a core to not be
loaded until requested, but not aged out.

The 1,500 core case will keep Solr from coming up until all of the
cores have been opened, which can be lengthy. But you can define the
number of threads that are running in parallel to open the cores....
but the default is unlimited so you can run out of threads (really
memory).....
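[Editor's sketch of capping the startup threads mentioned above, also in solr.xml; the value is illustrative:]

```xml
<!-- solr.xml: limit how many cores are opened in parallel at startup,
     instead of the unlimited default -->
<solr>
  <int name="coreLoadThreads">8</int>
</solr>
```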

So the real answer is "it's not insane, but you really need to test it
operationally and tweak a bunch of settings before making your
decision"....

Best,
Erick
