Posted to solr-user@lucene.apache.org by Manoj Bharadwaj <mb...@gmail.com> on 2014/10/07 14:27:40 UTC

Advice on an architecture with a lot of cores

Hi folks,

My team inherited a SOLR setup with an architecture that has a core for
every customer. We have a few different types of cores, say "A", "B", "C",
and for each of these there is a core per customer - namely "A1",
"A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the
history behind the current design - the exact reasons why it was done the
way it was done - one probable consideration was to keep each customer's
data separate from the others'.

We want to go to a single core per type architecture, and move to SolrCloud
in the near future as well, to achieve sharding via the features the cloud
mode provides.
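
For illustration, here is a rough SolrJ sketch of the two addressing schemes;
the core names ("A17", "A"), the URL, and the customer_id field are just
placeholders, not our actual schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CoreAddressing {
    public static void main(String[] args) throws SolrServerException {
        // Today: one core per customer per type, e.g. core "A17" for customer 17.
        HttpSolrServer perCustomerCore = new HttpSolrServer("http://localhost:8983/solr/A17");
        QueryResponse before = perCustomerCore.query(new SolrQuery("some search terms"));

        // Proposed: one core per type ("A"), with a customer_id field and a
        // filter query keeping each customer's results separate.
        HttpSolrServer sharedCore = new HttpSolrServer("http://localhost:8983/solr/A");
        SolrQuery q = new SolrQuery("some search terms");
        q.addFilterQuery("customer_id:17");
        QueryResponse after = sharedCore.query(q);

        System.out.println(before.getResults().getNumFound());
        System.out.println(after.getResults().getNumFound());
    }
}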

Further aspects such as monitoring become easier as well. We will need to
watch and tune the caches for the different patterns of hits that we see.

Is there anything else to evaluate before we move to a single core per type
setup?

We are currently using 4.4.0 and will be moving to the latest release,
4.10.1, as part of the redesign as well.

Regards
Manoj

Re: Advice on an architecture with a lot of cores

Posted by Aditya <fi...@gmail.com>.
Hi Manoj

There are advantages in both approaches. I recently read an article,
http://lucidworks.com/blog/podcast-solr-at-scale-at-aol/ . AOL uses Solr
with one core per user.

Having one core per customer helps you to:
1. Easily migrate / back up the index
2. Load a core only when it is required. When a user signs in, load his
index; otherwise you don't need to keep his data in memory (see the sketch
after the cons below)
3. Rebuild the data for a particular user more easily

Cons:
1. If most users are actively signing in and you need to keep most of the
cores loaded all the time, search will slow down.
2. Each core has its own set of files, so you can end up hitting a "too
many open files" exception. (We faced this scenario.)
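
To make point 2 under the pros concrete, a rough SolrJ sketch of unloading
and re-registering a core on demand via the CoreAdmin API; the core name and
instance directory are example values only, and Solr 4.x can also handle
this declaratively with transient cores in solr.xml:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class OnDemandCores {
    public static void main(String[] args) throws SolrServerException, IOException {
        // CoreAdmin calls go against the Solr root URL, not an individual core.
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // Unload an idle customer's core to free heap and file handles
        // ("A17" and the instance directory are example values only).
        CoreAdminRequest.unloadCore("A17", admin);

        // When that customer signs in again, re-register the core from its
        // existing instance directory so it becomes searchable once more.
        CoreAdminRequest.createCore("A17", "/var/solr/cores/A17", admin);
    }
}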


Having a single core for all:
1. This reduces the headache of user-specific handling; you can treat the
DB / index as a black box and query across all users
2. When the load grows, shard it

Cons:
1. Rebuilding the index will take more time

Regards
Aditya
www.findbestopensource.com






On Tue, Oct 7, 2014 at 8:01 PM, Manoj Bharadwaj <mb...@gmail.com>
wrote:

> Hi Toke,
>
> I don't think I answered your question properly.
>
> With the current 1 core/customer setup many cores are idle. The redesign
> we are working on will move most of our searches to being driven by Solr
> rather than the database (the current split is 90% database, 10% Solr).
> With that change, all cores will see traffic.
>
> We have 25G of data in the index (across all cores), currently on a 2-core
> VM with 32G of memory. We are making some changes to the schema and the
> analyzers and we see the index size growing by 25% or so as a result. To
> support this we will be moving to a VM with 4 cores and 64G of memory.
> Hardware as such isn't a constraint.
>
> Regards
> Manoj
>
> On Tue, Oct 7, 2014 at 8:47 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
>
> > On Tue, 2014-10-07 at 14:27 +0200, Manoj Bharadwaj wrote:
> > > My team inherited a SOLR setup with an architecture that has a core for
> > > every customer. We have a few different types of cores, say "A", "B",
> > > "C", and for each of these there is a core per customer - namely "A1",
> > > "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the
> > > history behind the current design - the exact reasons why it was done
> > > the way it was done - one probable consideration was to keep each
> > > customer's data separate from the others'.
> >
> > It is not a bad reason. It ensures that ranked search is optimized
> > towards each customer's data and makes it easy to manage adding and
> > removing customers.
> >
> > > We want to go to a single core per type architecture, and move to
> > > SolrCloud in the near future as well, to achieve sharding via the
> > > features the cloud mode provides.
> >
> > If the setup is heavily queried on most of the cores or there are
> > core-spanning searches, collapsing the user-specific cores into fewer
> > super-cores might lower hardware requirements a bit. On the other hand,
> > if most of the cores are idle most of the time, the 1 core/customer
> > setup would give better utilization of the hardware.
> >
> > Why do you want to collapse the cores?
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >
>

Re: Advice on an architecture with a lot of cores

Posted by Manoj Bharadwaj <mb...@gmail.com>.
Hi Toke,

I don't think I answered your question properly.

With the current 1 core/customer setup many cores are idle. The redesign
we are working on will move most of our searches to being driven by Solr
rather than the database (the current split is 90% database, 10% Solr).
With that change, all cores will see traffic.

We have 25G of data in the index (across all cores), currently on a 2-core
VM with 32G of memory. We are making some changes to the schema and the
analyzers and we see the index size growing by 25% or so as a result. To
support this we will be moving to a VM with 4 cores and 64G of memory.
Hardware as such isn't a constraint.

Regards
Manoj

On Tue, Oct 7, 2014 at 8:47 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> On Tue, 2014-10-07 at 14:27 +0200, Manoj Bharadwaj wrote:
> > My team inherited a SOLR setup with an architecture that has a core for
> > every customer. We have a few different types of cores, say "A", "B",
> > "C", and for each of these there is a core per customer - namely "A1",
> > "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the
> > history behind the current design - the exact reasons why it was done
> > the way it was done - one probable consideration was to keep each
> > customer's data separate from the others'.
>
> It is not a bad reason. It ensures that ranked search is optimized
> towards each customer's data and makes it easy to manage adding and
> removing customers.
>
> > We want to go to a single core per type architecture, and move to
> > SolrCloud in the near future as well, to achieve sharding via the
> > features the cloud mode provides.
>
> If the setup is heavily queried on most of the cores or there are
> core-spanning searches, collapsing the user-specific cores into fewer
> super-cores might lower hardware requirements a bit. On the other hand,
> if most of the cores are idle most of the time, the 1 core/customer
> setup would give better utilization of the hardware.
>
> Why do you want to collapse the cores?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>

Re: Advice on an architecture with a lot of cores

Posted by "youknowwho@heroicefforts.net" <yo...@heroicefforts.net>.
"On the other hand,
it [sic] most of the cores are idle most of the time, the 1 core/customer
setup would be give better utilization of the hardware."

This is an important point.  I've seen performance go to hell when 10M-, 100M-, and 1B-document cloud collections were consolidated in a hardware-constrained environment.  The data belonged to the same customer and there were good reasons for this approach.  In our case, we were able to reduce our queries by n-1 (where n is the number of collections consolidated), but the overall query was slower; many seconds vs. sub-second.  You won't have that option, but maybe you are in a better place w.r.t. hardware.  The newer cloud routing may also play an important role here (maybe someone else could speak to that).  As you alluded to earlier, the query generation must be altered to generate an fq security clause (operator precedence is important here).
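
To make the precedence point concrete, here is a rough SolrJ sketch; the
customer_id field and the values are placeholders:

import org.apache.solr.client.solrj.SolrQuery;

public class SecurityClause {
    public static void main(String[] args) {
        String userInput = "laptop OR tablet";

        // Risky: splicing the security clause straight into q. The Lucene query
        // parser does not apply the boolean precedence you might expect, so
        // "laptop OR tablet AND customer_id:17" may not constrain results the
        // way the security requirement intends.
        SolrQuery spliced = new SolrQuery(userInput + " AND customer_id:17");

        // Safer: parenthesize the user input, or better, put the security clause
        // in fq, which is ANDed against the whole main query and cached separately.
        SolrQuery filtered = new SolrQuery(userInput);
        filtered.addFilterQuery("customer_id:17");

        System.out.println("q=" + spliced.getQuery());
        System.out.println("q=" + filtered.getQuery() + "  fq=" + filtered.getFilterQueries()[0]);
    }
}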

If search performance is a vital part of your company's service offering, then it's definitely worth the money to collect representative queries and test on alternate hardware before committing your production environment.

Cheers,

-Jess

On October 7, 2014 8:56:46 AM EST, Manoj Bharadwaj <mb...@gmail.com> wrote:
>Hi Toke,
>
>Thank you for your insights.
>
>
>> Why do you want to collapse the cores?
>>
>
>Most of the cores are small and a few big ones make up the bulk. Our
>thinking was that it would be just as easy to have one core. Monitoring
>becomes easier as well (we are using a monitoring tool that has a limit on
>the number of endpoints that can be monitored, and we are considering
>other monitoring solutions, including Sematext).
>
>Regards
>Manoj

-- 
Sent from my mobile. Please excuse my brevity.

Re: Advice on an architecture with a lot of cores

Posted by Manoj Bharadwaj <mb...@gmail.com>.
Hi Toke,

Thank you for your insights.


> Why do you want to collapse the cores?
>

Most of the cores are small and a few big ones make up the bulk. Our
thinking was that it would be just as easy to have one core. Monitoring
becomes easier as well (we are using a monitoring tool that has a limit on
the number of endpoints that can be monitored, and we are considering
other monitoring solutions, including Sematext).

Regards
Manoj

Re: Advice on an architecture with a lot of cores

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2014-10-07 at 14:27 +0200, Manoj Bharadwaj wrote:
> My team inherited a SOLR setup with an architecture that has a core for
> every customer. We have a few different types of cores, say "A", "B", "C",
> and for each of these there is a core per customer - namely "A1",
> "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the
> history behind the current design - the exact reasons why it was done the
> way it was done - one probable consideration was to keep each customer's
> data separate from the others'.

It is not a bad reason. It ensures that ranked search is optimized
towards each customer's data and makes it easy to manage adding and
removing customers.

> We want to go to a single core per type architecture, and move to SolrCloud
> in the near future as well, to achieve sharding via the features the cloud
> mode provides.

If the setup is heavily queried on most of the cores or there are
core-spanning searches, collapsing the user-specific cores into fewer
super-cores might lower hardware requirements a bit. On the other hand,
if most of the cores are idle most of the time, the 1 core/customer
setup would give better utilization of the hardware.

Why do you want to collapse the cores?

- Toke Eskildsen, State and University Library, Denmark



Re: Advice on an architecture with a lot of cores

Posted by Manoj Bharadwaj <mb...@gmail.com>.
Yes, we plan to eventually shard the clusters - that will go hand in hand
with how the rest of the system gets partitioned as well (swim lanes). The
other considerations for these lanes will be geo location etc. (in an AWS
context, zones on the east coast will be used for swim lanes that cater to
customers in that region).

P.S. We are not yet in AWS, but it is part of our medium-term strategy.

On Tue, Oct 7, 2014 at 9:12 AM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> You'll have to do a proof of concept test to determine how many
> collections Solr/SolrCloud can handle.
>
> With a very large number of customers you may have to do sharding of the
> clusters themselves - limit each cluster to however many
> customers/collections work well (100? 250?) and then have separate
> clusters for larger groups of customers, maybe with a smaller cluster with
> a collection that maps the customer ID to a Solr cluster, and then the
> application layer can direct requests to the Solr cluster that owns that
> customer.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Manoj Bharadwaj
> Sent: Tuesday, October 7, 2014 8:27 AM
> To: solr-user@lucene.apache.org
> Subject: Advice on an architecture with a lot of cores
>
>
> Hi folks,
>
> My team inherited a SOLR setup with an architecture that has a core for
> every customer. We have a few different types of cores, say "A", "B", "C",
> and for each of these there is a core per customer - namely "A1",
> "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the
> history behind the current design - the exact reasons why it was done the
> way it was done - one probable consideration was to keep each customer's
> data separate from the others'.
>
> We want to go to a single core per type architecture, and move to SolrCloud
> in the near future as well, to achieve sharding via the features the cloud
> mode provides.
>
> Further aspects such as monitoring become easier as well. We will need to
> watch and tune the caches for the different patterns of hits that we see.
>
> Is there anything else to evaluate before we move to a single core per type
> setup?
>
> We are currently using 4.4.0 and will be moving to the latest release,
> 4.10.1, as part of the redesign as well.
>
> Regards
> Manoj
>

Re: Advice on an architecture with a lot of cores

Posted by Jack Krupansky <ja...@basetechnology.com>.
You'll have to do a proof of concept test to determine how many collections 
Solr/SolrCloud can handle.

With a very large number of customers you may have to do sharding of the 
clusters themselves - limit each cluster to however many 
customers/collections work well (100? 250?) and then have separate clusters
for larger groups of customers, maybe with a smaller cluster with a 
collection that maps the customer ID to a Solr cluster, and then the 
application layer can direct requests to the Solr cluster that owns that 
customer.
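
As a rough sketch of that routing layer (the lookup map, cluster URLs,
collection name, and customer_id field below are all hypothetical; in
practice the mapping would come from the small directory collection or a
database):

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CustomerClusterRouter {
    // Hypothetical customer-ID -> cluster-URL mapping; in a real setup this
    // would be looked up from the small "directory" collection (or a database).
    private static final Map<String, String> CUSTOMER_TO_CLUSTER = new HashMap<String, String>();
    static {
        CUSTOMER_TO_CLUSTER.put("17", "http://solr-cluster-1:8983/solr");
        CUSTOMER_TO_CLUSTER.put("42", "http://solr-cluster-2:8983/solr");
    }

    public static QueryResponse search(String customerId, String userQuery)
            throws SolrServerException {
        String clusterUrl = CUSTOMER_TO_CLUSTER.get(customerId);
        // Query the per-type collection on the cluster that owns this customer,
        // still restricting by customer within that collection.
        HttpSolrServer server = new HttpSolrServer(clusterUrl + "/typeA");
        SolrQuery q = new SolrQuery(userQuery);
        q.addFilterQuery("customer_id:" + customerId);
        return server.query(q);
    }
}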

-- Jack Krupansky

-----Original Message----- 
From: Manoj Bharadwaj
Sent: Tuesday, October 7, 2014 8:27 AM
To: solr-user@lucene.apache.org
Subject: Advice on an architecture with a lot of cores

Hi folks,

My team inherited a SOLR setup with an architecture that has a core for
every customer. We have a few different types of cores, say "A", "B", "C",
and for each of these there is a core per customer - namely "A1",
"A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know the
history behind the current design - the exact reasons why it was done the
way it was done - one probable consideration was to keep each customer's
data separate from the others'.

We want to go to a single core per type architecture, and move to SolrCloud
in the near future as well, to achieve sharding via the features the cloud
mode provides.

Further aspects such as monitoring become easier as well. We will need to
watch and tune the caches for the different patterns of hits that we see.

Is there anything else to evaluate before we move to a single core per type
setup?

We are currently using 4.4.0 and will be moving to the latest release,
4.10.1, as part of the redesign as well.

Regards
Manoj