Posted to solr-user@lucene.apache.org by Zane Rockenbaugh <za...@navigo.com> on 2014/07/09 07:09:29 UTC

Planning ahead for Solr Cloud and Scaling

I'm working on a product hosted with AWS that uses Elastic Beanstalk
auto-scaling to good effect and we are trying to set up similar (more or
less) runtime scaling support with Solr. I think I understand how to set
this up, and wanted to check I was on the right track.

We currently run 3 cores on a single host / Solr server / shard. This is
just fine for now, and we have overhead for the near future. However, I
need to have a plan, and then test, for a higher capacity future.

1) I gather that if I set up SolrCloud, and then later load increases, I
can spin up a second host / Solr server, create a new shard, and then split
the first shard:

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3

And doing this, we no longer have to commit to shards out of the gate.
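For reference, a shard split is just a call against the Collections API. Here's a minimal sketch in Python that composes the SPLITSHARD request (the host, collection name, and shard name are hypothetical; send the resulting URL with any HTTP client against a live cluster):

```python
from urllib.parse import urlencode

# Base URL of any node in the cluster (hypothetical host/port).
solr = "http://localhost:8983/solr/admin/collections"

# SPLITSHARD divides an existing shard into two sub-shards; the parent
# shard keeps serving requests until the split completes.
params = urlencode({
    "action": "SPLITSHARD",
    "collection": "mycollection",  # hypothetical collection name
    "shard": "shard1",             # the shard to split
})
url = f"{solr}?{params}"
print(url)
# Against a live cluster you'd fetch this URL, e.g. with
# urllib.request.urlopen(url), and poll until the sub-shards are active.
```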

2) I'm not clear on whether there's a big advantage to splitting up the
cores or not. Two of the three cores will have about the same number of
documents, though only one contains large amounts of text. The third core is
much smaller in both bytes and documents (by two orders of magnitude).

3) We are also looking at going multi-lingual. The current plan is to
store the localized text in fields within the same core, with languages
added over time. We can update the schema as we go (each language field will
be optional). This seems easier than adding a core for each language. Is
there a downside?
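To make the per-language-field plan concrete, here's a sketch that builds an add-field payload for the Schema API (the field names, field types, and the two example languages are hypothetical; this assumes the managed schema is enabled — with a classic schema.xml you'd add the equivalent <field/> entries by hand):

```python
import json

# Hypothetical mapping of language code to a language-specific fieldType
# (each fieldType would carry the right analysis chain for that language).
languages = {"en": "text_en", "fr": "text_fr"}

add_fields = {
    "add-field": [
        {
            "name": f"body_{lang}",  # e.g. body_en, body_fr (hypothetical names)
            "type": ftype,           # language-specific analyzer
            "indexed": True,
            "stored": True,
            "required": False,       # optional, so languages can be added over time
        }
        for lang, ftype in languages.items()
    ]
}
payload = json.dumps(add_fields, indent=2)
print(payload)
# POST this JSON to /solr/<collection>/schema on a live cluster.
```

Because every language field is optional, documents only carry the languages they actually have, and adding a new language later is just another add-field call.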

Thanks for any pointers.

Re: Planning ahead for Solr Cloud and Scaling

Posted by Timothy Potter <th...@gmail.com>.
Hi Zane,

re 1: as an alternative to shard splitting, you can just overshard the
collection from the start and then migrate existing shards to new
hardware as needed. The migration can happen online; see the Collections
API ADDREPLICA command. Once the new replica is active on the new
hardware, you can unload the older replica on your original hardware.
There are other benefits to oversharding, such as increased parallelism
during indexing and query execution (provided you have the CPU capacity,
which is typically the case on modern hardware).
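The migrate-then-unload sequence can be sketched as two Collections API calls; this Python snippet just composes the request URLs (the host, collection, shard, node, and replica names are all hypothetical, and in practice you'd wait for the new replica to report active before dropping the old one):

```python
from urllib.parse import urlencode

# Any node in the cluster can accept Collections API calls (hypothetical host).
solr = "http://localhost:8983/solr/admin/collections"

def collections_api(action, **params):
    """Compose a Collections API URL; send it with any HTTP client."""
    query = urlencode({"action": action, **params})
    return f"{solr}?{query}"

# 1) Bring up a replica of shard2 on the new hardware...
add = collections_api("ADDREPLICA", collection="mycollection",
                      shard="shard2", node="newhost:8983_solr")

# 2) ...then, once it's active, remove the replica on the original box
#    (the replica is identified by its core node name in cluster state).
drop = collections_api("DELETEREPLICA", collection="mycollection",
                       shard="shard2", replica="core_node1")

print(add)
print(drop)
```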

re 2: mainly depends on how the Java GC and heap are affected by
colocating the cores on the same JVM ... if heap is stable and the GC
is keeping up and qps / latency times are acceptable, I wouldn't
change it.

re 3: read Trey's chapter 14 in Solr in Action ;-)

Cheers,
Tim
