Posted to solr-user@lucene.apache.org by Chris Simpson <ch...@outlook.com> on 2013/03/01 01:10:35 UTC

Solr 4.1 Solr Cloud Shard Structure

Dear Lucene / Solr Community-

I recently posted this question on Stack Overflow, but it hasn't gotten much traction. Then I found this mailing list and am hoping to have more luck here:

Question-

If I plan on holding 7TB of data in SolrCloud, is it bad practice to begin with 1 server holding 100 shards and start populating the collection, so that as the size grows, each shard is eventually peeled off onto its own dedicated server (holding ~70GB each, with its own dedicated resources and replicas)?

That is, I would start the collection with 100 shards on one machine; then, as the data grew, I could peel off one shard at a time and give it its own dedicated server with plenty of resources.

Is this okay to do, or would putting that many shards on one server at the start, while the data volume is still small, somehow create a massive internal bottleneck?

Thank you.
Chris

Re: Solr 4.1 Solr Cloud Shard Structure

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Chris,

I started a discussion on this topic on the ElasticSearch mailing list the
other day. As soon as SolrCloud gets index alias functionality (a JIRA issue
for it exists), I believe the same approach to cluster expansion that can be
used with ES today will be applicable to SolrCloud:

http://search-lucene.com/m/RZYhi2ydnXD1&subj=Alternatives+to+oversharding+to+handle+index+cluster+growth+

Otis
--
Solr & ElasticSearch Support
http://sematext.com/
Re: Solr 4.1 Solr Cloud Shard Structure

Posted by Mark Miller <ma...@gmail.com>.
On Feb 28, 2013, at 7:55 PM, Walter Underwood <wu...@wunderwood.org> wrote:

> 100 shards on a node will almost certainly be slow

I think it depends on some things, one of the biggest being your hardware. Many have found that super-concurrent, beefy hardware performs much better when a single node runs more Solr cores. So there are trade-offs that make it tough to jump to conclusions. Slower at 100 shards? I would assume yes. Slow? That depends.

One thing that will happen is that you will require a lot more threads…

You would want some pretty beefy hardware.

But you don't have to do 100 either. That should just be a rough starting number. At some point, if you keep growing, you will have to reindex into a new cluster, or consider shard splitting if it's feasible (and once it becomes available). You can only overshard so much.

So perhaps you do 50 or so. I imagine it will be faster than you think. My main concern is the number of threads; you might want to experiment with -Xss (the JVM thread stack size) to at least minimize their RAM usage.
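
To put the thread concern in rough numbers: each JVM thread reserves a stack whose maximum size is set by -Xss, so the worst-case stack reservation grows linearly with the thread count. The sketch below assumes a hypothetical, purely illustrative figure of 10 threads per shard (Solr defines no such constant; the real count depends on its internal pools and the servlet container) and shows how lowering -Xss shrinks the total. Actual resident memory is usually lower, since the operating system typically commits stack pages only as they are touched.

```python
def stack_reservation_mb(shards: int, threads_per_shard: int, xss_kb: int) -> float:
    """Worst-case thread-stack reservation in MB for all shard threads."""
    threads = shards * threads_per_shard
    return threads * xss_kb / 1024


if __name__ == "__main__":
    # ASSUMPTION: 10 threads per shard is a hypothetical figure, not a Solr constant.
    shards, threads_per_shard = 100, 10
    for xss_kb in (1024, 512, 256):  # i.e. -Xss1024k, -Xss512k, -Xss256k
        mb = stack_reservation_mb(shards, threads_per_shard, xss_kb)
        print(f"-Xss{xss_kb}k: up to ~{mb:.0f} MB for {shards * threads_per_shard} threads")
```

Halving -Xss halves the worst case, but setting it too low risks StackOverflowError on deep call paths, so any change needs testing under real load.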

- Mark

Re: Solr 4.1 Solr Cloud Shard Structure

Posted by Walter Underwood <wu...@wunderwood.org>.
100 shards on a node will almost certainly be slow, but at least it would be scalable. 7TB of data on one node is going to be slow regardless of how you shard it.

I might choose a number with more useful divisors than 100, perhaps 96 or 144.
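
The point about divisors can be checked quickly: a shard count with more divisors can be spread evenly over more possible cluster sizes. The sketch below assumes the 7TB is 7000 GB distributed uniformly across shards (real shards will vary) and lists, for 100, 96 and 144 shards, the approximate shard size and every node count that gives each node the same number of shards.

```python
# ASSUMPTION: 7 TB treated as 7000 GB, spread evenly across all shards.
TOTAL_GB = 7000


def even_node_counts(num_shards: int) -> list[int]:
    """Node counts that give every node the same number of shards."""
    return [n for n in range(1, num_shards + 1) if num_shards % n == 0]


if __name__ == "__main__":
    for shards in (100, 96, 144):
        nodes = even_node_counts(shards)
        print(f"{shards} shards: ~{TOTAL_GB / shards:.1f} GB each, "
              f"{len(nodes)} even node counts: {nodes}")
```

Running it shows that 100 shards divide evenly over 9 node counts, 96 over 12 and 144 over 15, at roughly 70, 73 and 49 GB per shard respectively.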

wunder

On Feb 28, 2013, at 4:25 PM, Mark Miller wrote:

> You will pay some in performance, but it's certainly not bad practice. It's a good choice for setting up so that you can scale later. You just have to do some testing to make sure it fits your requirements. The Collections API even has built-in support for this: you can specify more shards than nodes and it will overload a node, placing more than one shard on it. See the documentation. Later you can start up a new replica on another machine and kill/remove the original.
> 
> - Mark

Re: Solr 4.1 Solr Cloud Shard Structure

Posted by Mark Miller <ma...@gmail.com>.
You will pay some in performance, but it's certainly not bad practice. It's a good choice for setting up so that you can scale later. You just have to do some testing to make sure it fits your requirements. The Collections API even has built-in support for this: you can specify more shards than nodes and it will overload a node, placing more than one shard on it. See the documentation. Later you can start up a new replica on another machine and kill/remove the original.
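
As a sketch of the setup described here, the snippet below only builds (and does not send) a Collections API CREATE request asking for more shards than there are nodes. The host, port and collection name are hypothetical. action, name, numShards and replicationFactor are CREATE parameters; maxShardsPerNode is an assumption for this version, since it belongs to later Solr releases (where, as far as I know, its default of 1 would reject this layout) and may not be accepted by 4.1, so check the documentation for your version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, max_shards_per_node: int) -> str:
    """Return a Collections API CREATE URL; nothing is sent to Solr."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # ASSUMPTION: parameter from later Solr releases; may not exist in 4.1.
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical single node holding all 100 shards, one copy each.
    print(create_collection_url("http://localhost:8983/solr", "bigindex",
                                num_shards=100, replication_factor=1,
                                max_shards_per_node=100))
```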

- Mark