Posted to solr-user@lucene.apache.org by Adrian Liew <ad...@avanade.com> on 2015/11/27 11:23:13 UTC

SolrCloud Shard + Replica on Multiple servers with SolrCloud

Hi all,

I am trying to figure out how to set up a 3-shard, 3-server cluster with a replication factor of 2 on SolrCloud 5.3.0.

In particular, I am trying to follow the setup described in this blog post: http://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/

EC2 Instance 1

Shard 1 - Leader  (port 8984 separate drive with 50 GB SSD)
Shard 2 - Leader  (port 8985 separate drive with 50 GB SSD)

EC2 Instance 2

Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)

EC2 Instance 3

Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)

Can anyone shed some light on how this layout can be configured, using either the SolrCloud Collections API or the Solr command-line utility, so that the shards are split across different instances?

As far as I know, there are two approaches to sharding: "Custom Sharding" and "Automatic Sharding". Which approach suits the use case described above?

Is anyone able to provide pointers from past experience, or point me to a good article that describes how this can be set up?

Regards,
Adrian


Re: SolrCloud Shard + Replica on Multiple servers with SolrCloud

Posted by Upayavira <uv...@odoko.co.uk>.
Answers inline

On Tue, Dec 1, 2015, at 06:03 AM, Adrian Liew wrote:
> Hi all,
> 
> I would really like to seek anyone's opinion on my query below. I am
> desperate to know whether this is possible, or whether someone is willing
> to share their thoughts and experience.
> 
> Best regards,
> Adrian
> 
> 
> -----Original Message-----
> From: Adrian Liew [mailto:adrian.liew@avanade.com] 
> Sent: Saturday, November 28, 2015 10:38 AM
> To: solr-user@lucene.apache.org
> Subject: RE: SolrCloud Shard + Replica on Multiple servers with SolrCloud
> 
> Hi Upaya,
> 
> I am trying to set up a 3-shard, 3-server cluster with a replication
> factor of 2 on SolrCloud 5.3.0.
> 
> In particular, I am trying to follow the setup described in this blog post:
> http://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/
> 
> Correction to description below:
> 
> EC2 Instance 1
> 
> Shard 1 - Leader  (port 8984 separate drive with 50 GB SSD)
> Shard 2 - Leader  (port 8985 separate drive with 50 GB SSD)
> Shard 3 - Leader  (port 8986 separate drive with 50 GB SSD)
> 
> EC2 Instance 2
> 
> Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
> Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)
> Shard 3 - Replica (port 8986 separate drive with 50 GB SSD)
> 
> EC2 Instance 3
> 
> Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
> Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)
> Shard 3 - Replica (port 8986 separate drive with 50 GB SSD)
> 
> To your questions
> 
> >>  Why are you running multiple instances on the same host? 
> This was the architecture best practice provided by Lucidworks. For more
> info, you can visit this site,
> http://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/

It seems they use two instances on a node because nodes get two free
40 GB SSD drives. Beyond that, they don't describe the reasoning.

> >> You can host your two replicas inside the same Solr instance.
> I reckon this avoids the risk of a single shard (its leader and
> replicas) going down in one hit. What happens if one node that holds a
> whole shard goes down altogether? You will lose a chunk of your index.
> The architecture I mentioned above prevents that from happening. I want
> my shards to be spread out for HA.

There is no real point hosting two replicas of the same shard on the
same node. Other than that, I'm not sure I see a huge benefit (beyond
the SSD one) of having multiple instances per node.
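
To make that point concrete, here is a minimal Python sketch (not part of
the original thread) that checks which shards would be lost if any single
node went down. The node and shard names are hypothetical labels, and the
model is deliberately simple: a shard counts as available as long as at
least one of its replicas sits on a surviving node.

```python
def shards_lost_if_node_fails(layout):
    """Map each node to the shards left with no replica if that node fails.

    layout maps a node name to the list of shard names hosted on it;
    a shard listed twice means two replicas of it share that node.
    """
    all_shards = {shard for shards in layout.values() for shard in shards}
    lost = {}
    for failed in layout:
        # Shards that still have at least one replica on another node.
        surviving = {
            shard
            for node, shards in layout.items()
            if node != failed
            for shard in shards
        }
        lost[failed] = sorted(all_shards - surviving)
    return lost


# Hypothetical layouts, for illustration only.
# Both replicas of each shard share a node: losing a node loses a shard.
colocated = {
    "node1": ["shard1", "shard1"],
    "node2": ["shard2", "shard2"],
    "node3": ["shard3", "shard3"],
}
# Replicas of each shard sit on different nodes: any one node can fail.
spread = {
    "node1": ["shard1", "shard3"],
    "node2": ["shard2", "shard1"],
    "node3": ["shard3", "shard2"],
}

print(shards_lost_if_node_fails(colocated))
print(shards_lost_if_node_fails(spread))
```

In the colocated layout every node failure takes a whole shard with it,
while the spread layout survives the loss of any one node.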

> >> Also, you should not concern yourself (too much) with which node is the leader as that can change through time.
> I am not concerned, as I know this setup guarantees that a leader is in
> place for each shard, for fault tolerance.

Okay.

> >> How have you come to the conclusion that you need to shard?
> I am preparing a use case for my customer. I haven't yet worked out when
> to shard, but I need to set up a demo to show my customer. I am
> proposing this to them as a long-term architecture.

Okay.
 
> > As I know there are two approaches to sharding that is "Custom Sharding"
> > and "Automatic Sharding". Which approach suits the use case described 
> > above?
> Do you know this answer?

Generally, I would use the inbuilt sharding functionality, unless you
come up with a good reason why it doesn't work for you.
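
As a rough illustration of the two options, the sketch below (not part of
the original thread) builds Collections API CREATE URLs in Python: one
using the default compositeId router, where Solr assigns documents to
shards by hashing their ids, and one using the implicit router with
explicitly named shards. The base URL, collection name, and config name
are hypothetical placeholders, and the parameter values assume the
3-shard, replication-factor-2 layout from the question.

```python
from urllib.parse import urlencode

# Hypothetical values; substitute your own host, collection, and config set.
BASE_URL = "http://localhost:8984/solr"
COLLECTION = "demo"
CONFIG_NAME = "demo_conf"


def automatic_sharding_url(num_shards, replication_factor, max_shards_per_node):
    """CREATE URL for the default compositeId router (automatic sharding)."""
    params = {
        "action": "CREATE",
        "name": COLLECTION,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": CONFIG_NAME,
    }
    return f"{BASE_URL}/admin/collections?{urlencode(params)}"


def custom_sharding_url(shard_names, replication_factor, max_shards_per_node):
    """CREATE URL for the implicit router, where shards are named explicitly."""
    params = {
        "action": "CREATE",
        "name": COLLECTION,
        "router.name": "implicit",
        "shards": ",".join(shard_names),
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        "collection.configName": CONFIG_NAME,
    }
    return f"{BASE_URL}/admin/collections?{urlencode(params)}"


# 3 shards x 2 copies = 6 cores on 3 nodes, so up to 2 cores per node.
print(automatic_sharding_url(3, 2, 2))
print(custom_sharding_url(["shard1", "shard2", "shard3"], 2, 2))
```

With the implicit router the application has to decide which shard each
document goes to, which is the extra work you take on by leaving the
built-in behaviour.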
 
> Do you also have your own opinion on setting up a 3 shard 3 server
> cluster? 

I guess, if you have three shards, then you obviously want one shard per
server. You could just have a replica of each shard on each of your
servers; that way you have 9 cores in total, three per node. But that
wouldn't make straightforward use of your two SSDs per instance.
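
The arithmetic behind that can be sketched in a few lines of Python (an
illustration added here, not something from the original thread). The
round-robin placement and the node names are assumptions made for the
example; Solr chooses its own placement when a collection is created.
Note that a copy of every shard on every server means three copies per
shard, which in Collections API terms is a replicationFactor of 3 rather
than 2.

```python
def place_replicas(num_shards, replication_factor, nodes):
    """Round-robin placement: copies of one shard land on distinct nodes."""
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes to keep copies of a shard apart")
    placement = {node: [] for node in nodes}
    for shard in range(num_shards):
        for copy in range(replication_factor):
            # Shift the starting node per shard so cores spread evenly.
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].append(f"shard{shard + 1}")
    return placement


nodes = ["node1", "node2", "node3"]  # hypothetical node names

# A copy of every shard on every node: 3 shards x 3 copies = 9 cores.
full = place_replicas(3, 3, nodes)
print(full, sum(len(cores) for cores in full.values()))

# Two copies per shard: 3 shards x 2 copies = 6 cores, two per node.
pairs = place_replicas(3, 2, nodes)
print(pairs, sum(len(cores) for cores in pairs.values()))
```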

Upayavira

RE: SolrCloud Shard + Replica on Multiple servers with SolrCloud

Posted by Adrian Liew <ad...@avanade.com>.
Hi all,

I would really like to seek anyone's opinion on my query below. I am desperate to know whether this is possible, or whether someone is willing to share their thoughts and experience.

Best regards,
Adrian



RE: SolrCloud Shard + Replica on Multiple servers with SolrCloud

Posted by Adrian Liew <ad...@avanade.com>.
Hi Upaya,

I am trying to set up a 3-shard, 3-server cluster with a replication factor of 2 on SolrCloud 5.3.0.

In particular, I am trying to follow the setup described in this blog post: http://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/

Correction to the description below:

EC2 Instance 1

Shard 1 - Leader  (port 8984 separate drive with 50 GB SSD)
Shard 2 - Leader  (port 8985 separate drive with 50 GB SSD)
Shard 3 - Leader  (port 8986 separate drive with 50 GB SSD)

EC2 Instance 2

Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)
Shard 3 - Replica (port 8986 separate drive with 50 GB SSD)

EC2 Instance 3

Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)
Shard 3 - Replica (port 8986 separate drive with 50 GB SSD)

To your questions

>>  Why are you running multiple instances on the same host? 
This was the architecture best practice provided by Lucidworks. For more info, you can visit this site, http://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/

>> You can host your two replicas inside the same Solr instance.
I reckon this avoids the risk of a single shard (its leader and replicas) going down in one hit. What happens if one node that holds a whole shard goes down altogether? You will lose a chunk of your index. The architecture I mentioned above prevents that from happening. I want my shards to be spread out for HA.

>> Also, you should not concern yourself (too much) with which node is the leader as that can change through time.
I am not concerned, as I know this setup guarantees that a leader is in place for each shard, for fault tolerance.

>> How have you come to the conclusion that you need to shard?
I am preparing a use case for my customer. I haven't yet worked out when to shard, but I need to set up a demo to show my customer. I am proposing this to them as a long-term architecture.

> As I know there are two approaches to sharding that is "Custom Sharding"
> and "Automatic Sharding". Which approach suits the use case described 
> above? 
Do you know this answer?

Do you also have your own opinion on setting up a 3 shard 3 server cluster? 

Regards,
Adrian


Re: SolrCloud Shard + Replica on Multiple servers with SolrCloud

Posted by Upayavira <uv...@odoko.co.uk>.
Why are you running multiple instances on the same host? You can host
your two replicas inside the same Solr instance.

Also, you should not concern yourself (too much) with which node is the
leader as that can change through time.

How have you come to the conclusion that you need to shard?

Upayavira
