
Posted to solr-user@lucene.apache.org by SOLR4189 <Kl...@yandex.ru> on 2018/01/27 10:08:07 UTC

Using replicas in SOLR-6.5.1

I use Solr 6.5.1 and would like to use SolrCloud replicas. I have some
questions:

1) What is the best architecture for this if my collection contains 20
shards, each on a different VM? 40 VMs, 20 for leaders and 20 for
replicas? Or should I stay with 20 VMs, each hosting a leader and a replica
(of another shard's leader), and add RAM?

2) What open issues related to replicas in Solr 6.5.1 should I check?

3) If I use SolrCloud replicas, which configuration parameters should I
change? Which ones can I change?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Using replicas in SOLR-6.5.1

Posted by Shawn Heisey <ap...@elyograg.org>.

On 1/27/2018 6:53 AM, SOLR4189 wrote:
> 1. You are right: due to memory and garbage collection issues, I put each
> shard on a different VM. Each VM has 50 GB of RAM (10 GB for the JVM and 40 GB
> for the index), and it works well for my use case. Maybe I don't understand
> Solr terminology, but if you say to use one VM for 20 shards, what does that
> mean? 20 nodes, 20 JVMs, or 20 Solr instances on the same virtual server?
> Can you explain what you meant?

Generally you're going to have one Solr instance per machine, whether 
that machine is physical or virtual.  One Solr instance can handle many 
indexes (cores).  Running multiple Solr instances per machine involves a 
fair amount of overhead, mostly memory, and isn't recommended except for 
some very specific circumstances with *huge* Java heaps.

I can't actually speak for Sameer here, but I think they likely meant 
that you would have two Solr instances, each with 20 cores.  They would 
be in two virtual machines, and ideally, those virtual machines would be 
each hosted by a completely separate physical host.
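To make the two-VM layout concrete, here is a minimal Python sketch, assuming hypothetical node names vm1 and vm2 and shard names shard1 through shard20. It only models which cores would end up on which VM and checks that every shard still has a copy if either VM is lost. SolrCloud makes its own placement decisions when a collection is created, so this is an illustration of the layout, not Solr's placement algorithm.

```python
from collections import defaultdict

def place_copies(num_shards, replication_factor, nodes):
    """Model a layout: map each (hypothetical) node to the shard copies it hosts.

    Copies are dealt out round-robin, so the copies of one shard land on
    consecutive, and therefore distinct, nodes as long as there are at least
    replication_factor nodes. Illustration only, not SolrCloud's own logic.
    """
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as copies of a shard")
    layout = defaultdict(list)
    for s in range(num_shards):
        for r in range(replication_factor):
            node = nodes[(s * replication_factor + r) % len(nodes)]
            layout[node].append(f"shard{s + 1}")
    return dict(layout)

def survives_loss_of(layout, lost_node, num_shards):
    """True if every shard still has at least one copy after lost_node fails."""
    remaining = {
        shard
        for node, shards in layout.items()
        if node != lost_node
        for shard in shards
    }
    return len(remaining) == num_shards

two_vms = place_copies(20, 2, ["vm1", "vm2"])
print({node: len(cores) for node, cores in two_vms.items()})   # {'vm1': 20, 'vm2': 20}
print(all(survives_loss_of(two_vms, n, 20) for n in two_vms))  # True
```

Calling the same function with 40 node names gives one core per VM, which corresponds to the 40-VM option from the original question.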

> 2. I mean issues like "facet performance regression", "using LTR
> with grouping", or "using timeAllowed with grouping": anything that would
> stop me from using the replicas feature. Sometimes I don't understand Solr
> issues. For example, if a bug is unresolved, affects version 4.10, and has
> no fix version, what does that mean? Can this bug also happen in Solr 6.5.1?

It's almost impossible to say whether 6.5.1 would be affected using only 
the version fields in Jira.  Usually if the fix-version is empty, the 
issue hasn't been fixed ... but sometimes a problem that exists in an 
older version has been fixed by a later change.  That later change might 
be completely unrelated to the issue; the developer just happened to see 
something in the code they examined that they didn't like.

What's the issue state?  If it is Closed or Resolved (and the resolution 
says Fixed), then the fix-version SHOULD indicate which versions the fix 
is in or will be in.  If it's not resolved/closed, then it's most likely 
not fixed at all, no matter what the fix-version field states.  As far 
as I am aware, the only state transition that Jira can perform by itself 
is Fixed->Closed, and even that transition only happens when someone 
triggers it as part of the release process for a new version of 
Lucene/Solr.  The general rule (which isn't always followed) is that if 
the issue is not fixed, fix-version should be empty.
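The rule of thumb above can be written down as a small decision function. This is a sketch under assumptions: the status names (Resolved, Closed) and the resolution "Fixed" are the Jira values described above, fix versions are plain dotted numbers, and any fix-version at or below yours is treated as included. That last assumption can be wrong for a fix that only shipped in a later bug-fix release on an older branch. A result of None means the fields alone cannot answer the question.

```python
def parse_version(v):
    """'6.5.1' -> (6, 5, 1); assumes plain dotted numeric versions."""
    return tuple(int(part) for part in v.split("."))

def fix_included(status, resolution, fix_versions, my_version):
    """Guess from Jira fields whether my_version contains a fix.

    Returns True or False when the fields point one way, and None when they
    cannot tell (e.g. Won't Fix, or a fixed issue with no fix-version).
    """
    if status not in ("Resolved", "Closed"):
        # Not resolved/closed: most likely not fixed, whatever fix-version says,
        # although a later, unrelated change may have fixed it. Test to be sure.
        return False
    if resolution != "Fixed" or not fix_versions:
        return None
    mine = parse_version(my_version)
    # Caveat: a fix that only went into a later release on an older branch
    # (say a 5.5.x bug-fix release cut after 6.5.1) would wrongly count here.
    return any(parse_version(v) <= mine for v in fix_versions)

# The case from the question: unresolved, affects 4.10, no fix-version.
print(fix_included("Open", None, [], "6.5.1"))  # False
```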

Thanks,
Shawn

Re: Using replicas in SOLR-6.5.1

Posted by SOLR4189 <Kl...@yandex.ru>.

1. You are right: due to memory and garbage collection issues, I put each
shard on a different VM. Each VM has 50 GB of RAM (10 GB for the JVM and 40 GB
for the index), and it works well for my use case. Maybe I don't understand
Solr terminology, but if you say to use one VM for 20 shards, what does that
mean? 20 nodes, 20 JVMs, or 20 Solr instances on the same virtual server?
Can you explain what you meant?

2. I mean issues like "facet performance regression", "using LTR
with grouping", or "using timeAllowed with grouping": anything that would
stop me from using the replicas feature. Sometimes I don't understand Solr
issues. For example, if a bug is unresolved, affects version 4.10, and has
no fix version, what does that mean? Can this bug also happen in Solr 6.5.1?

3. Yes, I'm familiar with the Solr Collections API.

I prefer to put each shard on a separate small VM.

Just to confirm with you: *one Solr node = one JVM = one Solr instance = one
or many shards?*

Thank you.





Re: Using replicas in SOLR-6.5.1

Posted by Sameer Maggon <sa...@searchstax.com>.

1. You could have just 2 VMs: one hosts all 20 shards of your collection, and
the other hosts the replicas of those shards. In this scenario, if one VM is
unavailable, your application stays available because at least one
replica of each shard is still up. This assumes that a single VM can hold all
the data (all 20 shards) without compromising performance or running into
memory or garbage collection issues (I am not sure what the size of your
collection or shards is). For additional redundancy, you can add another VM
and add another replica for all your shards.

2. Can you be more specific about what sort of issues you are
thinking of? Replication in general is pretty solid in the version you are
talking about. You could comb through JIRA (
https://issues.apache.org/jira/browse/SOLR-5821?jql=project%20%3D%20SOLR%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20text%20~%20%22replica%22
)

3. I would recommend taking a look at the Solr Collections API (
https://lucene.apache.org/solr/guide/6_6/collections-api.html). The parameters
to pay closest attention to are "replicationFactor", "numShards"
and "maxShardsPerNode", which relate to shards and replicas.
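The JIRA search link in point 2 is easier to read once its jql parameter is decoded. A minimal Python sketch using only urllib.parse from the standard library; it does not contact Jira, it just prints the query, which could then be adjusted (for example, a different search term) in Jira's advanced search.

```python
from urllib.parse import parse_qs, urlsplit

# The JIRA search link from point 2, copied verbatim (split for readability).
JIRA_URL = (
    "https://issues.apache.org/jira/browse/SOLR-5821?jql=project%20%3D%20SOLR"
    "%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)"
    "%20AND%20text%20~%20%22replica%22"
)

def decode_jql(url):
    """Return the human-readable JQL carried in a URL's jql query parameter."""
    return parse_qs(urlsplit(url).query)["jql"][0]

print(decode_jql(JIRA_URL))
# project = SOLR AND status in (Open, "In Progress", Reopened) AND text ~ "replica"
```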

If your use case requires going beyond the scenario above of having all
shards on the same VM, then you should read more about "maxShardsPerNode",
etc., but perhaps you can first share a bit more about that use case.
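As a rough sketch of how these parameters relate to the layouts discussed in this thread (40, 20, or 2 VMs for 20 shards with two copies each), the Python below computes the smallest maxShardsPerNode each layout needs and builds, without sending, a Collections API CREATE request. The base URL, the collection name "mycollection", and the config set name "myconfig" are hypothetical. The RAM figure simply multiplies the cores per VM by the 40 GB of index RAM per shard mentioned in this thread; it assumes every copy of a shard needs the same amount and ignores the JVM heap. As far as I know, maxShardsPerNode defaults to 1 in Solr 6.x, so packing several cores onto one node means raising it.

```python
import math
from urllib.parse import urlencode

NUM_SHARDS = 20
REPLICATION_FACTOR = 2      # one leader plus one replica per shard
INDEX_RAM_PER_CORE_GB = 40  # assumption: the "40 GB for index" figure per shard copy

def min_max_shards_per_node(num_shards, replication_factor, num_nodes):
    """Smallest maxShardsPerNode that lets every core fit on num_nodes."""
    return math.ceil(num_shards * replication_factor / num_nodes)

def create_collection_url(base_url, name, config_name, num_nodes):
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": NUM_SHARDS,
        "replicationFactor": REPLICATION_FACTOR,
        "maxShardsPerNode": min_max_shards_per_node(
            NUM_SHARDS, REPLICATION_FACTOR, num_nodes),
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"

for vms in (40, 20, 2):
    per_node = min_max_shards_per_node(NUM_SHARDS, REPLICATION_FACTOR, vms)
    print(f"{vms:>2} VMs: maxShardsPerNode >= {per_node:>2}, "
          f"~{per_node * INDEX_RAM_PER_CORE_GB} GB RAM for index data per VM")

print(create_collection_url("http://localhost:8983/solr", "mycollection", "myconfig", 2))
```

With 2 VMs the request carries maxShardsPerNode=20 and each VM would want roughly 800 GB for index data under this assumption; with 40 VMs the default of 1 is already enough.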

Thanks,
-- 
Sameer Maggon
https://www.searchstax.com | Solr-as-a-Service platform on AWS, Azure and
GCP
