Posted to solr-user@lucene.apache.org by SOLR4189 <Kl...@yandex.ru> on 2018/01/27 10:08:07 UTC
Using replicas in SOLR-6.5.1
I use Solr 6.5.1 and would like to use SolrCloud replicas. I have some
questions:
1) What is the best architecture if my collection contains 20 shards, each
on a different VM? 40 VMs, with 20 for leaders and 20 for replicas? Or stay
with 20 VMs, where each VM holds a leader plus a replica of another leader,
and add RAM?
2) Which open issues about replicas in Solr 6.5.1 do I need to check?
3) If I use SolrCloud replicas, which configuration parameters should I
change? Which can I change?
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Using replicas in SOLR-6.5.1
Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/27/2018 6:53 AM, SOLR4189 wrote:
> 1. You are right, due to memory and garbage collection issues I put each
> shard on a different VM. Each VM has 50 GB RAM (10 GB for the JVM and 40 GB
> for the index) and it works well for my use case. Maybe I don't understand
> Solr terms, but when you say to use one VM for 20 shards, what does it mean?
> 20 nodes, 20 JVMs, or 20 Solr instances on the same virtual server? Can you
> explain what you meant?
Generally you're going to have one Solr instance per machine, whether
that machine is physical or virtual. One Solr instance can handle many
indexes (cores). Running multiple Solr instances per machine involves a
fair amount of overhead, mostly memory, and isn't recommended except for
some very specific circumstances with *huge* Java heaps.
I can't actually speak for Sameer here, but I think they likely meant
that you would have two Solr instances, each with 20 cores. They would
be in two virtual machines, and ideally, those virtual machines would be
each hosted by a completely separate physical host.
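A minimal sketch of that layout, to make the availability argument concrete. It assumes two hosts with the made-up names "vm-a" and "vm-b", each running one Solr instance that holds one core per shard; the helper functions are hypothetical and do not call Solr.

```python
def place_replicas(num_shards, hosts):
    """Give every host one replica (core) of every shard. Hypothetical helper."""
    return {host: [f"shard{n}" for n in range(1, num_shards + 1)] for host in hosts}


def shards_still_served(layout, failed_host):
    """Return the shards that keep at least one replica when failed_host is down."""
    served = set()
    for host, shards in layout.items():
        if host != failed_host:
            served.update(shards)
    return served


# Two VMs (assumed names), 20 shards, one replica of each shard per VM.
layout = place_replicas(20, ["vm-a", "vm-b"])
survivors = shards_still_served(layout, "vm-a")
print(len(layout["vm-a"]), len(layout["vm-b"]), len(survivors))
```

With either host down, all 20 shards are still served by the other one, which is why the two VMs should sit on separate physical hosts.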
> 2. I mean issues like "facet performance regression", "using LTR with
> grouping", or "using timeAllowed with grouping". Something that would
> stop me from using the replicas feature. Sometimes I don't understand Solr
> issues: for example, if a bug is unresolved, affects version 4.10, and has
> no fix version, what does it mean? Can this bug also happen in Solr 6.5.1?
It's almost impossible to say whether 6.5.1 would be affected using only
the version fields in Jira. Usually if the fix-version is empty, the
issue hasn't been fixed ... but there are sometimes problems which exist
in an older version, but have been fixed by a later change. That later
change might be completely unrelated to the issue, but the developer
just happened to see something in the code they examined that they
didn't like.
What's the issue state? If it is Closed or Resolved (and the resolution
says Fixed), then the fix-version SHOULD indicate which versions the fix
is in or will be in. If it's not resolved/closed, then it's most likely
not fixed at all, no matter what the fix-version field states. As far
as I am aware, the only state transition that can be automatically done
by Jira itself is Fixed->Closed, and even that automatic transition only
takes place with user action -- as part of the release process for a new
version of Lucene/Solr. The general rule (which might not always
happen) is that if the issue is not fixed, fix-version should be empty.
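The rule of thumb above can be written down as a small decision function. This is only a sketch of the reasoning, not a Jira API: the function name and its arguments are made up, and the values are assumed to be copied by hand from the issue page.

```python
def likely_fixed(status, resolution, fix_versions):
    """Rule-of-thumb reading of a Jira issue. Returns (verdict, note)."""
    done = status in ("Resolved", "Closed")
    if done and resolution == "Fixed":
        if fix_versions:
            return True, "fixed in: " + ", ".join(fix_versions)
        return True, "marked Fixed but fix-version is empty; check the commits"
    if done:
        return False, f"closed without a fix (resolution: {resolution})"
    # Open issues: the fix-version field is not trustworthy either way.
    return False, "not resolved; probably not fixed, test on your own version"


# An unresolved bug reported against 4.10 with no fix version:
print(likely_fixed("Open", None, []))
```

A verdict of False for an open issue means "probably not fixed", not "certainly present in 6.5.1": a later, unrelated change may have removed the problem, so reproducing it on the version you run is the only reliable check.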
Thanks,
Shawn
Re: Using replicas in SOLR-6.5.1
Posted by SOLR4189 <Kl...@yandex.ru>.
1. You are right, due to memory and garbage collection issues I put each
shard on a different VM. Each VM has 50 GB RAM (10 GB for the JVM and 40 GB
for the index) and it works well for my use case. Maybe I don't understand
Solr terms, but when you say to use one VM for 20 shards, what does it mean?
20 nodes, 20 JVMs, or 20 Solr instances on the same virtual server? Can you
explain what you meant?
2. I mean issues like "facet performance regression", "using LTR with
grouping", or "using timeAllowed with grouping". Something that would
stop me from using the replicas feature. Sometimes I don't understand Solr
issues: for example, if a bug is unresolved, affects version 4.10, and has
no fix version, what does it mean? Can this bug also happen in Solr 6.5.1?
3. Yes, I'm familiar with the Solr Collection API.
I preferred to put each shard on a separate small VM.
Just to make sure I understand: one Solr node = one JVM = one Solr instance
= one or many shards?
Thank you.
Re: Using replicas in SOLR-6.5.1
Posted by Sameer Maggon <sa...@searchstax.com>.
1. You could just have 2 VMs, one has all 20 shards of your collection, the
other one has the replicas for those shards. In this scenario, if one VM is
not available, you still have application availability as at least one
replica is available for each shard. This assumes that one VM can hold all
the data (all 20 shards) without compromising on performance or
getting into memory or garbage collection issues (I am not sure what the
size of your collection or shards is). For additional redundancy, you can
add another VM and another replica for all your shards.
2. Can you provide more specifics about what sort of issues you are
thinking of? Replication in general is pretty solid in the version you are
talking about. You could comb through JIRA:
https://issues.apache.org/jira/browse/SOLR-5821?jql=project%20%3D%20SOLR%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20text%20~%20%22replica%22
3. I would recommend you take a look at the Solr Collection API (
https://lucene.apache.org/solr/guide/6_6/collections-api.html). Parameters
that you want to pay more attention to are "replicationFactor", "numShards"
and "maxShardsPerNode" that relate to the shards and replicas.
If you have a use case that warrants going beyond the above scenario of
having all shards on the same VM, then you should read more about
"maxShardsPerNode", etc. - but perhaps you can share a bit more about that
use case.
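As a rough illustration of how those three parameters fit together, the sketch below builds a Collections API CREATE request URL. The host, port, and collection name are assumptions for the example, and both helper functions are made up; only the CREATE action and the parameter names come from the Collections API. The script prints the URL and does not contact Solr.

```python
from math import ceil
from urllib.parse import urlencode


def min_max_shards_per_node(num_shards, replication_factor, live_nodes):
    """Smallest maxShardsPerNode that lets every replica be placed."""
    return ceil(num_shards * replication_factor / live_nodes)


def create_collection_url(base_url, name, num_shards, replication_factor,
                          max_shards_per_node):
    """Build a Collections API CREATE URL (nothing is sent to Solr)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# 20 shards x 2 replicas on 2 nodes: each node must hold 20 cores.
per_node = min_max_shards_per_node(20, 2, live_nodes=2)
url = create_collection_url("http://localhost:8983/solr", "mycollection",
                            20, 2, per_node)
print(per_node)
print(url)
```

With 20 nodes instead of 2, the same calculation gives a maxShardsPerNode of 2, which corresponds to the "leader plus a replica of another leader on each VM" layout from the original question.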
Thanks,
--
Sameer Maggon
https://www.searchstax.com | Solr-as-a-Service platform on AWS, Azure and
GCP