Posted to solr-user@lucene.apache.org by Wei <we...@gmail.com> on 2018/08/26 06:00:38 UTC

Multiple solr instances per host vs Multiple cores in same solr instance

Hi,

I have a question about the deployment configuration in solr cloud.  When
we need to increase the number of shards in solr cloud, there are two
options:

1.  Run multiple solr instances per host, each with a different port and
hosting a single core for one shard.

2.  Run one solr instance per host, and have multiple cores(shards) in the
same solr instance.

Which would be better performance wise? For the first option I think JVM
size for each solr instance can be smaller, but deployment is more
complicated? Are there any differences for cpu utilization?
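The two layouts differ mainly in how many JVMs run on each host. The Python sketch below only assembles `bin/solr start` command lines for either layout. It assumes the `-c` (cloud mode), `-p` (port), `-m` (heap), `-z` (ZooKeeper) and `-s` (Solr home) options of the `bin/solr` script, so check `bin/solr start -help` for your version. The ZooKeeper string, ports, heap sizes and `/var/solr/node<port>` home directories are all hypothetical.

```python
from typing import List


def solr_start_commands(instances: int, total_heap_gb: int, zk_host: str,
                        base_port: int = 8983) -> List[List[str]]:
    """Return one `bin/solr start` argument list per Solr instance on a host.

    instances == 1 models option 2 (one JVM hosting several cores);
    instances > 1 models option 1 (one JVM per core, each on its own port).
    The host's heap budget is split evenly between the instances.
    """
    if instances < 1:
        raise ValueError("need at least one Solr instance")
    heap_each_gb = total_heap_gb // instances
    if heap_each_gb < 1:
        raise ValueError("heap budget too small for that many instances")
    commands = []
    for i in range(instances):
        port = base_port + i
        commands.append([
            "bin/solr", "start",
            "-c",                           # run in SolrCloud mode
            "-p", str(port),                # every instance needs its own port
            "-m", f"{heap_each_gb}g",       # heap size for this JVM
            "-z", zk_host,                  # all instances join the same ZooKeeper ensemble
            "-s", f"/var/solr/node{port}",  # hypothetical separate Solr home per instance
        ])
    return commands


if __name__ == "__main__":
    for cmd in solr_start_commands(2, 16, "zk1:2181,zk2:2181,zk3:2181"):
        print(" ".join(cmd))
```

Option 1 needs a distinct port and Solr home for every JVM on the host, which accounts for much of its extra deployment complexity.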

Thanks,
Wei

Re: Multiple solr instances per host vs Multiple cores in same solr instance

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

I would start with one instance per host and add more shards to that one. As long as you stay below 32G of heap, this would be the preferred setup.
It is a common mistake to think that you need more JVM heap than you actually do. In fact, you should try to minimize your heap and leave more free RAM for the OS.
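Jan's advice concerns how a host's RAM is split between the JVM heap and the operating system, which uses free RAM to cache index files. The rough Python sketch below illustrates that arithmetic. The 64 GB host and the heap sizes are made-up numbers. The 32 GB constant reflects the commonly cited point at which the JVM can no longer use compressed object pointers; this is an approximation, not an exact limit.

```python
from typing import List

# Approximate heap size at which HotSpot stops using compressed object
# pointers; the exact cutoff depends on the JVM and its settings.
COMPRESSED_OOPS_LIMIT_GB = 32


def ram_left_for_os(host_ram_gb: float, heaps_gb: List[float]) -> float:
    """RAM not reserved for Solr heaps, roughly what the OS can use to cache
    index files. Ignores non-heap JVM memory, so the real figure is lower."""
    reserved = sum(heaps_gb)
    if reserved > host_ram_gb:
        raise ValueError("Solr heaps exceed the host's physical RAM")
    return host_ram_gb - reserved


def oversized_heaps(heaps_gb: List[float]) -> List[float]:
    """Heaps at or above the approximate compressed-oops limit."""
    return [h for h in heaps_gb if h >= COMPRESSED_OOPS_LIMIT_GB]


if __name__ == "__main__":
    host_ram = 64  # hypothetical host
    print(ram_left_for_os(host_ram, [8]))       # one modest instance
    print(ram_left_for_os(host_ram, [16, 16]))  # two instances on the same host
    print(oversized_heaps([31, 40]))
```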

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 27 Aug 2018, at 05:09, Wei <we...@gmail.com> wrote:
> 
> Thanks Shawn. When using multiple Solr instances per host, is there any way
> to prevent SolrCloud from putting multiple replicas of the same shard on
> the same host?
> I see that it can make sense to split into multiple instances with
> smaller heap sizes. Besides that, do you think multiple instances will be
> able to get better CPU utilization on a multi-core server?
> 
> Thanks,
> Wei


Re: Multiple solr instances per host vs Multiple cores in same solr instance

Posted by Wei <we...@gmail.com>.
Hi Erick,

I am looking into the rule-based replica placement documentation and am
confused. How can I ensure there is no more than one replica of any shard on
the same host? There is an example rule, shard:*,replica:<2,node:*, that seems
to serve the purpose, but I am not sure whether 'node' refers to a Solr instance
or to an actual physical host. Is there an example of defining node?

Thanks



On Sun, Aug 26, 2018 at 8:37 PM Erick Erickson <er...@gmail.com>
wrote:

> Yes, you can use the "node placement rules", see:
> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> This is a variant of "rack awareness".
>
> Of course the simplest way if you're not doing very many collections is to
> create the collection with the special "EMPTY" createNodeSet then just
> build out your collection with ADDREPLICA, placing each replica on a
> particular node. The idea of that capability was exactly to explicitly control
> where each and every replica landed.
>
> As a third alternative, just create the collection and let Solr put
> the replicas where it will, then use MOVEREPLICA to position replicas as you want.
>
> The node placement rules are primarily intended for automated or very large
> setups. Manually placing replicas is simpler for limited numbers.
>
> Best,
> Erick

Re: Multiple solr instances per host vs Multiple cores in same solr instance

Posted by Erick Erickson <er...@gmail.com>.
Yes, you can use the "node placement rules", see:
https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html

This is a variant of "rack awareness".
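Wei's question about `node` versus host comes down to naming. In SolrCloud, a node name identifies one Solr instance and typically has the form `host:port_solr`, so a rule on `node:*` treats two instances on the same machine as different nodes. The 6.6 rule-based placement guide also lists a `host` tag. The Python sketch below assumes that tag and builds parameters for a Collections API CREATE call whose rule keeps fewer than two replicas of any shard on one host. The collection name and Solr URL are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode


def create_params(name: str, num_shards: int, replication_factor: int,
                  per_physical_host: bool = True) -> Dict[str, str]:
    """Collections API CREATE parameters with a replica placement rule.

    'node' matches a single Solr instance (node names look like host:port_solr),
    so with several instances per machine it still allows two replicas of a
    shard on one host. 'host' (assumed to be supported by your Solr version's
    placement rules) matches the machine itself.
    """
    tag = "host" if per_physical_host else "node"
    return {
        "action": "CREATE",
        "name": name,
        "numShards": str(num_shards),
        "replicationFactor": str(replication_factor),
        # For every shard, allow fewer than 2 replicas per distinct <tag> value.
        "rule": f"shard:*,replica:<2,{tag}:*",
    }


def collections_url(solr_base: str, params: Dict[str, str]) -> str:
    # urlencode percent-encodes ':', '*', ',' and '<' in the rule value.
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(collections_url("http://localhost:8983/solr",
                          create_params("products", 2, 2)))
```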

Of course the simplest way if you're not doing very many collections is to
create the collection with the special "EMPTY" createNodeSet then just
build out your collection with ADDREPLICA, placing each replica on a
particular node. The idea of that capability was exactly to explicitly control
where each and every replica landed.
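The EMPTY-createNodeSet approach can be sketched as a list of Collections API calls: one CREATE with `createNodeSet=EMPTY`, followed by one ADDREPLICA per replica that names its target node. The Python sketch below only builds the parameter sets. The collection name, config set name and node names are hypothetical, and the `shardN` names assume the default compositeId router.

```python
from typing import Dict, List
from urllib.parse import urlencode


def explicit_placement(name: str, config: str,
                       nodes_per_shard: List[List[str]]) -> List[Dict[str, str]]:
    """Collections API calls that create an empty collection, then add each
    replica on an explicitly chosen node. nodes_per_shard[i] lists the nodes
    (as "host:port_solr") that should hold a replica of shard i+1."""
    calls = [{
        "action": "CREATE",
        "name": name,
        "numShards": str(len(nodes_per_shard)),
        "createNodeSet": "EMPTY",          # create the collection without any replicas
        "collection.configName": config,
    }]
    for i, nodes in enumerate(nodes_per_shard, start=1):
        for node in nodes:
            calls.append({
                "action": "ADDREPLICA",
                "collection": name,
                "shard": f"shard{i}",      # default shard names for the compositeId router
                "node": node,              # the exact Solr instance that hosts this replica
            })
    return calls


def collections_url(solr_base: str, params: Dict[str, str]) -> str:
    return f"{solr_base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Two instances on each of two hosts; each shard gets one replica per host.
    layout = [["h1:8983_solr", "h2:8983_solr"], ["h1:8984_solr", "h2:8984_solr"]]
    for call in explicit_placement("products", "products_conf", layout):
        print(collections_url("http://h1:8983/solr", call))
```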

As a third alternative, just create the collection and let Solr put
the replicas where it will, then use MOVEREPLICA to position replicas as you want.
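For the MOVEREPLICA route, a small planning helper can choose which replica to move. The Python sketch below assumes node names of the form `host:port_solr`. It looks for a second replica of a shard on a host that already holds one, then returns MOVEREPLICA parameters (`collection`, `shard`, `sourceNode`, `targetNode`) that point at a live node on a host the shard does not use yet. MOVEREPLICA is not available in every Solr release, so check the Collections API reference for your version. The node names are hypothetical.

```python
from typing import Dict, List, Optional


def host_of(node: str) -> str:
    # SolrCloud node names look like "host:port_solr"; keep the host part.
    return node.rsplit(":", 1)[0]


def plan_move(collection: str, shard: str, shard_nodes: List[str],
              live_nodes: List[str]) -> Optional[Dict[str, str]]:
    """Return MOVEREPLICA parameters that relocate one doubled-up replica of
    `shard` to a host holding none of its replicas, or None if no move is
    needed or no suitable host is available."""
    used_hosts = {host_of(n) for n in shard_nodes}
    seen = set()
    source = None
    for node in shard_nodes:
        host = host_of(node)
        if host in seen:
            source = node  # a second replica of this shard on the same host
            break
        seen.add(host)
    if source is None:
        return None
    for target in live_nodes:
        if host_of(target) not in used_hosts:
            return {
                "action": "MOVEREPLICA",
                "collection": collection,
                "shard": shard,
                "sourceNode": source,
                "targetNode": target,
            }
    return None


if __name__ == "__main__":
    print(plan_move("products", "shard1",
                    ["h1:8983_solr", "h1:8984_solr"],
                    ["h1:8983_solr", "h1:8984_solr", "h2:8983_solr"]))
```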

The node placement rules are primarily intended for automated or very large
setups. Manually placing replicas is simpler for limited numbers.

Best,
Erick
On Sun, Aug 26, 2018 at 8:10 PM Wei <we...@gmail.com> wrote:
>
> Thanks Shawn. When using multiple Solr instances per host, is there any way
> to prevent SolrCloud from putting multiple replicas of the same shard on
> the same host?
> I see that it can make sense to split into multiple instances with
> smaller heap sizes. Besides that, do you think multiple instances will be
> able to get better CPU utilization on a multi-core server?
>
> Thanks,
> Wei

Re: Multiple solr instances per host vs Multiple cores in same solr instance

Posted by Wei <we...@gmail.com>.
Thanks Shawn. When using multiple Solr instances per host, is there any way
to prevent SolrCloud from putting multiple replicas of the same shard on
the same host?
I see that it can make sense to split into multiple instances with
smaller heap sizes. Besides that, do you think multiple instances will be
able to get better CPU utilization on a multi-core server?

Thanks,
Wei

On Sun, Aug 26, 2018 at 4:37 AM Shawn Heisey <ap...@elyograg.org> wrote:

> On 8/26/2018 12:00 AM, Wei wrote:
> > I have a question about the deployment configuration in solr cloud.  When
> > we need to increase the number of shards in solr cloud, there are two
> > options:
> >
> > 1.  Run multiple solr instances per host, each with a different port and
> > hosting a single core for one shard.
> >
> > 2.  Run one solr instance per host, and have multiple cores(shards) in the
> > same solr instance.
> >
> > Which would be better performance wise? For the first option I think JVM
> > size for each solr instance can be smaller, but deployment is more
> > complicated? Are there any differences for cpu utilization?
>
> My general advice is to only have one Solr instance per machine.  One
> Solr instance can handle many indexes, and usually will do so with less
> overhead than two or more instances.
>
> I can think of *ONE* exception to this -- when a single Solr instance
> would require a heap that's extremely large. Splitting that into two or
> more instances MIGHT greatly reduce garbage collection pauses.  But
> there's a caveat to the caveat -- in my strong opinion, if your Solr
> instance is so big that it requires a huge heap and you're considering
> splitting into multiple Solr instances on one machine, you very likely
> need to run each of those instances on *separate* machines, so that each
> one can have access to all the resources of the machine it's running on.
>
> For SolrCloud, when you're running multiple instances per machine, Solr
> will consider those to be completely separate instances, and you may end
> up with all of the replicas for a shard on a single machine, which is a
> problem for high availability.
>
> Thanks,
> Shawn
>
>

Re: Multiple solr instances per host vs Multiple cores in same solr instance

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/26/2018 12:00 AM, Wei wrote:
> I have a question about the deployment configuration in solr cloud.  When
> we need to increase the number of shards in solr cloud, there are two
> options:
>
> 1.  Run multiple solr instances per host, each with a different port and
> hosting a single core for one shard.
>
> 2.  Run one solr instance per host, and have multiple cores(shards) in the
> same solr instance.
>
> Which would be better performance wise? For the first option I think JVM
> size for each solr instance can be smaller, but deployment is more
> complicated? Are there any differences for cpu utilization?

My general advice is to only have one Solr instance per machine.  One 
Solr instance can handle many indexes, and usually will do so with less 
overhead than two or more instances.

I can think of *ONE* exception to this -- when a single Solr instance 
would require a heap that's extremely large. Splitting that into two or 
more instances MIGHT greatly reduce garbage collection pauses.  But 
there's a caveat to the caveat -- in my strong opinion, if your Solr 
instance is so big that it requires a huge heap and you're considering 
splitting into multiple Solr instances on one machine, you very likely 
need to run each of those instances on *separate* machines, so that each 
one can have access to all the resources of the machine it's running on.

For SolrCloud, when you're running multiple instances per machine, Solr 
will consider those to be completely separate instances, and you may end 
up with all of the replicas for a shard on a single machine, which is a 
problem for high availability.
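One way to spot the situation described here is to group each shard's replicas by host rather than by Solr instance. The Python sketch below reads a CLUSTERSTATUS-style response and reports shards whose replicas all sit on one machine. It assumes the `cluster` → `collections` → `shards` → `replicas` → `node_name` nesting that the Collections API CLUSTERSTATUS action returns. The sample data is made up.

```python
from typing import Dict, List


def host_of(node: str) -> str:
    # Node names look like "host:port_solr"; two instances on one machine share the host.
    return node.rsplit(":", 1)[0]


def replica_nodes(status: dict, collection: str) -> Dict[str, List[str]]:
    """Map shard name -> node names of its replicas, from a CLUSTERSTATUS-style dict."""
    shards = status["cluster"]["collections"][collection]["shards"]
    return {
        shard: [r["node_name"] for r in info["replicas"].values()]
        for shard, info in shards.items()
    }


def single_host_shards(layout: Dict[str, List[str]]) -> Dict[str, str]:
    """Shards with more than one replica, all on the same host (lost if that host fails)."""
    risky = {}
    for shard, nodes in layout.items():
        hosts = {host_of(n) for n in nodes}
        if len(nodes) > 1 and len(hosts) == 1:
            risky[shard] = hosts.pop()
    return risky


if __name__ == "__main__":
    sample = {"cluster": {"collections": {"products": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"node_name": "h1:8983_solr"},
            "core_node2": {"node_name": "h1:8984_solr"}}},
        "shard2": {"replicas": {
            "core_node3": {"node_name": "h1:8983_solr"},
            "core_node4": {"node_name": "h2:8983_solr"}}},
    }}}}}
    print(single_host_shards(replica_nodes(sample, "products")))
```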

Thanks,
Shawn