Posted to users@solr.apache.org by Edward Turner <ed...@gmail.com> on 2021/05/10 13:44:51 UTC

Dedicated nodes for a specific collection in SolrCloud

Hi all,

Question in brief: in SolrCloud, how can we assign specific nodes to serve
a collection, given that our cloud is deployed using the backup/restore
feature?

We are using SolrCloud 8.5.2 with 5 nodes, serving 13 collections. Most of
these collections are not large, but one of them is large and is also the
most important one in our application. This collection has 5 shards of
about 60 GB (more than double the size of the largest of the other
collections) and holds about 330 million documents. Our nodes have 32 GB
RAM and 8 CPUs. We currently use NFS, but when we go into production we
will have dedicated SSDs available to us.

Since our most important collection receives far more user traffic than
the other collections, we think it makes sense to serve its data from
dedicated nodes that serve no other collection. So far we have not needed
to do this, but we need to now, as we are seeing some performance issues.

The caveat is that we use the backup/restore feature of SolrCloud to
deploy our application across different data centres.

We have read briefly about the Autoscaling features of SolrCloud, but have
not yet made use of them, and I cannot see whether Autoscaling would allow
us to dedicate specific nodes to a collection.

Does anyone with experience of this have any advice for us?

Kind regards,

Edd
--------------------
Edward Turner

Re: Dedicated nodes for a specific collection in SolrCloud

Posted by Gus Heck <gu...@gmail.com>.
Also, if you go with custom code using MOVEREPLICA, use the docs from 8.6,
which are much more complete and accurate (after I tripped on some features
that were not well described, I got irritated and took out my frustrations
on the documentation :) ). There is probably nothing in 8.6 that differs
from 8.5 for this feature aside from the better docs.
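
For what it's worth, a MOVEREPLICA call is just a Collections API request
naming a collection, a replica (or a shard plus sourceNode) and a
targetNode. A minimal sketch of issuing one from Java 11+ follows; the
host, collection, replica and node names are placeholders, so use the
values CLUSTERSTATUS reports for your own cluster.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MoveReplicaExample {
    public static void main(String[] args) throws Exception {
        // Placeholder values: substitute your own Solr host, collection,
        // replica name and target node name (as reported by CLUSTERSTATUS).
        String solrBase = "http://localhost:8983/solr";
        String url = solrBase + "/admin/collections"
                + "?action=MOVEREPLICA"
                + "&collection=important_collection"
                + "&replica=core_node5"
                + "&targetNode=node2:8983_solr"
                + "&wt=json";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The Collections API responds with a JSON body describing the outcome.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}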

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Dedicated nodes for a specific collection in SolrCloud

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

Autoscaling could perhaps be used, since you could label your nodes and make some rules.

But beware that Autoscaling is deprecated and is replaced by a new framework / replica placement plugin from 9.0.
See http://www.cominvent.com/pub/solr-9-docs/core/org/apache/solr/cluster/placement/PlacementPlugin.html for documentation of this interface.
Also see the draft ref guide: https://nightlies.apache.org/solr/draft-guides/solr-reference-guide-main/replica-placement-plugins.html

What you could then do, once you start using 9.0, is write your own placement plugin which, depending on environment variables / system properties set on each node, decides how to place collections there.
You could e.g. have a rule that reserves nodes tagged as "data-intensive" for collections with a collection property "require-data-intensive", and places the other replicas across the remaining nodes...
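
As a rough illustration of that rule, here is a small self-contained sketch of the node-selection logic only. It is not the real PlacementPlugin interface from the links above, and the node names and "data-intensive" tag values are made up; in a real plugin the tag would come from a system property set on each node at startup.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TagBasedPlacementSketch {

    // Decide which nodes may host a collection's replicas: collections that
    // require the "data-intensive" tag only go to tagged nodes, while all
    // other collections stay off the tagged nodes so they remain reserved.
    static List<String> eligibleNodes(Map<String, String> nodeTags,
                                      boolean requiresDataIntensive) {
        return nodeTags.entrySet().stream()
                .filter(e -> requiresDataIntensive
                        == "data-intensive".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up node names and tags for illustration only.
        Map<String, String> nodeTags = Map.of(
                "node1:8983_solr", "data-intensive",
                "node2:8983_solr", "data-intensive",
                "node3:8983_solr", "",
                "node4:8983_solr", "",
                "node5:8983_solr", "");
        System.out.println("big collection    -> " + eligibleNodes(nodeTags, true));
        System.out.println("small collections -> " + eligibleNodes(nodeTags, false));
    }
}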

Another option until you get there would be to write your own client code which periodically inspects the cluster state from the outside and then issues MOVEREPLICA commands until it is happy. This would not prevent the large collection from ending up on the wrong nodes after a RESTORE, but it would make sure the replicas are moved shortly afterwards.
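
A minimal sketch of such a watchdog, with placeholder host, collection and node names. It only scans the CLUSTERSTATUS response naively for node names; a real implementation would parse the JSON properly, record each replica's name and shard, and then issue MOVEREPLICA (as in the earlier example in this thread) for any replica sitting on a node outside the allow-list.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlacementWatchdogSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder values: substitute your own Solr host, collection name
        // and the set of nodes the big collection is allowed to live on.
        String solrBase = "http://localhost:8983/solr";
        String collection = "important_collection";
        Set<String> allowedNodes = Set.of("node1:8983_solr", "node2:8983_solr");

        String statusUrl = solrBase + "/admin/collections"
                + "?action=CLUSTERSTATUS&collection=" + collection + "&wt=json";
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> status = client.send(
                HttpRequest.newBuilder(URI.create(statusUrl)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Naive scan of the JSON for replica node names; a real implementation
        // would use a JSON parser and also keep each replica's name and shard
        // so they can be passed to MOVEREPLICA.
        Matcher m = Pattern.compile("\"node_name\"\\s*:\\s*\"([^\"]+)\"")
                .matcher(status.body());
        while (m.find()) {
            String nodeName = m.group(1);
            if (!allowedNodes.contains(nodeName)) {
                System.out.println("Replica on disallowed node " + nodeName
                        + " -> would issue MOVEREPLICA to one of " + allowedNodes);
                // e.g. /admin/collections?action=MOVEREPLICA&collection=...
                //        &replica=<replicaName>&targetNode=<allowed node>
            }
        }
    }
}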

Hope this gives some food for thought :)

Jan
