Posted to solr-user@lucene.apache.org by Greg Roodt <gr...@gmail.com> on 2017/12/19 05:03:27 UTC

Solr 7.1 Solrcloud dynamic/automatic replicas

Hi

Background:
* I am looking to upgrade from Solr 6.1 to Solr 7.1.
* Currently the system is run in cloud mode with a single collection and
single shard per node.
* Currently when a new node is added to the cluster, it becomes a replica
and copies the data / core "automagically".

Question:
Is it possible to have this dynamic / automatic behaviour for replicas in
Solr 7.1? I've seen mention of autoscale APIs and the Collections API and
also legacyCloud = true. I'm a little confused about what the best approach
is.

Right now, our system is very flexible and we can scale up by adding new
nodes to the cluster. I would really like to keep this behaviour when we
upgrade to 7.1.

Is anybody able to point me in the right direction or describe how to
achieve this?

Kind Regards
Greg

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Erick Erickson <er...@gmail.com>.
If you specify the node parameter for ADDREPLICA, I don't think so; but
as you know, you then have to understand the topology via CLUSTERSTATUS or
some such.
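
For illustration, a minimal sketch of that topology check (making the
ADDREPLICA call idempotent), assuming a collection named "blah", the default
8983 port, the host:port_solr node-name convention discussed later in this
thread, and that jq is installed. The names are illustrative, so verify the
fields against your own CLUSTERSTATUS output:

```
#!/usr/bin/env bash
# Sketch: only call ADDREPLICA if this node does not already host a replica.
set -euo pipefail

COLLECTION="blah"                          # illustrative collection name
NODE="$(hostname -i):8983_solr"            # node name convention: host:port_context

# CLUSTERSTATUS lists node_name for every replica of every shard.
existing=$(curl -s "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=${COLLECTION}&wt=json" \
  | jq -r --arg c "$COLLECTION" '.cluster.collections[$c].shards[].replicas[].node_name')

if grep -qx "$NODE" <<< "$existing"; then
  echo "replica already present on $NODE, skipping ADDREPLICA"
else
  curl -s "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=${COLLECTION}&shard=shard1&node=${NODE}"
fi
```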

If you don't specify the "node" parameter, take a look at
"Rule-based Replica Placement" here:
https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html.
I have to say you'll have to experiment, though; I haven't verified that
ADDREPLICA does the right thing there.

Do note that this functionality will be superseded by the "Policy"
framework but the transition is pretty straightforward.
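
As a rough sketch of the autoscaling route (the original post asked about the
autoscale APIs): in 7.x a nodeAdded trigger can compute and execute a
placement plan when a node joins. The payload below follows my reading of the
7.x autoscaling API, so treat it as an assumption and check the 7.1 Reference
Guide for the exact syntax your version supports:

```
# Register a trigger that reacts to new nodes joining the cluster.
curl -s -X POST -H 'Content-Type: application/json' \
  "http://localhost:8983/solr/admin/autoscaling" -d '{
  "set-trigger": {
    "name":    "node_added_trigger",
    "event":   "nodeAdded",
    "waitFor": "5s",
    "enabled": true,
    "actions": [
      {"name": "compute_plan", "class": "solr.ComputePlanAction"},
      {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
    ]
  }
}'
```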

Best,
Erick

On Wed, Dec 20, 2017 at 3:45 PM, Greg Roodt <gr...@gmail.com> wrote:
> Thanks again Erick. It looks like I've got this working.
>
> One final question I think:
> Is there a way to prevent ADDREPLICA from adding another core if a core for
> the collection already exists on the node?
>
> I've noticed that if I call ADDREPLICA twice for the same IP:PORT_solr, I
> get multiple cores. I can probably check `clusterstatus`, but I was
> wondering if there is another way to make the ADDREPLICA call idempotent.

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Greg Roodt <gr...@gmail.com>.
Thanks again Erick. It looks like I've got this working.

One final question I think:
Is there a way to prevent ADDREPLICA from adding another core if a core for
the collection already exists on the node?

I've noticed that if I call ADDREPLICA twice for the same IP:PORT_solr, I
get multiple cores. I can probably check `clusterstatus`, but I was
wondering if there is another way to make the ADDREPLICA call idempotent.



On 21 December 2017 at 03:27, Erick Erickson <er...@gmail.com>
wrote:

> The internal method is ZkController.generateNodeName(), although it's
> fairly simple, there are bunches of samples in ZkControllerTest....
>
> But yeah, it requires that you know your hostname and port, and the
> context is "solr".....

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Erick Erickson <er...@gmail.com>.
The internal method is ZkController.generateNodeName(); it's
fairly simple, and there are bunches of samples in ZkControllerTest....

But yeah, it requires that you know your hostname and port, and the
context is "solr".....
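
For illustration, a sketch of building the same name outside of Solr, assuming
the default 8983 port and the "solr" context. Since the host part has to match
whatever Solr registered itself with (SOLR_HOST, or the host setting in
solr.xml), cross-checking against live_nodes in CLUSTERSTATUS is safer than
constructing the string blindly:

```
# Node name convention: host + ":" + port + "_" + url-encoded context ("solr" by default).
HOST="$(hostname -i)"     # must match the host Solr registered itself with
PORT=8983
NODE="${HOST}:${PORT}_solr"

# Cross-check against the cluster's own view of its live nodes.
curl -s "http://localhost:${PORT}/solr/admin/collections?action=CLUSTERSTATUS&wt=json" \
  | jq -r '.cluster.live_nodes[]' | grep -x "${NODE}"
```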

On Tue, Dec 19, 2017 at 8:04 PM, Greg Roodt <gr...@gmail.com> wrote:
> Ok, thanks. I'll take a look into using the ADDREPLICA API.
>
> I've found a few examples of the znode format. It seems to be IP:PORT_solr
> (where I presume _solr is the name of the context or something?).
>
> Is there a way to discover what a znode is? i.e. Can my new node determine
> what its znode is? Or is my only option to use the IP:PORT_solr convention?

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Greg Roodt <gr...@gmail.com>.
Ok, thanks. I'll take a look into using the ADDREPLICA API.

I've found a few examples of the znode format. It seems to be IP:PORT_solr
(where I presume _solr is the name of the context or something?).

Is there a way to discover what a znode is? i.e. Can my new node determine
what its znode is? Or is my only option to use the IP:PORT_solr convention?




On 20 December 2017 at 11:33, Erick Erickson <er...@gmail.com>
wrote:

> Yes, ADDREPLICA is mostly equivalent, it's also supported going forward....
>
> LegacyCloud should work temporarily, I'd change it going forward though.
>
> Finally, you'll want to add a "node" parameter to ensure your replica is
> placed on the exact node you want; see the live_nodes znode for the
> format...

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Erick Erickson <er...@gmail.com>.
Yes, ADDREPLICA is mostly equivalent, it's also supported going forward....

LegacyCloud should work temporarily, I'd change it going forward though.

Finally, you'll want to add a "node" parameter to ensure your replica is
placed on the exact node you want; see the live_nodes znode for the format...
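
For example (collection name, shard and IP are illustrative; the node value
comes from live_nodes):

```
# Pin the new replica to a specific node instead of letting Solr choose one.
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=10.0.0.12:8983_solr"
```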

On Dec 19, 2017 16:06, "Greg Roodt" <gr...@gmail.com> wrote:

> Thanks for the reply. So it sounds like the method that I'm using to
> automatically add replicas on Solr 6.2 is not recommended and not going to
> be supported in future versions.
>
> A couple of follow up questions then:
> * Do you know if running with legacyCloud=true will make this behaviour
> work "for now" until I can find a better way of doing this?
> * Will it be enough for my newly added nodes to then startup solr (with
> correct ZK_HOST) and call the ADDREPLICA API as follows?
> ```
> curl "http://localhost:port/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1"
> ```
> That seems mostly equivalent to writing that core.properties file that I am
> using in 6.2

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Greg Roodt <gr...@gmail.com>.
Thanks for the reply. So it sounds like the method that I'm using to
automatically add replicas on Solr 6.2 is not recommended and not going to
be supported in future versions.

A couple of follow up questions then:
* Do you know if running with legacyCloud=true will make this behaviour
work "for now" until I can find a better way of doing this?
* Will it be enough for my newly added nodes to then startup solr (with
correct ZK_HOST) and call the ADDREPLICA API as follows?
```
curl "http://localhost:port/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1"
```
That seems mostly equivalent to writing that core.properties file that I am
using in 6.2
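
On the first question: as far as I know, legacyCloud is a cluster property
rather than a startup flag, so it would be set through the Collections API. A
sketch, assuming the default port (and note Erick's advice to treat this only
as a stop-gap):

```
# Set the cluster-wide legacyCloud property (a stop-gap only).
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=true"
```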





On 20 December 2017 at 09:34, Shawn Heisey <ap...@elyograg.org> wrote:

> The way that we would expect this to be normally done is a little
> different.  Adding a node to the cloud normally will NOT copy any
> indexes.  You have basically tricked SolrCloud into adding the replica
> automatically by creating a core before Solr starts.  SolrCloud
> incorporates the new core into the cluster according to the info that
> you have put in core.properties, notices that it has no index, and
> replicates it from the existing leader.
>
> Normally, what we would expect for adding a new node is this:
>
>  * Run the service installer script on the new machine
>  * Add a ZK_HOST variable to /etc/default/solr.in.sh
>  * Use "service solr restart" to get Solr to join the cloud
>  * Call the ADDREPLICA action on the Collections API
>
> The reason that your method works is that currently, the "truth" about
> the cluster is a mixture of what's in ZooKeeper and what's actually
> present on each Solr instance.
>
> There is an effort to change this so that ZooKeeper is the sole source
> of truth, and if a core is found that the ZK database doesn't know
> about, it won't be started, because it's not a known part of the
> cluster.  If this goal is realized in a future version of Solr, then the
> method you're currently using is not going to work like it does at the
> moment.  I do not know how much of this has been done, but I know that
> there have been people working on it.
>
> Thanks,
> Shawn
>
>

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/19/2017 3:06 PM, Greg Roodt wrote:
> Thanks for your reply Erick.
>
> This is what I'm doing at the moment with Solr 6.2 (I was mistaken, before
> I said 6.1).
>
> 1. A new instance comes online
> 2. Systemd starts solr with a custom start.sh script
> 3. This script creates a core.properties file that looks like this:
> ```
> name=blah
> shard=shard1
> ```
> 4. Script starts solr via the jar.
> ```
> java -DzkHost=....... -jar start.jar
> ```

The way that we would expect this to be normally done is a little
different.  Adding a node to the cloud normally will NOT copy any
indexes.  You have basically tricked SolrCloud into adding the replica
automatically by creating a core before Solr starts.  SolrCloud
incorporates the new core into the cluster according to the info that
you have put in core.properties, notices that it has no index, and
replicates it from the existing leader.

Normally, what we would expect for adding a new node is this:

 * Run the service installer script on the new machine
 * Add a ZK_HOST variable to /etc/default/solr.in.sh
 * Use "service solr restart"to get Solr to join the cloud
 * Call the ADDREPLICA action on the Collections API
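
As a rough command-line sketch of that sequence (installer filename, ZooKeeper
addresses, chroot and collection name are all illustrative; the service name
assumes the default install):

```
# 1. Run the service installer script on the new machine.
sudo bash ./install_solr_service.sh solr-7.1.0.tgz

# 2. Point the node at the ZooKeeper ensemble.
echo 'ZK_HOST="zk1:2181,zk2:2181,zk3:2181/solr"' | sudo tee -a /etc/default/solr.in.sh

# 3. Restart so the node joins the cloud (it carries no cores yet).
sudo service solr restart

# 4. Ask the cluster to place a replica on the new node.
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1&node=$(hostname -i):8983_solr"
```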

The reason that your method works is that currently, the "truth" about
the cluster is a mixture of what's in ZooKeeper and what's actually
present on each Solr instance.

There is an effort to change this so that ZooKeeper is the sole source
of truth, and if a core is found that the ZK database doesn't know
about, it won't be started, because it's not a known part of the
cluster.  If this goal is realized in a future version of Solr, then the
method you're currently using is not going to work like it does at the
moment.  I do not know how much of this has been done, but I know that
there have been people working on it.

Thanks,
Shawn


Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Greg Roodt <gr...@gmail.com>.
Thanks for your reply Erick.

This is what I'm doing at the moment with Solr 6.2 (I was mistaken, before
I said 6.1).

1. A new instance comes online
2. Systemd starts solr with a custom start.sh script
3. This script creates a core.properties file that looks like this:
```
name=blah
shard=shard1
```
4. Script starts solr via the jar.
```
java -DzkHost=....... -jar start.jar
```

If I understand correctly, what happens is that the node joins the cluster,
and runs a 'repair' from the leader where it copies the core? Once the
repair is done, the node is a healthy member of the cluster?

Is there a way to do something similar in 7.1?
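
For reference, a condensed sketch of what steps 2-4 above amount to; paths and
ZooKeeper addresses are illustrative, and the key point is that core discovery
picks up any core.properties found under the Solr home at startup, which is
what triggers the behaviour Shawn describes above:

```
#!/usr/bin/env bash
# Illustrative reconstruction of the custom start.sh (not the original script).
SOLR_SERVER=/opt/solr/server            # directory containing start.jar
CORE_DIR="${SOLR_SERVER}/solr/blah"     # any directory under the Solr home works

mkdir -p "${CORE_DIR}"
cat > "${CORE_DIR}/core.properties" <<EOF
name=blah
shard=shard1
EOF
# the collection property is omitted, as in the original; it defaults to the core name

cd "${SOLR_SERVER}"
exec java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr -jar start.jar
```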



On 19 December 2017 at 16:55, Erick Erickson <er...@gmail.com>
wrote:

> What have you configured to add the replica when a new node is spun up?
>
> If you're just copying the entire directory including the core.properties
> file,
> you're just getting lucky. The legacyCloud=true default is _probably_ adding
> the replica with a new URL and thus making it distinct.
>
> Please detail exactly what you do when you add a new node.....
>
> Best,
> Erick
>

Re: Solr 7.1 Solrcloud dynamic/automatic replicas

Posted by Erick Erickson <er...@gmail.com>.
What have you configured to add the replica when a new node is spun up?

If you're just copying the entire directory including the core.properties file,
you're just getting lucky. The legacyCloud=true default is _probably_ adding
the replica with a new URL and thus making it distinct.

Please detail exactly what you do when you add a new node.....

Best,
Erick

On Mon, Dec 18, 2017 at 9:03 PM, Greg Roodt <gr...@gmail.com> wrote:
> Hi
>
> Background:
> * I am looking to upgrade from Solr 6.1 to Solr 7.1.
> * Currently the system is run in cloud mode with a single collection and
> single shard per node.
> * Currently when a new node is added to the cluster, it becomes a replica
> and copies the data / core "automagically".
>
> Question:
> Is it possible to have this dynamic / automatic behaviour for replicas in
> Solr 7.1? I've seen mention of autoscale APIs and the Collections API and
> also legacyCloud = true. I'm a little confused about what the best approach
> is.
>
> Right now, our system is very flexible and we can scale up by adding new
> nodes to the cluster. I would really like to keep this behaviour when we
> upgrade to 7.1.
>
> Is anybody able to point me in the right direction or describe how to
> achieve this?
>
> Kind Regards
> Greg