Posted to solr-user@lucene.apache.org by Andy C <an...@gmail.com> on 2016/07/28 15:38:22 UTC

Are there issues with the use of SolrCloud / embedded Zookeeper in non-HA deployments?

We have integrated Solr 5.3.1 into our product. During installation
customers have the option of setting up a single Solr instance, or for high
availability deployments, multiple Solr instances in a master/slave
configuration.

We are looking at migrating to SolrCloud for HA deployments, but are
wondering if it makes sense to also use SolrCloud in non-HA deployments?

Our thought is that this would simplify things: we could use the same
approach for deploying our schema.xml and other configuration files on all
systems, and we could always use the SolrJ CloudSolrClient class to
communicate with Solr, etc.

Would it make sense to use the embedded Zookeeper instance in this
situation? I have seen warnings that the embedded Zookeeper should not be
used in production deployments, but the reason generally given is that if
Solr goes down Zookeeper will also go down, which doesn't seem relevant
here. Are there other reasons not to use the embedded Zookeeper?

More generally, are there downsides to using SolrCloud with a single
Zookeeper node and single Solr node?

Would appreciate any feedback.

Thanks,
Andy

Re: Are there issues with the use of SolrCloud / embedded Zookeeper in non-HA deployments?

Posted by Shawn Heisey <ap...@elyograg.org>.

On 7/28/2016 9:38 AM, Andy C wrote:
> Would it make sense to use the embedded Zookeeper instance in this
> situation? I have seen warnings that the embedded Zookeeper should not
> be used in production deployments, but the reason generally given is
> that if Solr goes down Zookeeper will also go down, which doesn't seem
> relevant here. Are there other reasons not to use the embedded Zookeeper?

The embedded zookeeper uses code copied from a fairly old version of
zookeeper and slightly modified.  This was needed at the time SolrCloud
was created because that version of zookeeper would fail to start if the
"myid" file was missing or didn't contain a valid server ID.  In order
for Solr to be able to control the embedded ZK sufficiently, it
wasn't possible to include the myid file with Solr, so the hack was needed.

Because SolrCloud uses copied code to parse the zoo.cfg file and start
the embedded zookeeper, it will not support ZK features added after 3.2,
like snapshot auto-purge.
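
To illustrate what the embedded ZK leaves out: on a standalone ZooKeeper
(3.4.0 or later), snapshot auto-purge is switched on with two zoo.cfg
settings. The sketch below appends them to a config file. The function
name, the example path, and the values chosen are assumptions for
illustration only, not a recommendation.

```shell
#!/bin/sh
# Sketch only: append snapshot auto-purge settings to a standalone
# ZooKeeper zoo.cfg. enable_autopurge is a hypothetical helper name.

enable_autopurge() {
  zoo_cfg="$1"
  # autopurge.snapRetainCount: number of recent snapshots to keep.
  # autopurge.purgeInterval: hours between purge runs (0 disables purging).
  cat >> "$zoo_cfg" <<'EOF'
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
EOF
}

# Example call (hypothetical path, adjust to your install):
# enable_autopurge /opt/zookeeper/conf/zoo.cfg
```

ZooKeeper needs a restart to pick up the change. The embedded ZK in Solr
would not act on these settings, per the explanation above.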

Recently, Zookeeper was changed so it will work without a myid file if
there are no "server" lines in the config, so the code hack in SolrCloud
is no longer required.  It will take some time for Solr's code to be
changed to take advantage of this.

As far as functionality, the embedded zookeeper will do fine for non-HA
deployments, but it does mean there will be differences between your
production and non-HA environments in *doing* the deployment, and in how
Solr is configured/started.  If that's acceptable to you, and you do not
need advanced ZK features, then the embedded ZK would be good enough for
non-HA environments.
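
As a rough illustration of the startup difference, the sketch below
prints (rather than runs) the bin/solr command for each style. The
port and the ZK host names are assumptions, and solr_start_cmd is a
hypothetical helper, not part of Solr.

```shell
#!/bin/sh
# Sketch only: print the bin/solr start command for each deployment style.

solr_start_cmd() {
  mode="$1"
  case "$mode" in
    embedded)
      # -c starts SolrCloud mode; with no -z, Solr also launches its
      # embedded ZooKeeper (by default on the Solr port + 1000).
      echo "bin/solr start -c -p 8983"
      ;;
    external)
      # -z points Solr at a separately managed ZooKeeper.
      # The default host list here is hypothetical.
      echo "bin/solr start -c -p 8983 -z ${ZK_HOST:-zk1:2181,zk2:2181,zk3:2181}"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

solr_start_cmd embedded
solr_start_cmd external
```

The only difference on the Solr side is the -z option, but the external
case also implies installing, configuring, and starting ZooKeeper
separately.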

I personally would still use standalone ZK even for a dev environment,
just to reduce the number of things that are different from production.

Thanks,
Shawn