Posted to solr-user@lucene.apache.org by Greg Solovyev <gr...@zimbra.com> on 2014/11/01 00:08:14 UTC

Consul instead of ZooKeeper anyone?

I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?

Thanks, 
Greg 

Re: Consul instead of ZooKeeper anyone?

Posted by "Jürgen Wagner (DVT)" <ju...@devoteam.com>.
Hello Greg,
  Consul and Zookeeper are quite similar in their offering with respect
to what SolrCloud needs. Service discovery, watches on distributed
cluster state, updates of configuration could all be handled through
Consul. Plus, Consul does offer built-in capabilities for
multi-datacenter scenarios and encryption. Also, the capability to
query Consul via DNS, i.e., without any client-side library
requirements, is quite compelling. One could integrate Java, C/C++,
C#/.NET, Python, Ruby and other types of clients without much effort.
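
As a rough illustration of the "no client-side library" point, here is a minimal Python sketch that talks to Consul's HTTP key/value endpoint using only the standard library. The key name `solr/clusterstate` and the address `127.0.0.1:8500` are assumptions made up for this example, not anything SolrCloud defines. The sketch only builds the request URL and decodes a response body; it does not contact a server.

```python
import base64
import json
from urllib.parse import urlencode


def kv_url(base, key, index=None, wait="30s"):
    """Build a Consul KV read URL.

    Passing `index` turns the read into a blocking query, which is the
    closest Consul equivalent of a watch: the call returns once the key
    changes past that index or the wait time elapses.
    """
    url = f"{base}/v1/kv/{key}"
    if index is not None:
        url += "?" + urlencode({"index": index, "wait": wait})
    return url


def decode_kv(body):
    """Decode a KV response body: a JSON list whose Value fields are base64."""
    entries = json.loads(body)
    return {
        e["Key"]: base64.b64decode(e["Value"]).decode("utf-8")
        for e in entries
        if e.get("Value") is not None
    }


# Hypothetical response body, shaped like the output of a KV read.
sample = json.dumps([
    {"Key": "solr/clusterstate",
     "Value": base64.b64encode(b'{"live":2}').decode("ascii"),
     "ModifyIndex": 42}
])

print(kv_url("http://127.0.0.1:8500", "solr/clusterstate", index=42))
print(decode_kv(sample))
```

A real client would issue the request with `urllib.request.urlopen` and loop, feeding the index it last saw back into the next blocking query.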

The largest benefit, however, I would see for the zoo of services around
Solr. At least in my experience, SolrCloud for serious applications is
never deployed by itself. There will be numerous services for data
collection, semantic processing, log management, monitoring,
administration, reporting and user front-ends around the core SolrCloud.
This zoo is hard to manage, especially the coordination of
configuration and cluster consistency. Consul could help here, as it
comes from the operations side of managing an elastic set of services
in data centers.

So, after singing the praises, why have I not started using Consul then? :-)

First and foremost: Zookeeper from the Hadoop/Apache ecosystem is
already integrated with SolrCloud. Ripping it out and replacing it with
something similar but not quite the same would require significant
effort, esp. for testing this thoroughly. My clients are not willing to
pay for basic groundwork.

Second: Consul looks nice but documentation leaves many questions open.
Once you start setting it up, there will be questions where you have to
dive into the code for answers. Consul does not give me the same
"mature" impression as Zookeeper. So, I am still using our own service
management framework for the zoo of services in typical search clouds.
Consul is young, however, and may evolve. The version is 0.4.1 and I
don't use anything with a zero in front to manage a serious customer
infrastructure. Would you trust a customer's 50-100 TB of source
data to a set of SolrClouds based on a 0.x Consul? ;-)

Third: Consul lacks a decent integration with log management. In any
distributed environment, you don't just want to keep a snapshot of the
moment, but rather a possibly long history of state changes and
statistics, so there is a chance to not just monitor, but also to act.
In that respect, we would need more cloud management recipes
integrated, without having to pull out the entire Puppet or Chef stack
that will come with its own view of the world. That again is a topic of
maturity and being fit for real-life requirements. I would love to see
Consul evolve into that type of lightweight cloud management with basic
services integrated. But: some way to go still.

There are other issues, but these are the major ones from my perspective.

So, the concept is nice, Hashimoto et al. are known to be creative
heads, and therefore I will keep watching what's happening there, but I
won't use Consul for any real customer projects yet - not even that part
that is not SolrCloud-dependent.

Best regards,
--Jürgen



On 01.11.2014 00:08, Greg Solovyev wrote:
> I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?
>
> Thanks, 
> Greg 
>


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wagner@devoteam.com
<ma...@devoteam.com>, URL: www.devoteam.de
<http://www.devoteam.de/>

------------------------------------------------------------------------
Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071



Re: Consul instead of ZooKeeper anyone?

Posted by Walter Underwood <wu...@wunderwood.org>.
It looks like Consul solves a different problem than Zookeeper. Consul manages what servers are up and starts new ones as needed. Zookeeper doesn’t start servers, but does leader election when they fail.

I don’t see any way that Consul could replace Zookeeper, but it could solve another part of the problem.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/

On Oct 31, 2014, at 5:15 PM, Erick Erickson <er...@gmail.com> wrote:

> Not that I know of, but.... look before you leap. I took a quick look at
> Consul and it really doesn't look like any kind of drop-in replacement.
> Also, the Zookeeper usage in SolrCloud isn't really pluggable
> AFAIK, so there'll be lots of places in the Solr code that need to be
> reworked etc., especially in the realm of collections and sharding.
> 
> The Collections API will be challenging to port over I think.
> 
> Not to mention SolrJ and CloudSolrServer for clients who want to interact
> with SolrCloud through Java.
> 
> Not saying it won't work, I just suspect that getting it done would be
> a big job, and thereafter keeping those changes in sync with the
> changing SolrCloud code base would chew up lots of time. So if
> I were putting my Product Manager hat on I'd ask "is the benefit
> worth the effort?".
> 
> All that said, go for it if you've a mind to!
> 
> Best,
> Erick
> 
> On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev <gr...@zimbra.com> wrote:
>> I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?
>> 
>> Thanks,
>> Greg


Re: Consul instead of ZooKeeper anyone?

Posted by "Jürgen Wagner (DVT)" <ju...@devoteam.com>.
Hello Greg,
  we run Zookeeper not on dedicated Zookeeper machines, but rather on
admin nodes in search application clusters (that makes two instances),
plus on at least one more node that does not have much load (e.g., a
crawling node). Also, as long as you don't stuff too much data into
Zookeeper yourself, the memory footprint of 2 GB seems to be a bit
generous to support SolrCloud.

Best regards,
--Jürgen


On 04.11.2014 20:23, Greg Solovyev wrote:
> Thanks for the answers, Erick. I can see that this is a significant effort and I am certainly not asking the community to undertake this work. I was actually going to take a stab at it myself. Regarding $$ savings from not requiring ZK, my assumption is that ZK in production demands a dedicated host and requires 2GB RAM/instance while Consul runs on less than 100MB RAM/instance. So, for ISPs, BSPs and large enterprise deployments, the savings would come from reduced resource requirements.
>
> Thanks,
> Greg
>
>




Re: Consul instead of ZooKeeper anyone?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 11/4/2014 12:23 PM, Greg Solovyev wrote:
> Thanks for the answers, Erick. I can see that this is a significant effort and I am certainly not asking the community to undertake this work. I was actually going to take a stab at it myself. Regarding $$ savings from not requiring ZK, my assumption is that ZK in production demands a dedicated host and requires 2GB RAM/instance while Consul runs on less than 100MB RAM/instance. So, for ISPs, BSPs and large enterprise deployments, the savings would come from reduced resource requirements.

I have a small SolrCloud install.  Three servers, two of which run Solr
4.2.1.  All three of them run zookeeper.  The zookeeper process running
on the first Solr server has been running for a long time - started over
a year ago on May  6 12:05:37 2013.  I forced a full garbage collection
on the process and then checked heap usage ... it's less than 50MB. 
Linux shows a resident size of 200MB, but I could probably decrease that
by starting with the -Xmx option.  That option is currently not used, so
it's accepting Java's defaults.

Zookeeper's resource requirements are very small and do not require
dedicated hardware.  It is a good idea to put the ZK database on
separate disk platters from Solr's data, but I'm not even doing that --
everything's on the same filesystem.
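
To put the numbers in this thread side by side, here is a small back-of-the-envelope calculation in Python. It uses only figures quoted in this thread: the assumed 2GB and 100MB per instance, the 200MB resident size reported above, and a three-node ensemble as in the install described above. All of them are bounds or assumptions, not measurements of a production-sized cluster, so treat the result as a sketch.

```python
# All figures in MB. They are bounds or assumptions quoted in this thread.
ZK_ASSUMED_MB = 2048      # "2GB RAM/instance" assumed for ZooKeeper
CONSUL_ASSUMED_MB = 100   # "less than 100MB RAM/instance" claimed for Consul
ZK_RESIDENT_MB = 200      # resident size reported for a long-running ZK process


def saving_mb(zk_mb, consul_mb, nodes=3):
    """Memory saved across an ensemble if each node drops from zk_mb to consul_mb."""
    return (zk_mb - consul_mb) * nodes


assumed_ratio = ZK_ASSUMED_MB / CONSUL_ASSUMED_MB
observed_ratio = ZK_RESIDENT_MB / CONSUL_ASSUMED_MB

print(f"assumed:  {assumed_ratio:.1f}x, "
      f"saving {saving_mb(ZK_ASSUMED_MB, CONSUL_ASSUMED_MB)} MB on 3 nodes")
print(f"observed: {observed_ratio:.1f}x, "
      f"saving {saving_mb(ZK_RESIDENT_MB, CONSUL_ASSUMED_MB)} MB on 3 nodes")
```

With the assumed figures the gap is roughly 20x (about 5.8GB across three nodes); with the resident size reported here it shrinks to about 2x (300MB across three nodes).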

Thanks,
Shawn


Re: Consul instead of ZooKeeper anyone?

Posted by Greg Solovyev <gr...@zimbra.com>.
Thanks for the answers, Erick. I can see that this is a significant effort and I am certainly not asking the community to undertake this work. I was actually going to take a stab at it myself. Regarding $$ savings from not requiring ZK, my assumption is that ZK in production demands a dedicated host and requires 2GB RAM/instance while Consul runs on less than 100MB RAM/instance. So, for ISPs, BSPs and large enterprise deployments, the savings would come from reduced resource requirements.

Thanks,
Greg

----- Original Message -----
From: "Erick Erickson" <er...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Monday, November 3, 2014 3:25:25 PM
Subject: Re: Consul instead of ZooKeeper anyone?

bq:  Do you think it would be possible to add an abstraction layer to
Solr source code in the near future?

I strongly doubt it. As you've already noted, this is a large amount
of work. Without some super-compelling advantage I just don't see the
interest.

bq:  to avoid deploying ZK just for SolrCloud would save a bunch of $$
for large customers

How so? It's free.

Making this change would, IMO, require a compelling story to generate
much enthusiasm. So far I haven't seen that story, and Jürgen and
Walter raise valid points that haven't been addressed. I suspect
you're significantly underestimating the effort to get this stable in
the SolrCloud world as well.

I don't really want to be such a wet blanket, but you're asking about
a very significant amount of work from a bunch of people, all of whom
have lots of things on their plate. So without a _very_ good reason, I
think it's unlikely to generate much interest.

Best,
Erick

On Mon, Nov 3, 2014 at 11:17 AM, Greg Solovyev <gr...@zimbra.com> wrote:
> Thanks Erick,
> after looking further into Solr's source code, I see that it's married to ZK libraries and it won't be possible to extend existing code without diverging from the trunk. At the same time, I don't see any reason for the lack of abstraction in the cloud-related code of Solr and SolrJ. As far as I can see, Consul provides all that SolrCloud needs, so if the cloud code used more abstraction, the ZK bindings could be substituted with another library. I am willing to implement this functionality and the abstraction, but at the same time, I don't want to maintain my own branch of Solr because of this integration. Do you think it would be possible to add an abstraction layer to Solr source code in the near future?
>
> I think Consul has all the features that SolrCloud needs, and what's especially attractive about Consul is that its memory footprint is 100X smaller than ZK's. Mainly though, we are considering Consul as a main service locator for a bunch of other moving parts within Zimbra, so being able to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers.
>
> Thanks,
> Greg
>
> ----- Original Message -----
> From: "Erick Erickson" <er...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, October 31, 2014 5:15:09 PM
> Subject: Re: Consul instead of ZooKeeper anyone?
>
> Not that I know of, but.... look before you leap. I took a quick look at
> Consul and it really doesn't look like any kind of drop-in replacement.
> Also, the Zookeeper usage in SolrCloud isn't really pluggable
> AFAIK, so there'll be lots of places in the Solr code that need to be
> reworked etc., especially in the realm of collections and sharding.
>
> The Collections API will be challenging to port over I think.
>
> Not to mention SolrJ and CloudSolrServer for clients who want to interact
> with SolrCloud through Java.
>
> Not saying it won't work, I just suspect that getting it done would be
> a big job, and thereafter keeping those changes in sync with the
> changing SolrCloud code base would chew up lots of time. So if
> I were putting my Product Manager hat on I'd ask "is the benefit
> worth the effort?".
>
> All that said, go for it if you've a mind to!
>
> Best,
> Erick
>
> On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev <gr...@zimbra.com> wrote:
>> I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?
>>
>> Thanks,
>> Greg

Re: Consul instead of ZooKeeper anyone?

Posted by Erick Erickson <er...@gmail.com>.
bq:  Do you think it would be possible to add an abstraction layer to
Solr source code in the near future?

I strongly doubt it. As you've already noted, this is a large amount
of work. Without some super-compelling advantage I just don't see the
interest.

bq:  to avoid deploying ZK just for SolrCloud would save a bunch of $$
for large customers

How so? It's free.

Making this change would, IMO, require a compelling story to generate
much enthusiasm. So far I haven't seen that story, and Jürgen and
Walter raise valid points that haven't been addressed. I suspect
you're significantly underestimating the effort to get this stable in
the SolrCloud world as well.

I don't really want to be such a wet blanket, but you're asking about
a very significant amount of work from a bunch of people, all of whom
have lots of things on their plate. So without a _very_ good reason, I
think it's unlikely to generate much interest.

Best,
Erick

On Mon, Nov 3, 2014 at 11:17 AM, Greg Solovyev <gr...@zimbra.com> wrote:
> Thanks Erick,
> after looking further into Solr's source code, I see that it's married to ZK libraries and it won't be possible to extend existing code without diverging from the trunk. At the same time, I don't see any reason for the lack of abstraction in the cloud-related code of Solr and SolrJ. As far as I can see, Consul provides all that SolrCloud needs, so if the cloud code used more abstraction, the ZK bindings could be substituted with another library. I am willing to implement this functionality and the abstraction, but at the same time, I don't want to maintain my own branch of Solr because of this integration. Do you think it would be possible to add an abstraction layer to Solr source code in the near future?
>
> I think Consul has all the features that SolrCloud needs, and what's especially attractive about Consul is that its memory footprint is 100X smaller than ZK's. Mainly though, we are considering Consul as a main service locator for a bunch of other moving parts within Zimbra, so being able to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers.
>
> Thanks,
> Greg
>
> ----- Original Message -----
> From: "Erick Erickson" <er...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Friday, October 31, 2014 5:15:09 PM
> Subject: Re: Consul instead of ZooKeeper anyone?
>
> Not that I know of, but.... look before you leap. I took a quick look at
> Consul and it really doesn't look like any kind of drop-in replacement.
> Also, the Zookeeper usage in SolrCloud isn't really pluggable
> AFAIK, so there'll be lots of places in the Solr code that need to be
> reworked etc., especially in the realm of collections and sharding.
>
> The Collections API will be challenging to port over I think.
>
> Not to mention SolrJ and CloudSolrServer for clients who want to interact
> with SolrCloud through Java.
>
> Not saying it won't work, I just suspect that getting it done would be
> a big job, and thereafter keeping those changes in sync with the
> changing SolrCloud code base would chew up lots of time. So if
> I were putting my Product Manager hat on I'd ask "is the benefit
> worth the effort?".
>
> All that said, go for it if you've a mind to!
>
> Best,
> Erick
>
> On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev <gr...@zimbra.com> wrote:
>> I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?
>>
>> Thanks,
>> Greg

Re: Consul instead of ZooKeeper anyone?

Posted by Greg Solovyev <gr...@zimbra.com>.
Thanks Erick, 
after looking further into Solr's source code, I see that it's married to ZK libraries and it won't be possible to extend existing code without diverging from the trunk. At the same time, I don't see any reason for the lack of abstraction in the cloud-related code of Solr and SolrJ. As far as I can see, Consul provides all that SolrCloud needs, so if the cloud code used more abstraction, the ZK bindings could be substituted with another library. I am willing to implement this functionality and the abstraction, but at the same time, I don't want to maintain my own branch of Solr because of this integration. Do you think it would be possible to add an abstraction layer to Solr source code in the near future?

I think Consul has all the features that SolrCloud needs, and what's especially attractive about Consul is that its memory footprint is 100X smaller than ZK's. Mainly though, we are considering Consul as a main service locator for a bunch of other moving parts within Zimbra, so being able to avoid deploying ZK just for SolrCloud would save a bunch of $$ for large customers.
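
To make the idea of an abstraction layer concrete, here is a minimal sketch of what such an interface might look like. Everything in it is hypothetical: `CoordinationStore`, its three methods, and `InMemoryStore` are names invented for this illustration and do not exist in Solr or SolrJ. Python is used only for brevity, since Solr itself is Java. The real surface SolrCloud depends on (leader election, ephemeral nodes, the overseer queue) is considerably larger than this.

```python
from abc import ABC, abstractmethod


class CoordinationStore(ABC):
    """Hypothetical minimal interface that cloud code could depend on,
    instead of calling a ZooKeeper client directly."""

    @abstractmethod
    def get(self, path):
        """Return the data stored at path, or None if nothing is there."""

    @abstractmethod
    def put(self, path, data):
        """Store data at path and notify any registered watchers."""

    @abstractmethod
    def watch(self, path, callback):
        """Register callback(path, data) to be called on changes to path."""


class InMemoryStore(CoordinationStore):
    """Stand-in backend for tests; a ZooKeeper or Consul backend would
    implement the same three methods against the real service."""

    def __init__(self):
        self._data = {}
        self._watchers = {}

    def get(self, path):
        return self._data.get(path)

    def put(self, path, data):
        self._data[path] = data
        for callback in self._watchers.get(path, []):
            callback(path, data)

    def watch(self, path, callback):
        # Note: callbacks here stay registered, whereas classic ZooKeeper
        # watches fire once and must be re-armed by the client.
        self._watchers.setdefault(path, []).append(callback)


store = InMemoryStore()
seen = []
store.watch("/live_nodes", lambda path, data: seen.append((path, data)))
store.put("/live_nodes", b"node1")
print(store.get("/live_nodes"), seen)
```

The comment in `watch` points at one place where the two backends differ in semantics, which is the kind of detail an abstraction layer would have to settle.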

Thanks,
Greg

----- Original Message -----
From: "Erick Erickson" <er...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Friday, October 31, 2014 5:15:09 PM
Subject: Re: Consul instead of ZooKeeper anyone?

Not that I know of, but.... look before you leap. I took a quick look at
Consul and it really doesn't look like any kind of drop-in replacement.
Also, the Zookeeper usage in SolrCloud isn't really pluggable
AFAIK, so there'll be lots of places in the Solr code that need to be
reworked etc., especially in the realm of collections and sharding.

The Collections API will be challenging to port over I think.

Not to mention SolrJ and CloudSolrServer for clients who want to interact
with SolrCloud through Java.

Not saying it won't work, I just suspect that getting it done would be
a big job, and thereafter keeping those changes in sync with the
changing SolrCloud code base would chew up lots of time. So if
I were putting my Product Manager hat on I'd ask "is the benefit
worth the effort?".

All that said, go for it if you've a mind to!

Best,
Erick

On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev <gr...@zimbra.com> wrote:
> I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?
>
> Thanks,
> Greg

Re: Consul instead of ZooKeeper anyone?

Posted by Erick Erickson <er...@gmail.com>.
Not that I know of, but.... look before you leap. I took a quick look at
Consul and it really doesn't look like any kind of drop-in replacement.
Also, the Zookeeper usage in SolrCloud isn't really pluggable
AFAIK, so there'll be lots of places in the Solr code that need to be
reworked etc., especially in the realm of collections and sharding.

The Collections API will be challenging to port over I think.

Not to mention SolrJ and CloudSolrServer for clients who want to interact
with SolrCloud through Java.

Not saying it won't work, I just suspect that getting it done would be
a big job, and thereafter keeping those changes in sync with the
changing SolrCloud code base would chew up lots of time. So if
I were putting my Product Manager hat on I'd ask "is the benefit
worth the effort?".

All that said, go for it if you've a mind to!

Best,
Erick

On Fri, Oct 31, 2014 at 4:08 PM, Greg Solovyev <gr...@zimbra.com> wrote:
> I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?
>
> Thanks,
> Greg