Posted to solr-user@lucene.apache.org by Joe Lerner <jo...@gmail.com> on 2019/01/03 19:46:55 UTC

So Many Zookeeper Warnings--There Must Be a Problem

Hi,

We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2),
and 3 zookeeper instances (on servers #1, #2, and #3). Things work fine
(although we had a couple of brief unexplained outages), but:

One worrisome thing is that when I check zookeeper's status on #1 and #2, I get
Mode=Leader on both; #3 shows follower. This seems to be a pretty permanent
condition, at least right now as I look at it. And there isn't any big
maintenance or anything going on.

Also, we are getting *TONS* of continuous log warnings from our client
applications. From one server it shows this:



And from another server we get this:


These are making our logs impossible to read, but worse, I assume they indicate
that something is wrong.

Thanks for any help!

Joe Lerner



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: So Many Zookeeper Warnings--There Must Be a Problem

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/4/2019 5:24 AM, Joe Lerner wrote:
> server #1 = myid#1
> server #2 = myid#2
> server #3 = myid#2
>
> My plan would be to do the following, while users are still online (it's a
> big [bad] deal if we need to take search offline):
>
> 1. Take zk #3 down.
> 2. Fix zk #3 by deleting the contents of the zk data directory and assign it
> myid#3
> 3. Bring zk#3 back up
> 4. Do a full re-build of all collections

There should be no need to rebuild anything in Solr once zookeeper is 
repaired in this fashion.  The third zookeeper will replicate data from 
whichever of the other two has won the leader election.  A three-node 
zookeeper ensemble is 100% functional with two nodes running.
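
A minimal sketch of that repair on server #3 (assuming zkServer.sh is on the
PATH and the dataDir from zoo.cfg is /var/lib/zookeeper; adjust both to your
actual layout):

zkServer.sh stop
rm -rf /var/lib/zookeeper/version-2     # old snapshots/transaction logs
echo 3 > /var/lib/zookeeper/myid        # must match server.3 in zoo.cfg
zkServer.sh start
zkServer.sh status                      # should now report Mode: follower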

You would only need to rebuild the Solr side if all data on the 
zookeeper side were lost.  I would not expect this action to lose any 
data in zookeeper.

The info you tried to share about your log messages in the original post 
for this thread did not come through.  I do not see it either on the 
mailing list or in the Nabble mirror.  It does look like you started 
another thread which does have the info.  I will address those messages 
in that thread.

Thanks,
Shawn


Re: So Many Zookeeper Warnings--There Must Be a Problem

Posted by Erick Erickson <er...@gmail.com>.
How brave are you? ;)....

I'll defer to Scott on the internals of ZK and why it
might be necessary to delete the ZK data dirs, but
what happens if you just correct your configuration and
drive on?

If that doesn't work, here's something to try....
Shut down your Solr instances, then:

- bin/solr zk cp -r zk:/ some_local_dir

- fix your ZK, perhaps blowing the data directories away
and bring the ZK servers back up.

- bin/solr zk cp -r some_local_dir zk:/

Start your Solr instances.

NOTE: if you've configured your solr info with a "chroot", the ZK path
will be slightly different.
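
For example, going from memory and assuming a hypothetical chroot of /solr
(host1-3 are placeholders for your ZK servers; verify the exact syntax with
bin/solr zk -help):

bin/solr zk cp -r zk:/ /tmp/zk-backup -z host1:2181,host2:2181,host3:2181/solr   # before fixing ZK
bin/solr zk cp -r /tmp/zk-backup zk:/ -z host1:2181,host2:2181,host3:2181/solr   # after fixing ZK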

NOTE: I'm going from memory on the exact form of those commands.
bin/solr -help
should show you the info....

WARNING: This worked at some point in the past, but is _not_
"officially" supported; it was just a happy consequence of the code that
copies data to and from ZK, added to replace the zkCli functionality and give
Solr users one less thing to keep track of.

What that does is copy the cluster status relevant to Solr out of ZK and then back into ZK.

DO NOT change your Solr data in any way when doing this. What this is
trying to do is copy all the topology information in ZK. Assuming the Solr
nodes haven't changed and have the same IP addresses etc., it _might_ work for you.

Best,
Erick

On Fri, Jan 4, 2019 at 4:25 AM Joe Lerner <jo...@gmail.com> wrote:
>
> wrt, "You'll probably have to delete the contents of the zk data directory
> and rebuild your collections."
>
> Rebuild my *SOLR* collections? That's easy enough for us.
>
> If this is how we're incorrectly configured now:
>
> server #1 = myid#1
> server #2 = myid#2
> server #3 = myid#2
>
> My plan would be to do the following, while users are still online (it's a
> big [bad] deal if we need to take search offline):
>
> 1. Take zk #3 down.
> 2. Fix zk #3 by deleting the contents of the zk data directory and assign it
> myid#3
> 3. Bring zk#3 back up
> 4. Do a full re-build of all collections
>
> Thanks!
>
> Joe
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: So Many Zookeeper Warnings--There Must Be a Problem

Posted by Joe Lerner <jo...@gmail.com>.
wrt, "You'll probably have to delete the contents of the zk data directory
and rebuild your collections."

Rebuild my *SOLR* collections? That's easy enough for us. 

If this is how we're incorrectly configured now:

server #1 = myid#1
server #2 = myid#2
server #3 = myid#2

My plan would be to do the following, while users are still online (it's a
big [bad] deal if we need to take search offline):

1. Take zk #3 down.
2. Fix zk #3 by deleting the contents of the zk data directory and assign it
myid#3
3. Bring zk#3 back up
4. Do a full re-build of all collections

Thanks!

Joe



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: So Many Zookeeper Warnings--There Must Be a Problem

Posted by Scott Stults <ss...@opensourceconnections.com>.
Good! Hopefully that's your smoking gun.

The port settings are fine, but since you're deploying to separate servers
you don't need different ports in the "server.x=" section. This section of
the docs explains it better:

http://zookeeper.apache.org/doc/r3.4.7/zookeeperAdmin.html#sc_zkMulitServerSetup
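
For instance, since each instance runs on its own server, a conventional layout
(just a sketch using the stock peer/election ports from that page) would be:

clientPort=2181

server.1=host1:2888:3888
server.2=host2:2888:3888
server.3=host3:2888:3888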


On Thu, Jan 3, 2019 at 3:49 PM Joe Lerner <jo...@gmail.com> wrote:

> Hi Scott,
>
> First, we are definitely misconfigured for the myid thing. Basically two of
> them were identifying as ID #2, and they are the two ZKs claiming to be the
> leader. Definitely something to straighten out!
>
> Our 3 lines in zoo.cfg look correct, except they look like this:
>
> clientPort:2181
>
> server.1=host1:2190:2195
> server.2=host2:2191:2196
> server.3=host3:2192:2197
>
> Notice the port range, and overlap...
>
> Is that.../copacetic/?
>
> Thanks!
>
> Joe
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com

Re: So Many Zookeeper Warnings--There Must Be a Problem

Posted by Joe Lerner <jo...@gmail.com>.
Hi Scott,

First, we are definitely misconfigured for the myid thing. Basically two of
them were identifying as ID #2, and they are the two ZKs claiming to be the
leader. Definitely something to straighten out!

Our 3 lines in zoo.cfg look correct, except they look like this:

clientPort:2181

server.1=host1:2190:2195 
server.2=host2:2191:2196 
server.3=host3:2192:2197

Notice the port range, and overlap...

Is that.../copacetic/?

Thanks!

Joe 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: So Many Zookeeper Warnings--There Must Be a Problem

Posted by Scott Stults <ss...@opensourceconnections.com>.
Hi Joe,

Yeah, two leaders is definitely a problem. I'd fix that before wading
through the error logs.

Check out zoo.cfg on each server. You should have three lines at the end
similar to this:

server.1=host1:2181:2281
server.2=host2:2182:2282
server.3=host3:2183:2283

(substitute "host*" with the right IP or address of your servers)

Also on each server, check the file "myid". It should have a single number
that maps to the list above. For example, on host1 your myid file should
contain a single value of "1" in it. On host2 the file should contain "2".
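
A quick sanity check on each host (a sketch, assuming the dataDir configured in
zoo.cfg is /var/lib/zookeeper and that the "stat" four-letter command is enabled):

cat /var/lib/zookeeper/myid      # expect 1 on host1, 2 on host2, 3 on host3
echo stat | nc localhost 2181    # "Mode:" should say leader on exactly one host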

You'll probably have to delete the contents of the zk data directory and
rebuild your collections.



On Thu, Jan 3, 2019 at 2:47 PM Joe Lerner <jo...@gmail.com> wrote:

> Hi,
>
> We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2),
> and 3 zookeeper instances (on servers #1, #2, and #3). Things work fine
> (although we had a couple of brief unexplained outages), but:
>
> One worrisome thing is that when I check zookeeper's status on #1 and #2, I get
> Mode=Leader on both; #3 shows follower. This seems to be a pretty permanent
> condition, at least right now as I look at it. And there isn't any big
> maintenance or anything going on.
>
> Also, we are getting *TONS* of continuous log warnings from our client
> applications. From one server it shows this:
>
>
>
> And from another server we get this:
>
>
> These are making our logs impossible to read, but worse, I assume they indicate
> that something is wrong.
>
> Thanks for any help!
>
> Joe Lerner
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com