Posted to solr-user@lucene.apache.org by Rallavagu <ra...@gmail.com> on 2015/10/01 21:26:34 UTC

Zk and Solr Cloud

Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.

I see the following errors in ZK and Solr, and they appear to be connected.

When I see the following error in Zookeeper,

unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len11823809 is out of range!
        at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)


There is the following corresponding error in Solr

caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x25024c8ea0e0000, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:744)

Any clues as to what is causing these errors? Thanks.

Re: Zk and Solr Cloud

Posted by Upayavira <uv...@odoko.co.uk>.
Very interesting, Shawn.

What I'd say is: paste more of the stack traces, so we can see the
context in which the exception happened. It could be that you are
flooding the overseer, or it could be that you have a synonyms file
(or something similar) that is too large. I'd like to think the rest
of the stack trace could give us clues.

Upayavira

On Fri, Oct 2, 2015, at 06:58 AM, Shawn Heisey wrote:
> This is usually caused by the overseer queue (stored in zookeeper)
> becoming extraordinarily huge, because it's being flooded with work
> entries far faster than the overseer can process them.  This causes the
> znode where the queue is stored to become larger than the maximum size
> for a znode, which defaults to about 1MB.  In this case (reading your
> log message that says len11823809), something in zookeeper has gotten to
> be 11MB in size, so the zookeeper client cannot read it.

Re: Zk and Solr Cloud

Posted by Rallavagu <ra...@gmail.com>.
Thanks for the insight into this, Erick.


Re: Zk and Solr Cloud

Posted by Erick Erickson <er...@gmail.com>.
Rallavagu:

Absent nodes going up and down or otherwise changing state, Zookeeper
isn't involved in the normal operations of Solr (adding docs,
querying, all that). That said, things that change the state of the
Solr nodes _do_ involve Zookeeper and the Overseer. The Overseer is
used to serialize and control changes to clusterstate.json (or
state.json) and other znodes. If the nodes all tried to write to ZK
directly, it would be hard to coordinate. That's a little simplistic,
but maybe this will help.

When a Solr instance starts up, it:
1> registers itself as live with ZK
2> creates a listener that ZK pings when there's a state change (some
node goes up or down, goes into recovery, gets added, whatever).
3> gets the current cluster state from ZK.

Thereafter, this particular node doesn't need to ask ZK for anything.
It knows the current topology of the cluster and can route requests
(index or query) to the correct Solr replica, etc.
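
For readers who want to see what those steps look like at the API level,
here is a minimal sketch using the plain ZooKeeper Java client. This is
not Solr's actual startup code: the connect string and node name are
placeholders, and it assumes the /live_nodes and /clusterstate.json paths
already exist.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class StartupSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; the watcher passed here is the
        // "listener" that ZK pings on state changes.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000,
                event -> System.out.println("ZK event: " + event));

        // 1> register as live: an ephemeral znode that vanishes if the
        //    session dies, which is how the rest of the cluster learns
        //    that this node has gone away.
        zk.create("/live_nodes/host1:8983_solr", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // 2> + 3> read the cluster state and leave a watch behind; when
        //    the state changes, ZK fires the watcher once and the client
        //    re-reads (and re-watches) to pick up the new topology.
        byte[] state = zk.getData("/clusterstate.json", true, null);
        System.out.println(state.length + " bytes of cluster state");
    }
}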

Now, let's say that "something changes". Solr stops on one of the
nodes. Or someone adds a collection. Or..... The overseer usually gets
involved in changing the state on ZK for this new action. Part of that
is that ZK sends an event to all the Solr nodes that have registered
themselves as listeners, which causes them to ask ZK for the current
state of the cluster, and each Solr node adjusts its actions based on
this information. Note that the kind of thing that changes and
triggers this is a whole replica becoming able or unable to carry out
its functions, NOT some collection getting another doc added or
answering a query.

ZK also periodically pings each Solr instance that's registered itself
and, if the node fails to respond, may force it into recovery, etc.
Again, though, that has nothing to do with standard Solr operations.

So a massive overseer queue tends to indicate that there are a LOT of
state changes, lots of nodes going up and down, etc. One implication of
the above is that if you turn on all your nodes in a large cluster at
the same time, there'll be a LOT of activity; they'll all register
themselves, try to elect leaders for shards, go into/out of recovery,
become active; all of these are things that trigger overseer activity.

Or there are simply bugs in how the overseer works in the version
you're using; I know there's been a lot of effort to harden that area
over the various versions.

Two things that are "interesting":
1> Only one of your Solr instances hosts the overseer. If you're doing
a restart of _all_ your boxes, it's advisable to bounce the node
that's the overseer _last_ (the sketch after this list shows one way
to find the current overseer). Otherwise you risk an odd situation:
the overseer is elected and starts to work, that node restarts, which
causes the overseer role to switch to another node, which is
immediately bounced, and a new overseer is elected, and....

2> As of 5.x, there are two ZK state formats:
a> the "old" format, where the entire clusterstate for all collections
is kept in a single znode (/clusterstate.json)
b> the "new" format, where each collection has its own state.json that
only contains the state for that collection.

This is very helpful when you have many collections. In the <a> case,
any time _any_ node changes, _all_ nodes have to get a new state. In
<b>, only the nodes involved in a single collection need to get new
information when any node in _that_ collection changes.
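
For illustration, here is a minimal sketch of one way to check which node
currently hosts the overseer before picking a restart order. It uses the
plain ZooKeeper Java client; the connect string is a placeholder, and it
assumes the /overseer_elect/leader znode that SolrCloud's overseer
election normally writes (its payload is a small JSON blob naming the
winning node).

import org.apache.zookeeper.ZooKeeper;

public class WhoIsOverseer {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; point this at your own ensemble.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> { });

        // The leader znode's data names the node that won the overseer election.
        byte[] data = zk.getData("/overseer_elect/leader", false, null);
        System.out.println(new String(data, "UTF-8"));

        zk.close();
    }
}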

FWIW,
Erick




Re: Zk and Solr Cloud

Posted by Ravi Solr <ra...@gmail.com>.
Awesome nugget, Shawn. I also faced a similar issue a while ago while I was
doing a full re-index. It would be great if such tips were added to FAQ-type
documentation on the cwiki. I love the SOLR forum; every day I learn
something new :-)

Thanks

Ravi Kiran Bhaskar


Re: Zk and Solr Cloud

Posted by Rallavagu <ra...@gmail.com>.
Thanks Shawn.

Right. That is a great insight into the issue. We ended up clearing the
overseer queue, and then the cloud became normal.

We were running a Solr indexing process and are wondering if that caused
the queue to grow. Will Solr (the leader) add a work entry to ZooKeeper
for every update? If not, what are those work entries?

Thanks


Re: Zk and Solr Cloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/1/2015 1:26 PM, Rallavagu wrote:
> Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.
>
> See following errors in ZK and Solr and they are connected.
>
> When I see the following error in Zookeeper,
>
> unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Packet len11823809 is out of range!

This is usually caused by the overseer queue (stored in zookeeper)
becoming extraordinarily huge, because it's being flooded with work
entries far faster than the overseer can process them.  This causes the
znode where the queue is stored to become larger than the maximum size
for a znode, which defaults to about 1MB.  In this case (reading your
log message that says len11823809), something in zookeeper has gotten to
be 11MB in size, so the zookeeper client cannot read it.
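
As a quick way to see whether this is what is happening, the child count
on the queue znode can be read without fetching the giant child listing
that trips the size limit. A small diagnostic sketch with the plain
ZooKeeper Java client (the connect string is a placeholder):

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class OverseerQueueSize {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> { });

        // exists() returns only a Stat, so it stays small even when a
        // getChildren() response would exceed jute.maxbuffer.
        Stat stat = zk.exists("/overseer/queue", false);
        if (stat == null) {
            System.out.println("/overseer/queue does not exist");
        } else {
            System.out.println("queued overseer work entries: " + stat.getNumChildren());
        }

        zk.close();
    }
}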

I think the zookeeper server code must be handling the addition of
children to the queue znode through a code path that doesn't pay
attention to the maximum buffer size, just goes ahead and adds it,
probably by simply appending data.  I'm unfamiliar with how the ZK
database works, so I'm guessing here.

If I'm right about where the problem is, there are two workarounds to
your immediate issue.

1) Delete all the entries in your overseer queue using a zookeeper
client that lets you edit the DB directly.  If you haven't changed the
cloud structure and all your servers are working, this should be safe.

2) Set the jute.maxbuffer system property on the startup command line for
all ZK servers and all ZK clients (Solr instances) to a size that's
large enough to accommodate the huge znode.  In order to do the deletion
mentioned in option 1 above, you might need to increase jute.maxbuffer on
the servers and the client you use for the deletion (a rough sketch of
such a cleanup client follows below).
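
Here is a rough sketch of such a cleanup client, using the plain
ZooKeeper Java client. The connect string and the jute.maxbuffer value
are placeholders; run it with the raised buffer on the command line
(for example, java -Djute.maxbuffer=31457280 ClearOverseerQueue), and
stop indexing first so the queue is not being refilled while you clear it.

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ClearOverseerQueue {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; the ZK servers may need the same
        // jute.maxbuffer increase so they can serve the large listing.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, event -> { });

        // List the queued work entries and delete them one by one;
        // version -1 means "delete regardless of the znode's version".
        List<String> entries = zk.getChildren("/overseer/queue", false);
        for (String entry : entries) {
            zk.delete("/overseer/queue/" + entry, -1);
        }
        System.out.println("deleted " + entries.size() + " overseer queue entries");

        zk.close();
    }
}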

These are just workarounds.  Whatever caused the huge queue in the first
place must be addressed.  It is frequently a performance issue.  If you
go to the following link, you will see that jute.maxbuffer is considered
an unsafe option:

http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options

In Jira issue SOLR-7191, I wrote the following in one of my comments:

"The giant queue I encountered was about 850000 entries, and resulted in
a packet length of a little over 14 megabytes. If I divide 850000 by 14,
I know that I can have about 60000 overseer queue entries in one znode
before jute.maxbuffer needs to be increased."

https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834

Thanks,
Shawn