Posted to solr-user@lucene.apache.org by JoeSmith <fi...@gmail.com> on 2014/12/06 20:09:27 UTC

CloudSolrServer, concurrency and too many connections

We are using SolrJ 4.10.0 to connect to a Zookeeper Solr host.  What is
the correct pattern for making concurrent requests to the Zookeeper host?

We are currently using CloudSolrServer, but it looks like this class is not
thread-safe (setDefaultCollection). Should this instance be initialized
once (at startup) and then re-used (in all threads) until shutdown when the
process terminates?  Or should it be re-instantiated for each request?

Currently, we are trying to use CloudSolrServer as a singleton, but it
looks like the connections to the host are not being closed and under load
we start getting failures.  In the Zookeeper logs we see this error:

> WARN  - 2014-12-04 10:09:14.364;
> org.apache.zookeeper.server.NIOServerCnxnFactory; Too many connections from
> /11.22.33.44 - max is 60
>

netstat (on the Zookeeper host) shows that the connections are not being
closed. What is the 'correct' way to fix this?   Apologies if I have missed
any documentation that explains, pointers would be helpful.

Thanks,

Re: CloudSolrServer, concurrency and too many connections

Posted by Greg Solovyev <gr...@zimbra.com>.
I am seeing the same problem with 4.10.2 and 4.9.0. CloudSolrServer keeps opening connections to ZK and never closes them. Eventually (very soon) ZK runs out of connections and stops accepting new ones. 

Thanks,
Greg

----- Original Message -----
From: "JoeSmith" <fi...@gmail.com>
To: "solr-user" <so...@lucene.apache.org>
Sent: Sunday, December 7, 2014 8:11:50 PM
Subject: Re: CloudSolrServer, concurrency and too many connections

I've upgraded to 4.10.2 on the client-side.  Still seeing this connection
problem when connecting to the Zookeeper port.  If I connect directly to
SolrServer, the connections do not increase.  But when connecting to
Zookeeper, the connections increase up to 60 and then start to fail.  I
understand Zookeeper is configured to fail after 60 connections to prevent
a DOS attack, but I don't see why we keep adding new connections (up to
60).  Does the client-side Zookeeper code also use HttpClient
ConnectionPooling for its Connection Pool?  Below is the Exception that
shows up in the log file when this happens.  When we execute queries we are
using the _route_ parameter, could this explain anything?

o.a.zookeeper.ClientCnxn - Session 0x0 for server aweqca3utmtc10.cloud.xxxx.com/10.22.10.107:9983, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_55]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_55]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_55]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_55]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_55]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-1569965]


Will try to get the server code upgraded to 4.10.2.




Re: CloudSolrServer, concurrency and too many connections

Posted by Greg Solovyev <gr...@zimbra.com>.
This was a user error. My code was re-instantiating CloudSolrServer for each request and never calling CloudSolrServer::shutdown(). 
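A minimal sketch of the difference, assuming nothing about SolrJ internals beyond what this thread describes: `FakeCloudClient` is a hypothetical stand-in (not a SolrJ class) that holds one simulated ZooKeeper session from construction until `shutdown()` is called. Creating one per request and never shutting it down leaks a session on every call; if a per-request instance cannot be avoided, releasing it in a `finally` block keeps the count flat. Reusing a single shared instance avoids the churn entirely.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for CloudSolrServer: it opens one simulated ZooKeeper
// session when constructed and releases it only when shutdown() is called.
class FakeCloudClient {
    static final AtomicInteger OPEN_SESSIONS = new AtomicInteger();
    private volatile boolean open = true;

    FakeCloudClient() {
        OPEN_SESSIONS.incrementAndGet();
    }

    String query(String collection, String q) {
        if (!open) {
            throw new IllegalStateException("client already shut down");
        }
        return collection + ":" + q;
    }

    // Idempotent: only the first call releases the session.
    synchronized void shutdown() {
        if (open) {
            open = false;
            OPEN_SESSIONS.decrementAndGet();
        }
    }
}

class LeakDemo {
    // Anti-pattern described above: a new client per request, never shut down.
    static void leakyRequest(String q) {
        FakeCloudClient client = new FakeCloudClient();
        client.query("collection1", q);
    }

    // If a per-request client is unavoidable, release it in a finally block.
    static void closedRequest(String q) {
        FakeCloudClient client = new FakeCloudClient();
        try {
            client.query("collection1", q);
        } finally {
            client.shutdown();
        }
    }
}
```

After ten calls to `closedRequest` the open-session counter is back where it started; after ten calls to `leakyRequest` it is ten higher, which is the pattern that eventually trips ZooKeeper's per-client connection limit.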

Thanks,
Greg

----- Original Message -----
From: "Greg Solovyev" <gr...@zimbra.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, December 10, 2014 11:08:10 AM
Subject: Re: CloudSolrServer, concurrency and too many connections

I am seeing this problem with Java 1.8.0_25-b17 on Ubuntu 14.04.1 LTS, ZK 3.4.6, and Solr 4.10.2.

Thanks,
Greg


Re: CloudSolrServer, concurrency and too many connections

Posted by Greg Solovyev <gr...@zimbra.com>.
I am seeing this problem with Java 1.8.0_25-b17 on Ubuntu 14.04.1 LTS, ZK 3.4.6, and Solr 4.10.2.

Thanks,
Greg

----- Original Message -----
From: "JoeSmith" <fi...@gmail.com>
To: "solr-user" <so...@lucene.apache.org>
Sent: Monday, December 8, 2014 6:19:08 PM
Subject: Re: CloudSolrServer, concurrency and too many connections

Thanks, Shawn.  I updated to 7u72 and was not able to reproduce the
problem. That was good.  But just to be sure about this, I rolled back
to 7u55 and again was not able to reproduce.  So at least for now, this has
gone away even if the reason is inconclusive.



Re: CloudSolrServer, concurrency and too many connections

Posted by JoeSmith <fi...@gmail.com>.
Thanks, Shawn.  I updated to 7u72 and was not able to reproduce the
problem. That was good.  But just to be sure about this, I rolled back
to 7u55 and again was not able to reproduce.  So at least for now, this has
gone away even if the reason is inconclusive.


On Mon, Dec 8, 2014 at 7:37 AM, JoeSmith <fi...@gmail.com> wrote:

> We will need to update to 7u72; we are using 7u55.  On the client side,
> this happens with zookeeper 3.4.6 and 4.10.2 solrj.  And we will need to
> update both on the server side.   What kind of config/setup information
> would you need to see if we do still have an issue after these updates?

Re: CloudSolrServer, concurrency and too many connections

Posted by JoeSmith <fi...@gmail.com>.
We will need to update to 7u72; we are using 7u55.  On the client side,
this happens with zookeeper 3.4.6 and 4.10.2 solrj.  And we will need to
update both on the server side.   What kind of config/setup information
would you need to see if we do still have an issue after these updates?

On Mon, Dec 8, 2014 at 12:40 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> The docs say that Zookeeper uses NIO communication directly by default,
> so there's no layer like HttpClient.  I don't think it uses pooling ...
> it does everything over a single TCP connection that doesn't normally
> disconnect until the program exits.
>
> Basically, the Zookeeper authors built their own networking layer that
> uses TCP directly.  You have the option of using Netty instead:
>
>
> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Communication+using+the+Netty+framework
>
> Are you running version 3.4.6 for your zookeeper servers?  That's the
> version of ZK client code you'll find in Solr 4.10.x, and the
> recommended version for both the server and your SolrJ program.
>
> The most likely reasons for the connection problems you are seeing are:
>
> 1) A bug in the networking layer of your JVM.
> 1a) The latest Oracle Java 7 (currently 7u72) is highly recommended.
> 2) A bug or misconfig in the OS TCP stack, or possibly its firewall.
> 3) A bug or misconfig in zookeeper.
>
> I can't rule out the fourth possibility, but so far I think it's unlikely:
>
> 4) A bug in SolrJ that has not yet been reported or fixed.
>
> Thanks,
> Shawn
>
>

Re: CloudSolrServer, concurrency and too many connections

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/7/2014 9:11 PM, JoeSmith wrote:
> I've upgraded to 4.10.2 on the client-side.  Still seeing this connection
> problem when connecting to the Zookeeper port.  If I connect directly to
> SolrServer, the connections do not increase.  But when connecting to
> Zookeeper, the connections increase up to 60 and then start to fail.  I
> understand Zookeeper is configured to fail after 60 connections to prevent
> a DOS attack, but I don't see why we keep adding new connections (up to
> 60).  Does the client-side Zookeeper code also use HttpClient
> ConnectionPooling for its Connection Pool?  Below is the Exception that
> shows up in the log file when this happens.  When we execute queries we are
> using the _route_ parameter, could this explain anything?

The docs say that Zookeeper uses NIO communication directly by default,
so there's no layer like HttpClient.  I don't think it uses pooling ...
it does everything over a single TCP connection that doesn't normally
disconnect until the program exits.

Basically, the Zookeeper authors built their own networking layer that
uses TCP directly.  You have the option of using Netty instead:

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#Communication+using+the+Netty+framework
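Because each ZooKeeper client keeps a single long-lived TCP connection, every client instance that is never shut down holds one more connection to the server. The sketch below is a simplified, hypothetical model of the per-IP cap behind the "Too many connections ... max is 60" warning (ZooKeeper's `maxClientCnxns` setting); it is not ZooKeeper's actual code, and `PerIpConnectionCap` and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a per-client-IP connection cap like the one
// behind "Too many connections from /11.22.33.44 - max is 60". Not ZooKeeper code.
class PerIpConnectionCap {
    private final int maxPerIp;
    private final Map<String, Integer> openByIp = new HashMap<>();

    PerIpConnectionCap(int maxPerIp) {
        this.maxPerIp = maxPerIp;
    }

    // Accepts the connection (returns true) unless this IP is already at the cap.
    synchronized boolean tryConnect(String clientIp) {
        int current = openByIp.getOrDefault(clientIp, 0);
        if (current >= maxPerIp) {
            return false;
        }
        openByIp.put(clientIp, current + 1);
        return true;
    }

    // Called when a client closes its session; drops the entry when it reaches zero.
    synchronized void disconnect(String clientIp) {
        openByIp.computeIfPresent(clientIp, (ip, n) -> n > 1 ? n - 1 : null);
    }

    synchronized int openCount(String clientIp) {
        return openByIp.getOrDefault(clientIp, 0);
    }
}
```

With a cap of 60, one shared session stays at a count of 1, while opening a new session per request without closing any gets the 61st attempt refused, which matches the behaviour reported earlier in this thread.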

Are you running version 3.4.6 for your zookeeper servers?  That's the
version of ZK client code you'll find in Solr 4.10.x, and the
recommended version for both the server and your SolrJ program.

The most likely reasons for the connection problems you are seeing are:

1) A bug in the networking layer of your JVM.
1a) The latest Oracle Java 7 (currently 7u72) is highly recommended.
2) A bug or misconfig in the OS TCP stack, or possibly its firewall.
3) A bug or misconfig in zookeeper.

I can't rule out the fourth possibility, but so far I think it's unlikely:

4) A bug in SolrJ that has not yet been reported or fixed.

Thanks,
Shawn


Re: CloudSolrServer, concurrency and too many connections

Posted by JoeSmith <fi...@gmail.com>.
I've upgraded to 4.10.2 on the client-side.  Still seeing this connection
problem when connecting to the Zookeeper port.  If I connect directly to
SolrServer, the connections do not increase.  But when connecting to
Zookeeper, the connections increase up to 60 and then start to fail.  I
understand Zookeeper is configured to fail after 60 connections to prevent
a DOS attack, but I don't see why we keep adding new connections (up to
60).  Does the client-side Zookeeper code also use HttpClient
ConnectionPooling for its Connection Pool?  Below is the Exception that
shows up in the log file when this happens.  When we execute queries we are
using the _route_ parameter, could this explain anything?

o.a.zookeeper.ClientCnxn - Session 0x0 for server aweqca3utmtc10.cloud.xxxx.com/10.22.10.107:9983, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_55]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_55]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_55]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_55]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_55]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[zookeeper-3.4.6.jar:3.4.6-1569965]


Will try to get the server code upgraded to 4.10.2.



On Sat, Dec 6, 2014 at 3:52 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> All SolrServer implementations in SolrJ, including CloudSolrServer, are
> supposed to be threadsafe.  If it turns out they're not actually
> threadsafe, then we treat that as a bug.  The discussion to determine
> that it's a bug takes place on this mailing list, and once we determine
> that, the next step is to file an issue in Jira.
>
> The general way to use SolrJ is to initialize the server instance at the
> beginning and re-use it for all client communication to Solr.  With
> CloudSolrServer, you normally only need a single server instance to talk
> to the entire cloud, because you can set the "collection" parameter on
> each request to indicate which collection to work on.  If you only have
> a handful of collections, you might want to use multiple instances and
> use setDefaultCollection  to specify the collection.  With
> HttpSolrServer, an instance is required for each core, because the core
> name is in the initialization URL.
>
> I've not looked at the code, but I can't imagine that the client ever
> needs to make more than one connection to each server in the zookeeper
> ensemble.  Here's a list of the open connections on one of my zookeeper
> servers for my SolrCloud 4.2.1 install:
>
> java    21800 root   21u  IPv6            2836983      0t0      TCP 10.8.0.151:50178->10.8.0.152:2888 (ESTABLISHED)
> java    21800 root   22u  IPv6            2661097      0t0      TCP 10.8.0.151:3888->10.8.0.152:34116 (ESTABLISHED)
> java    21800 root   26u  IPv6           28065088      0t0      TCP 10.8.0.151:2181->10.8.0.141:52583 (ESTABLISHED)
> java    21800 root   27u  IPv6           23967470      0t0      TCP 10.8.0.151:2181->10.8.0.152:49436 (ESTABLISHED)
> java    21800 root   28r  IPv6           23969636      0t0      TCP 10.8.0.151:2181->10.8.0.151:57290 (ESTABLISHED)
> java    21800 root   29r  IPv6           23969951      0t0      TCP 10.8.0.151:3888->10.8.0.153:54721 (ESTABLISHED)
>
> The 151, 152, and 153 addresses are my ZK servers, with Solr also
> running on 151 and 152.  The 141 address is the SolrJ client.  The main
> ZK port is 2181, with ports 2888 and 3888 used for internal zookeeper
> communication.  I actually would have expected to see two client
> connections from .141 ... one for the indexer program and one for the
> webapp.  They haven't reported a Solr problem to me, so I guess it must
> be OK.
>
> If your install is re-establishing connections and not closing the old
> ones, then there is either something wrong with your setup or a bug.
> Because there are not a large number of people with the same complaint,
> I would lean more towards problems in your setup.  I won't rule out the
> possibility that there's a bug, because we've had a lot of them.
>
> One thing to try immediately is upgrading to 4.10.2 ... there have been
> two bugfix releases since the version you're running came out, with 16
> bug issues closed.  None of those issues sounds like what you're running
> into, but sometimes when mistakes are noticed in the code, fixing them
> can make other seemingly unrelated problems go away.  Upgrading to a
> bugfix release on the same minor version should be a drop-in replacement
> with no configuration changes necessary.
>
> http://lucene.apache.org/solr/4_10_2/changes/Changes.html
>
> Beyond that, we need more information.  Are there ERROR or WARN messages
> in your Solr log and/or your SolrJ client log that don't come from bad
> queries?  If there are, it may indicate some kind of problem, especially
> if they relate to the zk client timeout.  Problems like that can be
> caused by general performance issues, including garbage collection pauses.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Depending on what is found in your log, other questions about your setup
> may need answering.
>
> Thanks,
> Shawn
>
>

Re: CloudSolrServer, concurrency and too many connections

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/6/2014 12:09 PM, JoeSmith wrote:
> We are currently using CloudSolrServer, but it looks like this class is not
> thread-safe (setDefaultCollection). Should this instance be initialized
> once (at startup) and then re-used (in all threads) until shutdown when the
> process terminates?  Or should it be re-instantiated for each request?
> 
> Currently, we are trying to use CloudSolrServer as a singleton, but it
> looks like the connections to the host are not being closed and under load
> we start getting failures.  In the Zookeeper logs we see this error:
> 
>> WARN  - 2014-12-04 10:09:14.364;
>> org.apache.zookeeper.server.NIOServerCnxnFactory; Too many connections from
>> /11.22.33.44 - max is 60
> 
> netstat (on the Zookeeper host) shows that the connections are not being
> closed. What is the 'correct' way to fix this?   Apologies if I have missed
> any documentation that explains, pointers would be helpful.

All SolrServer implementations in SolrJ, including CloudSolrServer, are
supposed to be threadsafe.  If it turns out they're not actually
threadsafe, then we treat that as a bug.  The discussion to determine
that it's a bug takes place on this mailing list, and once we determine
that, the next step is to file an issue in Jira.

The general way to use SolrJ is to initialize the server instance at the
beginning and re-use it for all client communication to Solr.  With
CloudSolrServer, you normally only need a single server instance to talk
to the entire cloud, because you can set the "collection" parameter on
each request to indicate which collection to work on.  If you only have
a handful of collections, you might want to use multiple instances and
use setDefaultCollection  to specify the collection.  With
HttpSolrServer, an instance is required for each core, because the core
name is in the initialization URL.
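A sketch of that pattern, assuming a thread-safe client: one instance is created at startup, shared by a thread pool, and each request names its collection rather than mutating shared state such as a default collection. `SharedSearchClient`, its `query(collection, q)` method, and the collection names `products` and `logs` are hypothetical stand-ins, not SolrJ APIs; with the real CloudSolrServer, the collection would go in the request's "collection" parameter as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a thread-safe SolrJ client (not a SolrJ class).
// It keeps no per-request mutable state: the collection travels with each call.
class SharedSearchClient {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    SharedSearchClient() {
        INSTANCES.incrementAndGet();
    }

    // Stands in for sending a query with collection=<collection>.
    String query(String collection, String q) {
        return collection + "/" + q;
    }

    void shutdown() {
        // A real client would release its ZooKeeper session and HTTP connections here.
    }
}

class SharedClientDemo {
    // Created once when the class is initialized and reused by every thread.
    static final SharedSearchClient CLIENT = new SharedSearchClient();

    // Issues requests from a thread pool; each request names its own collection.
    static List<String> runConcurrently(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                String collection = (i % 2 == 0) ? "products" : "logs";
                String q = "id:" + i;
                futures.add(pool.submit(() -> CLIENT.query(collection, q)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real application the shared instance would be shut down once, when the process exits (for example from a hook registered with `Runtime.getRuntime().addShutdownHook(...)`), rather than after each request.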

I've not looked at the code, but I can't imagine that the client ever
needs to make more than one connection to each server in the zookeeper
ensemble.  Here's a list of the open connections on one of my zookeeper
servers for my SolrCloud 4.2.1 install:

java    21800 root   21u  IPv6            2836983      0t0      TCP 10.8.0.151:50178->10.8.0.152:2888 (ESTABLISHED)
java    21800 root   22u  IPv6            2661097      0t0      TCP 10.8.0.151:3888->10.8.0.152:34116 (ESTABLISHED)
java    21800 root   26u  IPv6           28065088      0t0      TCP 10.8.0.151:2181->10.8.0.141:52583 (ESTABLISHED)
java    21800 root   27u  IPv6           23967470      0t0      TCP 10.8.0.151:2181->10.8.0.152:49436 (ESTABLISHED)
java    21800 root   28r  IPv6           23969636      0t0      TCP 10.8.0.151:2181->10.8.0.151:57290 (ESTABLISHED)
java    21800 root   29r  IPv6           23969951      0t0      TCP 10.8.0.151:3888->10.8.0.153:54721 (ESTABLISHED)

The 151, 152, and 153 addresses are my ZK servers, with Solr also
running on 151 and 152.  The 141 address is the SolrJ client.  The main
ZK port is 2181, with ports 2888 and 3888 used for internal zookeeper
communication.  I actually would have expected to see two client
connections from .141 ... one for the indexer program and one for the
webapp.  They haven't reported a Solr problem to me, so I guess it must
be OK.
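To check whether a single client IP is holding many ZooKeeper sessions, a listing like the one above can be tallied by remote address. This is a minimal sketch that assumes IPv4 addresses and the `local:port->remote:port (ESTABLISHED)` form shown in that listing; `ZkClientConnectionCounter` is a hypothetical helper, and the client port must match the install (2181 in this listing, 9983 in the log JoeSmith posted earlier in this thread).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: tallies ESTABLISHED connections whose local port is the
// ZooKeeper client port, keyed by remote (client) IP. Assumes IPv4 addresses and
// the "local:port->remote:port (ESTABLISHED)" form used in lsof output.
class ZkClientConnectionCounter {
    private static final Pattern CONN =
            Pattern.compile("(\\S+):(\\d+)->(\\S+):(\\d+) \\(ESTABLISHED\\)");

    static Map<String, Integer> countByClientIp(List<String> lsofLines, int clientPort) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String line : lsofLines) {
            Matcher m = CONN.matcher(line);
            if (m.find() && Integer.parseInt(m.group(2)) == clientPort) {
                counts.merge(m.group(3), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Applied to the six lines of the listing (each joined onto one line), it reports one connection on port 2181 from each of 10.8.0.141, 10.8.0.151, and 10.8.0.152. A leaking client would show up as one IP whose count keeps climbing toward the limit.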

If your install is re-establishing connections and not closing the old
ones, then there is either something wrong with your setup or a bug.
Because there are not a large number of people with the same complaint,
I would lean more towards problems in your setup.  I won't rule out the
possibility that there's a bug, because we've had a lot of them.

One thing to try immediately is upgrading to 4.10.2 ... there have been
two bugfix releases since the version you're running came out, with 16
bug issues closed.  None of those issues sounds like what you're running
into, but sometimes when mistakes are noticed in the code, fixing them
can make other seemingly unrelated problems go away.  Upgrading to a
bugfix release on the same minor version should be a drop-in replacement
with no configuration changes necessary.

http://lucene.apache.org/solr/4_10_2/changes/Changes.html

Beyond that, we need more information.  Are there ERROR or WARN messages
in your Solr log and/or your SolrJ client log that don't come from bad
queries?  If there are, it may indicate some kind of problem, especially
if they relate to the zk client timeout.  Problems like that can be
caused by general performance issues, including garbage collection pauses.

http://wiki.apache.org/solr/SolrPerformanceProblems
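For the garbage-collection angle, one coarse check from inside the client JVM uses the standard `java.lang.management` API. This sketch sums the cumulative collection time reported by each `GarbageCollectorMXBean`. Because that figure is an accumulated total rather than the longest single pause, it can only flag intervals worth investigating further with GC logs. `GcTimeCheck` and its budget check are hypothetical helpers, and the timeout passed in should be whatever value the install actually uses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Coarse GC check for a client JVM. getCollectionTime() is cumulative and
// approximate, so this reports total GC time over an interval, not the longest
// single pause; GC logging is needed for per-pause detail.
class GcTimeCheck {
    // Sum of accumulated collection time (ms) across all collectors in this JVM.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time accumulated while the given work ran.
    static long gcMillisDuring(Runnable work) {
        long before = totalCollectionMillis();
        work.run();
        return totalCollectionMillis() - before;
    }

    // Hypothetical threshold: flag the interval if GC used more than the given
    // fraction of a ZooKeeper client timeout (both values are caller-chosen).
    static boolean exceedsBudget(long gcMillis, long zkClientTimeoutMillis, double fraction) {
        return gcMillis > zkClientTimeoutMillis * fraction;
    }
}
```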

Depending on what is found in your log, other questions about your setup
may need answering.

Thanks,
Shawn