Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2011/12/16 08:44:37 UTC

Exception using SolrJ

I am seeing exceptions from some code I have written using SolrJ.  I have 
placed it into a pastebin:


http://pastebin.com/XnB83Jay


I am creating a MultiThreadedHttpConnectionManager object, which I use 
to create an HttpClient, and that is used by all my 
CommonsHttpSolrServer objects, of which there are 56 total.  That's two 
index chains, seven shards per chain, two cores per shard (live and 
build).  This accounts for half of the objects - for each of those, 
there is a matching machine-level server object for CoreAdmin.  In most 
circumstances only 14 of those are in active use, but when a full index 
rebuild is needed, almost all of them are used.  I have some plans that 
will reduce the number of machine-level server objects from 28 to 4, 
which is the actual number of machines I'm running.
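
The object counts above can be checked with a few lines of arithmetic.  
This is only a sketch: the class and method names are hypothetical, and 
the figure of 14 assumes that "active use" means one live core per shard 
in each chain.

```java
// Hypothetical helper for the arithmetic only; not part of the indexing code.
final class ServerObjectCount {
    /** Core-level objects: chains x shards per chain x cores per shard. */
    static int coreObjects(int chains, int shardsPerChain, int coresPerShard) {
        return chains * shardsPerChain * coresPerShard;
    }

    /** Each core-level object has one matching machine-level CoreAdmin object. */
    static int totalObjects(int coreObjects) {
        return coreObjects * 2;
    }

    public static void main(String[] args) {
        int cores = coreObjects(2, 7, 2);   // live and build cores
        int live = coreObjects(2, 7, 1);    // assumption: only live cores are active
        System.out.println("core objects:  " + cores);
        System.out.println("total objects: " + totalObjects(cores));
        System.out.println("live cores:    " + live);
    }
}
```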


   Static._mgr = new MultiThreadedHttpConnectionManager();
   _mgrParams = Static._mgr.getParams();
   _mgrParams.setTcpNoDelay(true);
   _mgrParams.setConnectionTimeout(30000);
   _mgrParams.setStaleCheckingEnabled(true);
   Static._mgr.setParams(_mgrParams);
   _mgrParams = null;
   Static._client = new HttpClient(Static._mgr);


Here's the code that creates the server objects.  The setMaxRetries 
calls are a recent change.  The problem was happening before I added 
them, though it does seem to happen less often now:


   _solrServer = new CommonsHttpSolrServer(serverBaseUrl, Static._client);
   _solrCore = new CommonsHttpSolrServer(coreBaseUrl, Static._client);
   _solrServer.setMaxRetries(1);
   _solrCore.setMaxRetries(1);

The exception linked above will typically show up within a second or two 
of the start of an update cycle, well before the 30 second connection 
timeout I've specified in the parameters.  When this happens, the code 
is in the middle of a delete process: it is executing a query to count 
the number of matching items.  If any are found, it follows this with 
the actual deleteByQuery.
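
The count-then-delete flow described above can be sketched as follows.  
The Index interface and the in-memory implementation are hypothetical 
stand-ins so that the example runs on its own; they are not SolrJ types, 
and the real code talks to a CommonsHttpSolrServer instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CountThenDelete {
    /** Hypothetical stand-in for a search server; not a SolrJ type. */
    interface Index {
        long count(Set<Long> dids);
        void deleteByDids(Set<Long> dids);
    }

    /** Toy in-memory index holding one did value per document. */
    static final class MemoryIndex implements Index {
        private final Set<Long> docs = new HashSet<>();
        int deleteCalls = 0;

        MemoryIndex(List<Long> initial) {
            docs.addAll(initial);
        }

        @Override
        public long count(Set<Long> dids) {
            return docs.stream().filter(dids::contains).count();
        }

        @Override
        public void deleteByDids(Set<Long> dids) {
            deleteCalls++;
            docs.removeAll(dids);
        }

        int size() {
            return docs.size();
        }
    }

    /** Runs the count query first and deletes only when it finds matches. */
    static long deleteIfPresent(Index index, Set<Long> dids) {
        long found = index.count(dids);
        if (found > 0) {
            index.deleteByDids(dids);
        }
        return found;
    }
}
```

The point of the extra query is that the delete request is skipped 
entirely when nothing matches, at the cost of one additional round trip 
when something does.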


The Solr servers are running a slightly modified 3.5.0, with patches 
from SOLR-2906 and SOLR-1972 applied.  I am not actually using the LFU 
cache implemented in SOLR-2906.  The same problem happened when I was 
using version 3.4.0 with only SOLR-1972 applied.  The SolrJ jar comes 
from the same build as the custom Solr I'm running.


It looks like something is resetting the TCP connection, but I can't 
tell what, or where the problem is.  Solr works fine as far as I can 
tell.  Can anyone help?  Have I done something wrong in creating my 
HttpClient or my server objects?


Thanks,

Shawn

Re: Exception using SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 12/21/2011 9:43 AM, Chantal Ackermann wrote:
> Hi Shawn,
>
> maybe the requests that fail have a certain pattern - for example that
> they are longer than all the others.

The query for the exception I sent is shown in the pastebin.  Here is 
the query and, for reference, the pastebin URL:

did:(286861384 OR 286861312 OR 286861313 OR 284220972)
http://pastebin.com/XnB83Jay

This is a typical query for the failures.  This field (did) is a tlong 
with a precisionStep of 16.  There are about 11 million documents in the 
index referenced, total size about 20GB.  Most often it has been a 
search query like this that has failed, though sometimes it is the 
actual deleteByQuery that immediately follows this, or it is an attempt 
to add documents which comes after that.
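
For illustration, a query string of that shape can be built from a list 
of did values like this.  The class and method names are hypothetical, 
and this is a sketch rather than the code that actually produced the 
failing query.

```java
import java.util.List;
import java.util.stream.Collectors;

final class DidQuery {
    /** Builds field:(a OR b OR c) from a non-empty list of numeric ids. */
    static String build(String field, List<Long> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        return ids.stream()
                .map(id -> Long.toString(id))
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }
}
```

Because the values are numeric, no query escaping is needed here; a 
string field would need its values escaped before being joined.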

Something to add: Solr's log, running at INFO, does not show these 
requests that fail, and does not log an exception.  These requests do 
not pass through a load balancer.  I do use haproxy on port 8983 for 
queries made by our website, but the SolrJ application talks to Solr 
directly on port 8981.  I can't say whether the request shows up in the 
jetty log, because everything is using POST.

Thanks,
Shawn

Re: Exception using SolrJ

Posted by Chantal Ackermann <ch...@btelligent.de>.

Hi Shawn,

maybe the requests that fail have a certain pattern - for example that
they are longer than all the others.

Chantal

Re: Exception using SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 12/21/2011 1:10 AM, Shawn Heisey wrote:
> On 12/20/2011 10:33 AM, Otis Gospodnetic wrote:
>> Shawn,
>>
>> Give httping a try: http://www.vanheusden.com/httping/
>>
>> It may reveal something about connections being dropped periodically.
>> Maybe even a plain ping would show some dropped packets if it's a 
>> general network and not a Solr-specific issue.
>
> The connections here are gigabit ethernet on the same VLAN, and 
> sometimes it happens to cores on the same box that's running the SolrJ 
> code, which if all things are sane, never actually goes out the NIC.  
> I see no errors on the interface.
>
> bond0     Link encap:Ethernet  HWaddr 00:1C:23:DC:81:53
>           inet addr:10.100.0.240  Bcast:10.100.1.255  Mask:255.255.254.0
>           inet6 addr: fe80::21c:23ff:fedc:8153/64 Scope:Link
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>           RX packets:453134140 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:297893403 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:446857564768 (416.1 GiB)  TX bytes:191134876472 (178.0 GiB)
>
> BONDING_OPTS="mode=1 miimon=100 updelay=200 downdelay=200 primary=eth0"

I realized after sending the ifconfig that errors would probably not 
show on the bonded interface.  Stats are also clear on the slaves:

eth0      Link encap:Ethernet  HWaddr 00:1C:23:DC:81:53
           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
           RX packets:454373740 errors:0 dropped:0 overruns:0 frame:0
           TX packets:301194576 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:449062687599 (418.2 GiB)  TX bytes:193031706549 (179.7 GiB)
           Interrupt:16 Memory:f8000000-f8012800

eth1      Link encap:Ethernet  HWaddr 00:1C:23:DC:81:53
           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
           RX packets:2261000 errors:0 dropped:0 overruns:0 frame:0
           TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:194331296 (185.3 MiB)  TX bytes:398 (398.0 b)
           Interrupt:16 Memory:f4000000-f4012800

The switch interfaces are also very clean, as seen below.  They do show 
some output drops, but the percentage of packets is extremely low.

GigabitEthernet0/13 is up, line protocol is up (connected)
   Hardware is Gigabit Ethernet, address is 0024.c3cc.ad0d (bia 0024.c3cc.ad0d)
   Description: bigindy0 nic1
   MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation ARPA, loopback not set
   Keepalive set (10 sec)
   Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
   input flow-control is on, output flow-control is unsupported
   ARP type: ARPA, ARP Timeout 04:00:00
   Last input 1y45w, output 00:00:01, output hang never
   Last clearing of "show interface" counters never
   Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 74219
   Queueing strategy: fifo
   Output queue: 0/40 (size/max)
   5 minute input rate 378000 bits/sec, 81 packets/sec
   5 minute output rate 1863000 bits/sec, 210 packets/sec
      15993961043 packets input, 18181095872276 bytes, 0 no buffer
      Received 31769202 broadcasts (20225268 multicasts)
      0 runts, 0 giants, 0 throttles
      0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
      0 watchdog, 20225268 multicast, 0 pause input
      0 input packets with dribble condition detected
      21413035341 packets output, 21796346722157 bytes, 0 underruns
      0 output errors, 0 collisions, 3 interface resets
      0 babbles, 0 late collision, 0 deferred
      0 lost carrier, 0 no carrier, 0 PAUSE output
      0 output buffer failures, 0 output buffers swapped out

switch uptime 2 years, 27 weeks, 4 days, 21 hours, 20 minutes
host uptime 33 days, 16:21

Even if there were the occasional packet being dropped by the switch, 
the TCP stack in Linux should immediately retry that packet and 
everything would be fine, though delayed slightly.  The number of output 
drops here is 0.00035 percent of the total packets output.  One of the 
other machines (in a different switch) shows ten times as many 
switchport drops, but even that is 0.0037 percent of the packets on that 
port.  I have cleared the counters on all the switches, and after 
twenty minutes and 400000 packets output, it's running completely 
clean.  I will keep an eye on those stats and wait for the next 
exception to see if there is a spike in output drops when the problem 
happens.  I don't expect that to be the problem, though.  If it is a 
networking problem, it is most likely to be in the CentOS 6 kernel.  I'd 
like for it to be that simple, but I think the possibility there is small.
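
The drop percentage for this port can be reproduced from the two 
counters in the interface output: "Total output drops" and "packets 
output".  The helper below is a hypothetical sketch of that calculation, 
not a tool that was used on the switch.

```java
import java.util.Locale;

final class DropRate {
    /** Returns drops as a percentage of the packets sent. */
    static double percent(long drops, long packets) {
        if (packets <= 0) {
            throw new IllegalArgumentException("packets must be positive");
        }
        return 100.0 * drops / packets;
    }

    public static void main(String[] args) {
        // Counters copied from GigabitEthernet0/13.
        double pct = percent(74219L, 21413035341L);
        System.out.println(String.format(Locale.ROOT, "%.5f percent", pct));
    }
}
```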

I think it's more likely that it's a software problem, and that the 
error was probably mine, but I need help in tracking it down.

Thanks,
Shawn

Re: Exception using SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 12/20/2011 10:33 AM, Otis Gospodnetic wrote:
> Shawn,
>
> Give httping a try: http://www.vanheusden.com/httping/
>
> It may reveal something about connections being dropped periodically.
> Maybe even a plain ping would show some dropped packets if it's a general network and not a Solr-specific issue.

The connections here are gigabit ethernet on the same VLAN, and 
sometimes it happens to cores on the same box that's running the SolrJ 
code, which if all things are sane, never actually goes out the NIC.  I 
see no errors on the interface.

bond0     Link encap:Ethernet  HWaddr 00:1C:23:DC:81:53
           inet addr:10.100.0.240  Bcast:10.100.1.255  Mask:255.255.254.0
           inet6 addr: fe80::21c:23ff:fedc:8153/64 Scope:Link
           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
           RX packets:453134140 errors:0 dropped:0 overruns:0 frame:0
           TX packets:297893403 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:446857564768 (416.1 GiB)  TX bytes:191134876472 (178.0 GiB)

BONDING_OPTS="mode=1 miimon=100 updelay=200 downdelay=200 primary=eth0"

Thanks,
Shawn

Re: Exception using SolrJ

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Shawn,

Give httping a try: http://www.vanheusden.com/httping/

It may reveal something about connections being dropped periodically.
Maybe even a plain ping would show some dropped packets if it's a general network and not a Solr-specific issue.
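
If installing a separate tool is not convenient, a much narrower check 
can be run from Java itself.  The sketch below only times a TCP connect 
to a host and port, so it would not show a connection that is reset 
after it has been established.  The class name and the timeout value 
are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectProbe {
    /** Returns the connect time in milliseconds, or -1 if the attempt failed. */
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            // Refused, reset, unreachable, or timed out.
            return -1L;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8983;
        System.out.println(host + ":" + port + " -> " + connectMillis(host, port, 2000) + " ms");
    }
}
```

Running this in a loop and logging any -1 results with a timestamp 
would make it possible to line failures up against the update cycle.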

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html



>________________________________
> From: Shawn Heisey <so...@elyograg.org>
>To: solr-user@lucene.apache.org 
>Sent: Tuesday, December 20, 2011 11:32 AM
>Subject: Re: Exception using SolrJ
> 
>On 12/20/2011 2:57 AM, Chantal Ackermann wrote:
>> Hi Shawn,
>> 
>> the exception indicates that the connection was lost. I'm sure you
>> figured that out for yourself.
>> 
>> Questions:
>> - is that specific server instance really running? That is, can you
>> reach it via browser?
>> - If yes: how is your connection pool configured and how do you
>> initialize it? More specifically: from what I know, CommonsHttp is
>> already multi-threaded, so your initializing code should not be using
>> multiple threads to access it. I'm not completely sure about that in
>> combination with SolrJ, though. I just had that issue when using
>> CommonsHttp directly in the wrong way.
>> 
>> I have been using SolrJ with a CommonsHttp pool for some time now, and
>> it all works very reliably. I've encountered those Connection reset
>> exceptions too, but they were always caused by the server not being
>> reachable.
>
>Yes, I did figure out it's a disconnected socket, but I can find no reason for it.
>
>The server is likely reachable via a browser, but as it usually happens when I am sleeping or otherwise occupied, I am not able to check.  The updates run once a minute, and everything works fine on the next update.  If the server were down, I would get an email once a minute until the problem were solved, but it happens only once, then it's fine for a good long while.  I am the only person who works on these servers, so I know that nothing is going on when the exception occurs.  These Solr servers are not overloaded - the long term average requests per second are about 0.3, very low.
>
>The main reason I am creating my own HttpClient and connection manager is so that I can be sure of the connection timeout applied.  If something should go wrong and it takes more than 30 seconds to establish the HTTP connection, I don't want my code to silently keep on waiting for it; I want to know about it.  The other options are likely the default settings, but they are the settings I want, so I am just ensuring they are set, against a potential future change in the default.
>
>The problem is not caused by the configured 30 second connection timeout.  As I said in the first message, this happens very quickly, usually within 2 seconds of the start of the update, often within the same second.
>
>Thanks,
>Shawn
>
>
>
>

Re: Exception using SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 12/20/2011 2:57 AM, Chantal Ackermann wrote:
> Hi Shawn,
>
> the exception indicates that the connection was lost. I'm sure you
> figured that out for yourself.
>
> Questions:
> - is that specific server instance really running? That is, can you
> reach it via browser?
> - If yes: how is your connection pool configured and how do you
> initialize it? More specifically: from what I know, CommonsHttp is
> already multi-threaded, so your initializing code should not be using
> multiple threads to access it. I'm not completely sure about that in
> combination with SolrJ, though. I just had that issue when using
> CommonsHttp directly in the wrong way.
>
> I have been using SolrJ with a CommonsHttp pool for some time now, and
> it all works very reliably. I've encountered those Connection reset
> exceptions too, but they were always caused by the server not being
> reachable.

Yes, I did figure out it's a disconnected socket, but I can find no 
reason for it.

The server is likely reachable via a browser, but as it usually happens 
when I am sleeping or otherwise occupied, I am not able to check.  The 
updates run once a minute, and everything works fine on the next 
update.  If the server were down, I would get an email once a minute 
until the problem were solved, but it happens only once, then it's fine 
for a good long while.  I am the only person who works on these servers, 
so I know that nothing is going on when the exception occurs.  These 
Solr servers are not overloaded - the long term average requests per 
second are about 0.3, very low.

The main reason I am creating my own HttpClient and connection manager 
is so that I can be sure of the connection timeout applied.  If 
something should go wrong and it takes more than 30 seconds to establish 
the HTTP connection, I don't want my code to silently keep on waiting 
for it; I want to know about it.  The other options are likely the 
default settings, but they are the settings I want, so I am just ensuring 
they are set, against a potential future change in the default.

The problem is not caused by the configured 30 second connection 
timeout.  As I said in the first message, this happens very quickly, 
usually within 2 seconds of the start of the update, often within the 
same second.

Thanks,
Shawn

Re: Exception using SolrJ

Posted by Chantal Ackermann <ch...@btelligent.de>.

Hi Shawn,

the exception indicates that the connection was lost. I'm sure you
figured that out for yourself.

Questions:
- is that specific server instance really running? That is, can you
reach it via browser?
- If yes: how is your connection pool configured and how do you
initialize it? More specifically: from what I know, CommonsHttp is
already multi-threaded, so your initializing code should not be using
multiple threads to access it. I'm not completely sure about that in
combination with SolrJ, though. I just had that issue when using
CommonsHttp directly in the wrong way.

I have been using SolrJ with a CommonsHttp pool for some time now, and
it all works very reliably. I've encountered those Connection reset
exceptions too, but they were always caused by the server not being
reachable.


Chantal



From your pastebin:

Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
        at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)



On Tue, 2011-12-20 at 01:11 +0100, Shawn Heisey wrote:
> On 12/16/2011 12:44 AM, Shawn Heisey wrote:
> > I am seeing exceptions from some code I have written using SolrJ.  I 
> > have placed it into a pastebin:
> >
> >
> > http://pastebin.com/XnB83Jay
> 
> No reply in three days.  Does nobody have any ideas for me?
> 
> Thanks,
> Shawn
>

Re: In-web search

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Hi Remi,

That depends on how you've structured and indexed your documents (web pages?) with Solr.

If you've extracted the hostname into a 'hostname' field and indexed it, then you should be able to use syntax like:
  hostname:www.sematext.com

If you've extracted the domain name into a 'domain' field and indexed it, then you should be able to use syntax like:  domain:sematext.com

If you've flipped parts of the hostname or domain in the URL around then you could also search using:
  url:com.sematext*



etc.

Otis



>________________________________
> From: remi tassing <ta...@yahoo.com>
>To: solr-user@lucene.apache.org 
>Sent: Tuesday, December 20, 2011 8:29 AM
>Subject: In-web search
> 
>Hi,
>What is the query syntax for Solr to search within a specific site?
>For example in google you can search like this: "Solr site:apache.org"
>Remi
>
>

In-web search

Posted by remi tassing <ta...@yahoo.com>.

Hi,
What is the query syntax for Solr to search within a specific site?
For example in google you can search like this: "Solr site:apache.org"
Remi

Re: Exception using SolrJ

Posted by Shawn Heisey <so...@elyograg.org>.

On 12/16/2011 12:44 AM, Shawn Heisey wrote:
> I am seeing exceptions from some code I have written using SolrJ.  I 
> have placed it into a pastebin:
>
>
> http://pastebin.com/XnB83Jay

No reply in three days.  Does nobody have any ideas for me?

Thanks,
Shawn