Posted to solr-user@lucene.apache.org by Scott Prentice <sp...@leximation.com> on 2018/01/29 20:13:10 UTC

SolrCloud installation troubles...

Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a 
prototype system of 3 Solr servers and 3 Zookeeper servers. For now, 
this is all on one machine, but will eventually be 3 machines.

This works fine on an Ubuntu 5.4.0-6 VM on my local system, but when I do 
the same setup on the company's network machine (a Red Hat 4.8.5-16 VM), 
I'm unable to create a collection. To keep things simple, I'm not using 
our custom schema yet, but just creating a collection through the Solr 
Admin UI using Collections > Add Collection, using the "_default" config 
set. On the Ubuntu system, I can create various collections: 1 shard 
w/ 1 replica, 2 shards w/ 3 replicas, 3 shards w/ 4 replicas. All 
seem alive and well.

But when I do the same thing on the Red Hat system it fails. Through the 
UI, it'll first time out with this message ..

     Connection to Solr lost

Then after a refresh, the collection appears to have been partially 
created, but it's in the "Gone" state, and after some time, is deleted 
by an apparent cleanup process. If I try to create one through the 
command line ..

     ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

ERROR: Failed to create collection 'test99' due to: 
{10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8984/solr, 
10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8985/solr, 
10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: http://10.6.208.31:8983/solr}

I've seen other reports of errors like this but no solutions that seem 
to apply to my situation. Any thoughts?

Thanks!
...scott



Re: SolrCloud installation troubles...

Posted by Scott Prentice <sp...@leximation.com>.
Looks like ports 2888 and 2890 are not open. At least, they are not 
reported by netstat -plunt, which could be the problem.
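
One way to cross-check the netstat output is to attempt a TCP connection to each port directly. The following is only a minimal sketch in Python. The helper name port_is_open is made up for illustration, and the ZooKeeper port list assumes a common single-machine layout (client ports 2181-2183, quorum ports 2888-2890, election ports 3888-3890) that may not match the actual zoo.cfg. Also note that, as far as I know, only the current ZooKeeper leader listens on its quorum port, so a closed 2888 or 2890 on a follower may be normal rather than a sign of a blocked port.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed port layout; adjust to match the real zoo.cfg and Solr start commands.
    zk_ports = [2181, 2182, 2183, 2888, 2889, 2890, 3888, 3889, 3890]
    solr_ports = [8983, 8984, 8985]
    for port in zk_ports + solr_ports:
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} {state}")
```

Running the same check against the machine's network address (10.6.208.31 in the error above) as well as 127.0.0.1 may show whether a port is reachable on one interface but not the other.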

Thanks, all!

...scott


On 1/29/18 1:10 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> Trying 127.0.0.1 could help.   We kind of tend to think localhost is always 127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent of 127.0.0.1.
>
> I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does happen.


RE: SolrCloud installation troubles...

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Trying 127.0.0.1 could help. We tend to think localhost is always 127.0.0.1, but I've seen localhost resolve to ::1, the IPv6 equivalent of 127.0.0.1.

I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does happen.
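
To see what a name actually resolves to on a given machine, something like the following Python sketch could be used. The function name resolve_addresses is hypothetical. It only reports what the system resolver returns, so it reflects how the Python process sees the name and not necessarily how the JVM running Solr or ZooKeeper will resolve it.

```python
import socket


def resolve_addresses(hostname):
    """Return the distinct IP addresses hostname resolves to, in resolver order."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # If ::1 is listed first, "localhost" may be preferring IPv6 on this machine.
    print(resolve_addresses("localhost"))
```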





Re: SolrCloud installation troubles...

Posted by Rick Leir <rl...@leirtech.com>.
Could it be SELinux? Or the limits on the number of open files or processes?
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 
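
As a rough way to look at the limits mentioned above, the Python sketch below prints the open-file and process limits for the account that runs it. The helper name describe_limit is made up for illustration, and the script would need to be run as the same user that starts Solr for the numbers to be meaningful. The SELinux mode is not covered here; on Red Hat systems it can be checked separately with the getenforce command.

```python
import resource


def describe_limit(name, limit_id):
    """Return a one-line description of the soft and hard values of a resource limit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    print(describe_limit("open files", resource.RLIMIT_NOFILE))
    print(describe_limit("processes", resource.RLIMIT_NPROC))
```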

Re: SolrCloud installation troubles...

Posted by Scott Prentice <sp...@leximation.com>.
On 1/29/18 1:31 PM, Shawn Heisey wrote:
> On 1/29/2018 2:02 PM, Scott Prentice wrote:
>> Thanks, Shawn. I was wondering if there was something going on with 
>> IP redirection that was causing confusion. Any thoughts on how to 
>> debug? And, what do you mean by "extreme garbage collection pauses"? 
>> Is that Solr garbage collection or the OS itself? There's really 
>> nothing happening on this machine, it's purely for testing so there 
>> shouldn't be any extra load from other processes. 
>
> Garbage collection is one of the primary features of Java's memory 
> management.  It's not Solr or the OS.
>
> If the java heap is really enormous, you can end up with long pauses, 
> but I wouldn't expect them to be frequent unless the index is also 
> really huge.
>
> A very common issue that can cause even worse pause issues than a 
> large heap is a heap that's too small, but not quite small enough to 
> cause Java to completely run out of heap memory.  The default max heap 
> size in recent Solr versions is 512MB, which is very small.  A Java 
> program (which Solr is) can never use more heap memory than the 
> maximum it is configured with, even if the machine has more memory 
> available.
>
> This paragraph is included because you mentioned IP redirection: 
> Extreme care must be used when setting up SolrCloud on virtual 
> machines where accessing the VM has to go through any kind of IP 
> translation.  SolrCloud keeps track of how to reach each server in the 
> cloud and if it stores an untranslated address when you need the 
> translated address (or vice-versa), things are not going to work.  
> Generally speaking translated addresses are going to be problematic 
> for SolrCloud, and should not be used.
>
> Thanks,
> Shawn
>
Thanks for the clarification. Yes, we're just using the default heap 
size for Solr, but there's no index (yet) and nothing really going on, 
so I'd hope that garbage collection isn't the problem.

I'm putting my money on an IP translation issue (this is on a tightly 
controlled corporate network) or the fact that ports 2888 and 2890 
appear not to be open. I'll dig into the network side for now and 
see where that gets me.

Thanks,
...scott



Re: SolrCloud installation troubles...

Posted by Shawn Heisey <el...@elyograg.org>.
On 1/29/2018 2:02 PM, Scott Prentice wrote:
> Thanks, Shawn. I was wondering if there was something going on with IP 
> redirection that was causing confusion. Any thoughts on how to debug? 
> And, what do you mean by "extreme garbage collection pauses"? Is that 
> Solr garbage collection or the OS itself? There's really nothing 
> happening on this machine, it's purely for testing so there shouldn't 
> be any extra load from other processes. 

Garbage collection is one of the primary features of Java's memory 
management.  It's not Solr or the OS.

If the Java heap is really enormous, you can end up with long pauses, 
but I wouldn't expect them to be frequent unless the index is also 
really huge.

A very common issue that can cause even worse pause issues than a large 
heap is a heap that's too small, but not quite small enough to cause 
Java to completely run out of heap memory.  The default max heap size in 
recent Solr versions is 512MB, which is very small.  A Java program 
(which Solr is) can never use more heap memory than the maximum it is 
configured with, even if the machine has more memory available.
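
To confirm which maximum heap a running node actually has, the sketch below reads it from Solr's system info response. This is only an illustration: the helper names are hypothetical, and the field path jvm / memory / raw / max (a byte count) is an assumption about the response layout that should be checked against real output from /solr/admin/info/system before relying on it.

```python
import json
from urllib.request import urlopen


def max_heap_mb(system_info):
    """Return the JVM max heap in MiB from a parsed system-info response."""
    # Assumed field path: jvm -> memory -> raw -> max, holding a byte count.
    raw_max_bytes = system_info["jvm"]["memory"]["raw"]["max"]
    return raw_max_bytes / (1024 * 1024)


def fetch_system_info(base_url):
    """Fetch and parse the system-info response from one Solr node."""
    with urlopen(f"{base_url}/admin/info/system?wt=json", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for port in (8983, 8984, 8985):
        try:
            info = fetch_system_info(f"http://localhost:{port}/solr")
            print(port, round(max_heap_mb(info)), "MiB max heap")
        except OSError as err:
            # URLError is a subclass of OSError, so unreachable nodes land here.
            print(port, "not reachable:", err)
```

If the reported value turns out to be too small, the heap can be raised when starting Solr, for example with the -m option of bin/solr or the SOLR_HEAP setting in solr.in.sh.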

This paragraph is included because you mentioned IP redirection:  
Extreme care must be used when setting up SolrCloud on virtual machines 
where accessing the VM has to go through any kind of IP translation.  
SolrCloud keeps track of how to reach each server in the cloud and if it 
stores an untranslated address when you need the translated address (or 
vice-versa), things are not going to work.  Generally speaking, 
translated addresses are going to be problematic for SolrCloud, and 
should not be used.
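
The addresses SolrCloud has stored can be read from the node names, which in the error output above look like 10.6.208.31:8984_solr. The Python sketch below is a rough illustration of turning those names into URLs and flagging nodes registered under a different host than the one expected. Both function names are hypothetical, and the parsing assumes the simple host:port_context form seen in this thread.

```python
def node_name_to_base_url(node_name, scheme="http"):
    """Convert a node name such as '10.6.208.31:8984_solr' into its base URL."""
    host_port, _, context = node_name.partition("_")
    return f"{scheme}://{host_port}/{context}"


def nodes_not_on_host(node_names, expected_host):
    """Return the node names whose registered host differs from expected_host."""
    mismatched = []
    for name in node_names:
        # Strip the '_context' suffix, then the ':port' suffix, to get the host.
        host = name.partition("_")[0].rpartition(":")[0]
        if host != expected_host:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    # Node names copied from the error message earlier in this thread.
    nodes = ["10.6.208.31:8983_solr", "10.6.208.31:8984_solr", "10.6.208.31:8985_solr"]
    for node in nodes:
        print(node, "->", node_name_to_base_url(node))
    print("not registered as localhost:", nodes_not_on_host(nodes, "localhost"))
```

If the nodes are registered under an address that the other nodes cannot reach, the host each node advertises can be set explicitly at startup, for example with the -h option of bin/solr or SOLR_HOST in solr.in.sh.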

Thanks,
Shawn


Re: SolrCloud installation troubles...

Posted by Scott Prentice <sp...@leximation.com>.
On 1/29/18 12:44 PM, Shawn Heisey wrote:
> On 1/29/2018 1:13 PM, Scott Prentice wrote:
>> But when I do the same thing on the Red Hat system it fails. Through 
>> the UI, it'll first time out with this message ..
>>
>>     Connection to Solr lost
>>
>> Then after a refresh, the collection appears to have been partially 
>> created, but it's in the "Gone" state, and after some time, is 
>> deleted by an apparent cleanup process. If I try to create one 
>> through the command line ..
>>
>>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>>
>> I get this response ..
>>
>> ERROR: Failed to create collection 'test99' due to: 
>> {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
>> occured when talking to server at: http://10.6.208.31:8984/solr, 
>> 10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
>> occured when talking to server at: http://10.6.208.31:8985/solr, 
>> 10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
>> occured when talking to server at: http://10.6.208.31:8983/solr} 
>
> This sounds like either network connectivity problems or possibly 
> issues caused by extreme garbage collection pauses that result in 
> timeouts.
>
> Thanks,
> Shawn
>
Thanks, Shawn. I was wondering if there was something going on with IP 
redirection that was causing confusion. Any thoughts on how to debug? 
And, what do you mean by "extreme garbage collection pauses"? Is that 
Solr garbage collection or the OS itself? There's really nothing 
happening on this machine; it's purely for testing, so there shouldn't be 
any extra load from other processes.

Thanks!
...scott




Re: SolrCloud installation troubles...

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/29/2018 1:13 PM, Scott Prentice wrote:
> But when I do the same thing on the Red Hat system it fails. Through 
> the UI, it'll first time out with this message ..
>
>     Connection to Solr lost
>
> Then after a refresh, the collection appears to have been partially 
> created, but it's in the "Gone" state, and after some time, is deleted 
> by an apparent cleanup process. If I try to create one through the 
> command line ..
>
>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>
> I get this response ..
>
> ERROR: Failed to create collection 'test99' due to: 
> {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
> occured when talking to server at: http://10.6.208.31:8984/solr, 
> 10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
> occured when talking to server at: http://10.6.208.31:8985/solr, 
> 10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
> occured when talking to server at: http://10.6.208.31:8983/solr} 

This sounds like either network connectivity problems or possibly issues 
caused by extreme garbage collection pauses that result in timeouts.

Thanks,
Shawn


Re: SolrCloud installation troubles...

Posted by Scott Prentice <sp...@leximation.com>.
Interesting. I am using "localhost" in the config files (using the IP 
caused things to break even worse). But perhaps I should check with IT 
to make sure the ports are all open.

Thanks,
...scott


On 1/29/18 12:57 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall.    I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside.
>
> You should be able to "fake it out" if you set up your zookeeper configuration to use localhost ports.


RE: SolrCloud installation troubles...

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall. I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside.

You should be able to "fake it out" if you set up your ZooKeeper configuration to use localhost ports.
