Posted to solr-user@lucene.apache.org by Vineet Mishra <cl...@gmail.com> on 2014/02/18 14:05:45 UTC

Fault Tolerant Technique of Solr Cloud

Hi All,

I want to have a clear idea about the fault-tolerance capability of SolrCloud.

I have set up SolrCloud with an external ZooKeeper and a single collection
with 2 shards, each having a replica, as described in the official Solr
documentation:

https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

                                   *Collection1*
                                     /            \
                                   /                \
                                 /                    \
                               /                        \
                             /                            \
                            /                               \
*Shard 1                                                     Shard 2*
localhost:8983                                            localhost:7574
localhost:8900                                            localhost:7500


I indexed some documents, and then if I shut down any replica or leader,
for example localhost:8900, I can't query the collection on that
particular port:

http://localhost:8900/solr/collection1/select?q=*:*

So how is it fault tolerant, and how should the query be made?

Regards

Re: Fault Tolerant Technique of Solr Cloud

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

Vineet, I'm assuming that you are executing your search from a Java
client. If so, just use CloudSolrServer, which is present in the SolrJ API,
and save yourself from all these troubles. If you are not using a Java
client, then you need to put some or all of your servers behind a load
balancer and invoke requests against that.

On Mon, Feb 24, 2014 at 5:34 PM, Vineet Mishra <cl...@gmail.com> wrote:
> Can you brief as how to make a direct call to Zookeeper instead of Cloud
> Collection(as currently I was querying the Cloud something like
> "http://192.168.2.183:8900/solr/collection1/select?q=*:*" ) from UI, now
> if I assume shard 8900 is down then how can I still make the call.
>
> I have followed the Apache Tutorial(with separate zookeeper running on port
> 2181)
>
> http://wiki.apache.org/solr/SolrCloud
>
> Can you please be more specific in respect to zookeeper distributed calls.
>
> Regards
>
>
> On Wed, Feb 19, 2014 at 9:45 PM, Per Steffensen <st...@designware.dk> wrote:
>
>> On 19/02/14 07:57, Vineet Mishra wrote:
>>
>>> Thanks for all your response but my doubt is which *Server:Port* should
>>> the
>>>
>>> query be made as we don't know the crashed server or which server might
>>> crash in the future(as any server can go down).
>>>
>> That is what CloudSolrServer will deal with for you. It knows which
>> servers are down and make sure not to send request to those servers.
>>
>>
>>> The only intention for writing this doubt is to get an idea about how the
>>> query format for distributed search might work if any of the shard or
>>> replica goes down.
>>>
>>
>> // Setting up your CloudSolrServer-client
>> // (<zkConnectionStr> being the same string as you provide in -DzkHost
>> // when starting your servers)
>> CloudSolrServer client = new CloudSolrServer(<zkConnectionStr>);
>> client.setDefaultCollection("collection1");
>> client.connect();
>>
>> // Creating and firing queries (you can do it in different way, but at
>> // least this is an option)
>> SolrQuery query = new SolrQuery("*:*");
>> QueryResponse results = client.query(query);
>>
>>
>> Because you are using CloudSolrServer you do not have to worry about not
>> sending the request to a crashed server.
>>
>> In your example I believe the situation is as follows:
>> * One collection called "collection1" with two shards "shard1" and
>> "shard2" each having two replica "replica1" and "replica2" (a replica is an
>> "instance" of a shard, and when you have one replica you are not having
>> replication).
>> * collection1.shard1.replica1 is running on localhost:8983 and
>> collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
>> * collection1.shard2.replica1 is running on localhost:7574 and
>> collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)
>> If localhost:8900 is the only server that is down, all data is still
>> available for search because every shard has at least on replica running.
>> In that case I believe setting "shards.tolerant" will not make a
>> difference. You will get your response no matter what. But if
>> localhost:8983 was also down there would no live replica of shard1. I that
>> case you will get an exception from you query, indicating that the query
>> cannot be carried out over the complete data-set. In that case if you set
>> "shards.tolerant" that behaviour will change, and you will not get an
>> exception - you will get a real response, but it will just not include data
>> from shard1, because it is not available at the moment. That is just the
>> way I believe "shards.tolerant" works, but you might want to verify that.
>>
>> To set "shards.tolerant":
>>
>> SolrQuery query = new SolrQuery("*:*");
>> query.set("shards.tolerant", true);
>> QueryResponse results = client.query(query);
>>
>>
>> Believe distributes search is default, but you can explicitly require it by
>>
>> query.setDistrib(true);
>>
>> or
>>
>> query.set("distrib", true);
>>
>>
>>> Thanks
>>>
>>
>>



-- 
Regards,
Shalin Shekhar Mangar.

Re: Fault Tolerant Technique of Solr Cloud

Posted by Daniel Collins <da...@gmail.com>.

I can see what you mean: what you are expecting is a single host:port
combination for "The Cloud" that always works and that you can call from
your UI.  That is perfectly possible, but it's really not within the scope
of Solr itself.

What you should understand is that Solr provides a cloud that has
fail-over and fault tolerance within itself.  It provides the SolrJ client
to access it, but if you want an HTTP interface to "Solr Cloud", that's a
standard HAProxy/load balancer setup. That can be done in hardware or
software; there are guides galore on the web if you search for HAProxy.


On 27 February 2014 11:41, Vineet Mishra <cl...@gmail.com> wrote:

> Hi Per
>
> Thanks for your response, got it working.
> But moreover I was more interested in querying the same Cloud from UI in a
> case of one of the server down and querying the same server to get
> collection result. But I guess thats not possible.
> Thanks!
>
>
>
>
> On Mon, Feb 24, 2014 at 7:36 PM, Per Steffensen <st...@designware.dk>
> wrote:
>
> > On 24/02/14 13:04, Vineet Mishra wrote:
> >
> >> Can you brief as how to make a direct call to Zookeeper instead of Cloud
> >> Collection(as currently I was querying the Cloud something like
> >> "http://192.168.2.183:8900/solr/collection1/select?q=*:*" ) from UI,
> >> now
> >>
> >> if I assume shard 8900 is down then how can I still make the call.
> >>
> > It is obvious that you cannot make the call to localhost:8900 - the
> server
> > listening to that port is down. You can make the call to any of the other
> > servers, though. Information about which Solr-servers are running is
> > available in ZooKeeper, CloudSolrServer reads that information in order
> to
> > know which servers to route requests to. As long as localhost:8900 is
> down
> > it will not route requests to that server.
> >
> > You do not make a "direct call to ZooKeeper". ZooKeeper is not an HTTP
> > server that will receive your calls. It just has information about which
> > Solr-servers are up and running. CloudSolrServers takes advantage of that
> > information. You really cannot do without CloudSolrServer (or at least
> > LBHttpSolrServer), unless you write a component that can do the same
> thing
> > in some other language (if the reason you do not want to use
> > CloudSolrServer, is that your client is not java). Else you need to do
> > other clever stuff, like e.g. what Shalin suggests.
> >
> > Regards, Per Steffensen
> >
>

Re: Fault Tolerant Technique of Solr Cloud

Posted by Vineet Mishra <cl...@gmail.com>.

Hi Per

Thanks for your response, I got it working.
However, I was more interested in querying the cloud from the UI when one
of the servers is down, by querying that same server to get the collection
result. But I guess that's not possible.
Thanks!




On Mon, Feb 24, 2014 at 7:36 PM, Per Steffensen <st...@designware.dk> wrote:

> On 24/02/14 13:04, Vineet Mishra wrote:
>
>> Can you brief as how to make a direct call to Zookeeper instead of Cloud
>> Collection(as currently I was querying the Cloud something like
>> "http://192.168.2.183:8900/solr/collection1/select?q=*:*" ) from UI,
>> now
>>
>> if I assume shard 8900 is down then how can I still make the call.
>>
> It is obvious that you cannot make the call to localhost:8900 - the server
> listening to that port is down. You can make the call to any of the other
> servers, though. Information about which Solr-servers are running is
> available in ZooKeeper, CloudSolrServer reads that information in order to
> know which servers to route requests to. As long as localhost:8900 is down
> it will not route requests to that server.
>
> You do not make a "direct call to ZooKeeper". ZooKeeper is not an HTTP
> server that will receive your calls. It just has information about which
> Solr-servers are up and running. CloudSolrServers takes advantage of that
> information. You really cannot do without CloudSolrServer (or at least
> LBHttpSolrServer), unless you write a component that can do the same thing
> in some other language (if the reason you do not want to use
> CloudSolrServer, is that your client is not java). Else you need to do
> other clever stuff, like e.g. what Shalin suggests.
>
> Regards, Per Steffensen
>

Re: Fault Tolerant Technique of Solr Cloud

Posted by Per Steffensen <st...@designware.dk>.

On 24/02/14 13:04, Vineet Mishra wrote:
> Can you brief as how to make a direct call to Zookeeper instead of Cloud
> Collection(as currently I was querying the Cloud something like
> "http://192.168.2.183:8900/solr/collection1/select?q=*:*" ) from UI, now
> if I assume shard 8900 is down then how can I still make the call.
It is obvious that you cannot make the call to localhost:8900 - the
server listening on that port is down. You can make the call to any of
the other servers, though. Information about which Solr servers are
running is available in ZooKeeper, and CloudSolrServer reads that
information in order to know which servers to route requests to. As long
as localhost:8900 is down it will not route requests to that server.

You do not make a "direct call to ZooKeeper". ZooKeeper is not an HTTP
server that will receive your calls. It just has information about which
Solr servers are up and running. CloudSolrServer takes advantage of
that information. You really cannot do without CloudSolrServer (or at
least LBHttpSolrServer), unless you write a component that can do the
same thing in some other language (if the reason you do not want to use
CloudSolrServer is that your client is not Java). Otherwise you need to do
other clever stuff, like e.g. what Shalin suggests.

Regards, Per Steffensen

Re: Fault Tolerant Technique of Solr Cloud

Posted by Vineet Mishra <cl...@gmail.com>.

Can you briefly explain how to make a direct call to ZooKeeper instead of
to the cloud collection? Currently I am querying the cloud from the UI with
something like

http://192.168.2.183:8900/solr/collection1/select?q=*:*

If I assume the server on port 8900 is down, how can I still make the call?

I have followed the Apache tutorial (with a separate ZooKeeper running on
port 2181):

http://wiki.apache.org/solr/SolrCloud

Can you please be more specific with respect to ZooKeeper distributed calls?

Regards


On Wed, Feb 19, 2014 at 9:45 PM, Per Steffensen <st...@designware.dk> wrote:

> On 19/02/14 07:57, Vineet Mishra wrote:
>
>> Thanks for all your response but my doubt is which *Server:Port* should
>> the
>>
>> query be made as we don't know the crashed server or which server might
>> crash in the future(as any server can go down).
>>
> That is what CloudSolrServer will deal with for you. It knows which
> servers are down and make sure not to send request to those servers.
>
>
>> The only intention for writing this doubt is to get an idea about how the
>> query format for distributed search might work if any of the shard or
>> replica goes down.
>>
>
> // Setting up your CloudSolrServer-client
> // (<zkConnectionStr> being the same string as you provide in -DzkHost
> // when starting your servers)
> CloudSolrServer client = new CloudSolrServer(<zkConnectionStr>);
> client.setDefaultCollection("collection1");
> client.connect();
>
> // Creating and firing queries (you can do it in different way, but at
> // least this is an option)
> SolrQuery query = new SolrQuery("*:*");
> QueryResponse results = client.query(query);
>
>
> Because you are using CloudSolrServer you do not have to worry about not
> sending the request to a crashed server.
>
> In your example I believe the situation is as follows:
> * One collection called "collection1" with two shards "shard1" and
> "shard2" each having two replica "replica1" and "replica2" (a replica is an
> "instance" of a shard, and when you have one replica you are not having
> replication).
> * collection1.shard1.replica1 is running on localhost:8983 and
> collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
> * collection1.shard2.replica1 is running on localhost:7574 and
> collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)
> If localhost:8900 is the only server that is down, all data is still
> available for search because every shard has at least on replica running.
> In that case I believe setting "shards.tolerant" will not make a
> difference. You will get your response no matter what. But if
> localhost:8983 was also down there would no live replica of shard1. I that
> case you will get an exception from you query, indicating that the query
> cannot be carried out over the complete data-set. In that case if you set
> "shards.tolerant" that behaviour will change, and you will not get an
> exception - you will get a real response, but it will just not include data
> from shard1, because it is not available at the moment. That is just the
> way I believe "shards.tolerant" works, but you might want to verify that.
>
> To set "shards.tolerant":
>
> SolrQuery query = new SolrQuery("*:*");
> query.set("shards.tolerant", true);
> QueryResponse results = client.query(query);
>
>
> Believe distributes search is default, but you can explicitly require it by
>
> query.setDistrib(true);
>
> or
>
> query.set("distrib", true);
>
>
>> Thanks
>>
>
>

Re: Fault Tolerant Technique of Solr Cloud

Posted by Per Steffensen <st...@designware.dk>.

On 19/02/14 07:57, Vineet Mishra wrote:
> Thanks for all your response but my doubt is which *Server:Port* should the
> query be made as we don't know the crashed server or which server might
> crash in the future(as any server can go down).
That is what CloudSolrServer will deal with for you. It knows which
servers are down and makes sure not to send requests to those servers.
>
> The only intention for writing this doubt is to get an idea about how the
> query format for distributed search might work if any of the shard or
> replica goes down.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Setting up your CloudSolrServer client.
// <zkConnectionStr> is the same string as you provide in -DzkHost when starting your servers.
CloudSolrServer client = new CloudSolrServer(<zkConnectionStr>);
client.setDefaultCollection("collection1");
client.connect();

// Creating and firing queries (you can do it in different ways, but at least this is an option)
SolrQuery query = new SolrQuery("*:*");
QueryResponse results = client.query(query);

Because you are using CloudSolrServer you do not have to worry about not 
sending the request to a crashed server.

In your example I believe the situation is as follows:
* One collection called "collection1" with two shards, "shard1" and
"shard2", each having two replicas, "replica1" and "replica2" (a replica is
an "instance" of a shard, and when you have one replica you are not
having replication).
* collection1.shard1.replica1 is running on localhost:8983 and
collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
* collection1.shard2.replica1 is running on localhost:7574 and
collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)

If localhost:8900 is the only server that is down, all data is still
available for search because every shard has at least one replica
running. In that case I believe setting "shards.tolerant" will not make
a difference. You will get your response no matter what. But if
localhost:8983 was also down there would be no live replica of shard1. In
that case you will get an exception from your query, indicating that the
query cannot be carried out over the complete data-set. In that case if
you set "shards.tolerant" that behaviour will change, and you will not
get an exception - you will get a real response, but it will just not
include data from shard1, because it is not available at the moment.
That is just the way I believe "shards.tolerant" works, but you might
want to verify that.
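
The reasoning above can be sketched as a small model. This is only an illustration of the rule "a query can cover the complete data set as long as every shard has at least one live replica"; it is not how Solr itself is implemented, and the class and method names (ShardAvailability, shardsWithoutLiveReplica) are made up for this example. The host:port values are the ones from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only; hypothetical names, not part of Solr or SolrJ.
class ShardAvailability {

    // Returns the names of the shards that have no live replica left.
    // An empty result means a distributed query can still see all data.
    static List<String> shardsWithoutLiveReplica(Map<String, List<String>> replicasByShard,
                                                 Set<String> downNodes) {
        List<String> missing = new ArrayList<String>();
        for (Map.Entry<String, List<String>> shard : replicasByShard.entrySet()) {
            boolean anyLive = false;
            for (String node : shard.getValue()) {
                if (!downNodes.contains(node)) {
                    anyLive = true;
                    break;
                }
            }
            if (!anyLive) {
                missing.add(shard.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Layout from this thread (which node is the leader does not matter here).
        Map<String, List<String>> layout = new LinkedHashMap<String, List<String>>();
        layout.put("shard1", Arrays.asList("localhost:8983", "localhost:8900"));
        layout.put("shard2", Arrays.asList("localhost:7574", "localhost:7500"));

        Set<String> oneDown = new HashSet<String>(Arrays.asList("localhost:8900"));
        Set<String> shard1Down = new HashSet<String>(
                Arrays.asList("localhost:8900", "localhost:8983"));

        System.out.println(shardsWithoutLiveReplica(layout, oneDown));    // prints []
        System.out.println(shardsWithoutLiveReplica(layout, shard1Down)); // prints [shard1]
    }
}
```

In the second case shard1 is the shard whose data would be left out of a "shards.tolerant" response, if the parameter behaves as described above.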

To set "shards.tolerant":

SolrQuery query = new SolrQuery("*:*");
query.set("shards.tolerant", true);
QueryResponse results = client.query(query);

I believe distributed search is the default, but you can explicitly require it by

query.setDistrib(true);

or

query.set("distrib", true);
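
For a client that talks plain HTTP instead of SolrJ (for example a UI), the same parameter can be passed in the request URL. A minimal sketch, assuming the request goes to a node that is still up; the class name TolerantQueryUrl and its build method are hypothetical helpers for this example, and the host, port and collection are the ones used in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class TolerantQueryUrl {

    // Builds a /select URL for one Solr node and appends shards.tolerant.
    // The query string q is URL-encoded; the other parts are used as given.
    static String build(String hostAndPort, String collection, String q, boolean shardsTolerant)
            throws UnsupportedEncodingException {
        return "http://" + hostAndPort + "/solr/" + collection + "/select"
                + "?q=" + URLEncoder.encode(q, "UTF-8")
                + "&shards.tolerant=" + shardsTolerant;
    }

    public static void main(String[] args) throws Exception {
        // prints http://localhost:8983/solr/collection1/select?q=*%3A*&shards.tolerant=true
        System.out.println(build("localhost:8983", "collection1", "*:*", true));
    }
}
```

The URL still names one specific node, so it only helps with a missing shard, not with the addressed node itself being down.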

>
> Thanks

Re: Fault Tolerant Technique of Solr Cloud

Posted by shamik <sh...@gmail.com>.

As Shawn pointed out, if you are using the CloudSolrServer client, then you
are immune to the scenario where a shard and its replica(s) go down. The
communication should ideally be with the ZooKeepers and not the Solr servers
directly. One thing you need to make sure of is to add the shards.tolerant
parameter so that the query returns results from the shard which is alive,
though it'll fetch a partial result set.




Re: Fault Tolerant Technique of Solr Cloud

Posted by Vineet Mishra <cl...@gmail.com>.

Thanks for all your responses, but my question is: to which *Server:Port*
should the query be sent, given that we don't know which server has crashed
or which server might crash in the future (as any server can go down)?

The only intention in asking this is to get an idea of how the query
format for distributed search might work if any shard or replica goes
down.

Thanks


On Tue, Feb 18, 2014 at 11:22 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 2/18/2014 8:32 AM, Shawn Heisey wrote:
>
>> On 2/18/2014 6:05 AM, Vineet Mishra wrote:
>>
>>> *Shard 1                                                     Shard 2*
>>> localhost:8983                                            localhost:7574
>>> localhost:8900                                            localhost:7500
>>>
>>>
>>> I Indexed some document and then if I shutdown any of the replica or
>>> Leader
>>> say for ex- *localhost:8900*, I can't query to the collection to that
>>> particular port
>>>
>>> http://localhost:8900/solr/collection1/select?q=*:*
>>>
>>> Then how is it Fault Tolerant or how the query has to be made.
>>>
>> What is the complete error you are getting?  If you don't see the error
>> in the response, you'll need to find your Solr Logfile and look for the
>> error (including a large java stacktrace) there.
>>
>
> Good catch by Per.  I did not notice that you were trying to send the
> query to the server that you took down.  This isn't going to work -- if the
> software you're trying to reach is not running, it won't respond.  Think
> about what happens if you are sending requests to a server and it crashes
> completely.
>
> If you want to always send to the same host/port, you will need a load
> balancer listening on that port.  You'll also want something that maintains
> a shared IP address, so that if the machine dies, the IP address and the
> load balancer move to another machine.  Haproxy and Pacemaker work very
> well as a combination for this.  There are many other choices, both
> hardware and software.
>
> Per also mentioned the other option - you can write code that knows about
> multiple URLs and can switch between them.  This is something you get for
> free with CloudSolrServer when writing Java code with SolrJ.
>
> Thanks,
> Shawn
>
>

Re: Fault Tolerant Technique of Solr Cloud

Posted by Shawn Heisey <so...@elyograg.org>.

On 2/18/2014 8:32 AM, Shawn Heisey wrote:
> On 2/18/2014 6:05 AM, Vineet Mishra wrote:
>> *Shard 1                                                     Shard 2*
>> localhost:8983                                            localhost:7574
>> localhost:8900                                            localhost:7500
>>
>>
>> I Indexed some document and then if I shutdown any of the replica or Leader
>> say for ex- *localhost:8900*, I can't query to the collection to that
>> particular port
>>
>> http://localhost:8900/solr/collection1/select?q=*:*
>>
>> Then how is it Fault Tolerant or how the query has to be made.
> What is the complete error you are getting?  If you don't see the error
> in the response, you'll need to find your Solr Logfile and look for the
> error (including a large java stacktrace) there.

Good catch by Per.  I did not notice that you were trying to send the 
query to the server that you took down.  This isn't going to work -- if 
the software you're trying to reach is not running, it won't respond.  
Think about what happens if you are sending requests to a server and it 
crashes completely.

If you want to always send to the same host/port, you will need a load 
balancer listening on that port.  You'll also want something that 
maintains a shared IP address, so that if the machine dies, the IP 
address and the load balancer move to another machine.  HAProxy and 
Pacemaker work very well as a combination for this.  There are many 
other choices, both hardware and software.

Per also mentioned the other option - you can write code that knows 
about multiple URLs and can switch between them.  This is something you 
get for free with CloudSolrServer when writing Java code with SolrJ.
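
For a non-SolrJ client, the "code that knows about multiple URLs" idea could look roughly like the sketch below. It is a simplified illustration using only the JDK, not what CloudSolrServer does internally (no ZooKeeper, no load balancing, no retry of nodes that come back). The names FailoverQuery, Fetcher, queryFirstAvailable and httpGet are hypothetical, and the timeouts are arbitrary assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Scanner;

// Hypothetical client-side failover helper; not part of Solr or SolrJ.
class FailoverQuery {

    // Abstraction over "send a GET to this URL", so the failover loop
    // can be exercised without a running Solr node.
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    // Tries each base URL in order and returns the first successful response.
    // Throws IOException if none of the nodes could be reached.
    static String queryFirstAvailable(List<String> baseUrls, String pathAndQuery, Fetcher fetcher)
            throws IOException {
        IOException last = null;
        for (String base : baseUrls) {
            try {
                return fetcher.fetch(base + pathAndQuery);
            } catch (IOException e) {
                last = e; // this node looks down; try the next one
            }
        }
        throw new IOException("No Solr node reachable", last);
    }

    // Plain HTTP GET with short timeouts so a dead node is skipped quickly.
    static String httpGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000); // milliseconds, arbitrary choice
        conn.setReadTimeout(5000);    // milliseconds, arbitrary choice
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }
}
```

A call for the setup in this thread might be FailoverQuery.queryFirstAvailable(Arrays.asList("http://localhost:8900", "http://localhost:8983", "http://localhost:7574", "http://localhost:7500"), "/solr/collection1/select?q=*:*", FailoverQuery::httpGet). The list of URLs is static here, which is exactly the bookkeeping that a load balancer or CloudSolrServer takes over.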

Thanks,
Shawn

Re: Fault Tolerant Technique of Solr Cloud

Posted by Shawn Heisey <so...@elyograg.org>.

On 2/18/2014 6:05 AM, Vineet Mishra wrote:
> *Shard 1                                                     Shard 2*
> localhost:8983                                            localhost:7574
> localhost:8900                                            localhost:7500
> 
> 
> I Indexed some document and then if I shutdown any of the replica or Leader
> say for ex- *localhost:8900*, I can't query to the collection to that
> particular port
> 
> http://localhost:8900/solr/collection1/select?q=*:*
> 
> Then how is it Fault Tolerant or how the query has to be made.

What is the complete error you are getting?  If you don't see the error
in the response, you'll need to find your Solr log file and look for the
error (including a large Java stack trace) there.

Thanks,
Shawn

Re: Fault Tolerant Technique of Solr Cloud

Posted by Per Steffensen <st...@designware.dk>.

If localhost:8900 is down but localhost:8983 contains a replica of the same
shard(s) that 8900 was running, all data/documents are still available.
You cannot query the shut-down server (port 8900), but you can query any
of the other servers (8983, 7574 or 7500). If you make a distributed
query to collection1 you should still be able to find all of your
documents, even though 8900 is down.

It is cumbersome to keep a list of crashed/shut-down servers in order to
make sure you are always querying a server that is not down. The
information about which servers are running (and which are not) and which
replicas they run is all in ZooKeeper. So basically, just go look in
ZooKeeper :-) Ahh, Solr has a tool to help you do that - at least if you
are running your client in Java code. Solr implements different kinds of
clients (called XXXSolrServer - yes, obvious name for a client). There
is HttpSolrServer, which can do queries against a particular server (won't
help you), there is LBHttpSolrServer, which can do load-balancing over
several HttpSolrServers (ahh, still not there), and there is
CloudSolrServer, which watches ZooKeeper in order to know what is running
and where to send requests. CloudSolrServer uses LBHttpSolrServer behind
the scenes. If you use CloudSolrServer as a client everything should be
smooth and transparent with respect to querying when servers are down.
CloudSolrServer will find out where to (and not to) route your requests.

Regards, Per Steffensen

On 18/02/14 14:05, Vineet Mishra wrote:
> Hi All,
>
> I want to have clear idea about the Fault Tolerant Capability of SolrCloud
>
> Considering I have setup the SolrCloud with a external Zookeeper, 2 shards,
> each having a replica with single collection as given in the official Solr
> Documentation.
>
> https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
>
>                                     *Collection1*
>                                       /            \
>                                     /                \
>                                   /                    \
>                                 /                        \
>                               /                            \
>                              /                               \
> *Shard 1                                                     Shard 2*
> localhost:8983                                            localhost:7574
> localhost:8900                                            localhost:7500
>
>
> I Indexed some document and then if I shutdown any of the replica or Leader
> say for ex- *localhost:8900*, I can't query to the collection to that
> particular port
>
> http://localhost:8900/solr/collection1/select?q=*:*
>
> Then how is it Fault Tolerant or how the query has to be made.
>
> Regards
>

Re: Fault Tolerant Technique of Solr Cloud

Posted by Amit Jha <sh...@gmail.com>.

Solr will complain only if you bring down both the replica and the leader of the same shard. It would be difficult to have a highly available environment if you have only a small number of physical servers.

Rgds
AJ

> On 18-Feb-2014, at 18:35, Vineet Mishra <cl...@gmail.com> wrote:
> 
> Hi All,
> 
> I want to have clear idea about the Fault Tolerant Capability of SolrCloud
> 
> Considering I have setup the SolrCloud with a external Zookeeper, 2 shards,
> each having a replica with single collection as given in the official Solr
> Documentation.
> 
> https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
> 
>                                   *Collection1*
>                                     /            \
>                                   /                \
>                                 /                    \
>                               /                        \
>                             /                            \
>                            /                               \
> *Shard 1                                                     Shard 2*
> localhost:8983                                            localhost:7574
> localhost:8900                                            localhost:7500
> 
> 
> I Indexed some document and then if I shutdown any of the replica or Leader
> say for ex- *localhost:8900*, I can't query to the collection to that
> particular port
> 
> http://localhost:8900/solr/collection1/select?q=*:*
> 
> Then how is it Fault Tolerant or how the query has to be made.
> 
> Regards