
Posted to solr-user@lucene.apache.org by James Dulin <jd...@crelate.com> on 2013/05/30 21:48:27 UTC

2 VM setup for SOLRCLOUD?

I'm working to set up SolrCloud in Windows Azure.  I have read over the SolrCloud wiki, but am a little confused about some of the deployment options.  I am attaching an image of what I am thinking we want to do: 2 VMs with 2 shards spanning across them, 4 nodes total across the two machines, and a ZooKeeper on each VM.  I think this is feasible, but I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the "collection", etc.?)

[Attached image image001.png: diagram of the proposed 2-VM deployment; not included in this plain-text version]

Thanks!

Jamey

Re: 2 VM setup for SOLRCLOUD?

Posted by Erick Erickson <er...@gmail.com>.

bq: so you have all the shard data, logically you should be able to
index just using that...

This assumes
1> that the cluster state isn't changing and
2> that all the nodes are available.

Neither of these is guaranteed.

Consider a topology where there are two ZK servers and a bunch of nodes that can
become isolated from each other. An easy way to think about it is two
separate datacenters, each with a single ZK and a bunch of nodes.
We'll skip, for this discussion, the other reasons this is a bad idea.
Now the connection is lost and each ZK node isn't able to see some of the
Solr nodes. That is, ZK1 can only see node1.1 and node1.2, and ZK2 can only
see node2.1 and node2.2. Further, node1.* can't see any of node2.* and
vice versa.

Now, each separated half has a self-consistent view of the cluster; it's just
that half of the cluster is unavailable for whatever reason (in our example,
the connection between datacenters is down). Updates can come in to either
half of the cluster and happen independently.

Now the connection is restored. How could one reconcile the updates to the two
different clusters? Especially if there were updates to the same document?
This is the "split brain" problem and exactly why a quorum is required. At
least that way the resolution is deterministic (and explainable)....
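
The majority rule described above can be illustrated with a small sketch. It is a toy model, not ZooKeeper code: the names side_may_write and writable_sides are made up for this example, and a network partition is represented simply as a list of groups of ZK servers that can still see each other. It shows that two disjoint groups can never both hold a strict majority, so at most one side of a split may accept updates.

```python
def side_may_write(visible_zk, ensemble):
    """A side may accept updates only if it sees a strict majority of the ensemble."""
    return len(set(visible_zk) & set(ensemble)) > len(ensemble) // 2


def writable_sides(partition, ensemble):
    """Return the sides of a partition (hypothetical model) that still hold a quorum."""
    return [side for side in partition if side_may_write(side, ensemble)]


# Two ZK servers, one per datacenter: after the split neither side has a majority.
two_zk = ["zk1", "zk2"]
print(writable_sides([["zk1"], ["zk2"]], two_zk))

# Three ZK servers split 2/1: only the side holding two of them keeps a quorum.
three_zk = ["zk1", "zk2", "zk3"]
print(writable_sides([["zk1", "zk2"], ["zk3"]], three_zk))
```

With two ZK servers the split leaves no writable side at all, which is the price paid for never having two halves accept conflicting updates.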

Best
Erick

On Sat, Jun 1, 2013 at 11:17 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> Running ZK on all the cloud servers makes it very, very hard to add a new Solr node. You have to reconfigure every ZK server to do that.
>
> Manage the ZK cluster and the Solr cluster separately.
>
> I'm not sure it is worth configuring Solr Cloud if you are only going to run two servers. Instead, run one server as live, and use simple replication to the second as a hot backup.
>
> If you need four or more Solr servers and you need NRT, run Solr Cloud.
>
> wunder

Re: 2 VM setup for SOLRCLOUD?

Posted by Walter Underwood <wu...@wunderwood.org>.

Running ZK on all the cloud servers makes it very, very hard to add a new Solr node. You have to reconfigure every ZK server to do that.

Manage the ZK cluster and the Solr cluster separately.

I'm not sure it is worth configuring Solr Cloud if you are only going to run two servers. Instead, run one server as live, and use simple replication to the second as a hot backup.

If you need four or more Solr servers and you need NRT, run Solr Cloud. 

wunder

On Jun 1, 2013, at 1:55 AM, Daniel Collins wrote:

> Document updates will fail with less than the quorum of ZKs, so you won't be able to index anything when 1 server is down.
> 
> It's the one area that always seems counterintuitive (to me at any rate); after all, you have your 2 instances on 1 server, so you have all the shard data, logically you should be able to index just using that (and if you had a single ZK running on that server it would indeed be fine)...  However, ZK needs a 3rd instance running somewhere in order to maintain its majority rule.
> 
> The consensus I've seen tends to be run a ZK on all your cloud servers, and then run some "outside" the cloud on other machines.  If you had a 3rd VM that just ran ZK and nothing else, you could lose any 1 of the 3 machines and still be ok. But if you lose 2 you are in trouble.

Re: 2 VM setup for SOLRCLOUD?

Posted by Daniel Collins <da...@gmail.com>.

Document updates will fail with fewer than a quorum of ZKs, so you won't be 
able to index anything when 1 server is down.

It's the one area that always seems counterintuitive (to me at any rate); 
after all, you have your 2 instances on 1 server, so you have all the shard 
data, logically you should be able to index just using that (and if you had 
a single ZK running on that server it would indeed be fine)...  However, ZK 
needs a 3rd instance running somewhere in order to maintain its majority 
rule.

The consensus I've seen tends to be: run a ZK on all your cloud servers, and 
then run some "outside" the cloud on other machines.  If you had a 3rd VM 
that just ran ZK and nothing else, you could lose any 1 of the 3 machines 
and still be ok. But if you lose 2 you are in trouble.
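
To make the 2-VM versus 3-VM comparison concrete, here is a rough sketch that tries every way of losing machines and checks whether a ZK majority survives. The machine names and the survives_all_losses helper are invented for illustration; each dictionary maps a machine to the number of ZK servers assumed to run on it.

```python
from itertools import combinations


def has_quorum(zk_up, zk_total):
    """True if the servers still up form a strict majority of the ensemble."""
    return zk_up > zk_total // 2


def survives_all_losses(machines, lost):
    """True if every way of losing `lost` machines still leaves a ZK quorum."""
    zk_total = sum(machines.values())
    for down in combinations(machines, lost):
        zk_up = zk_total - sum(machines[name] for name in down)
        if not has_quorum(zk_up, zk_total):
            return False
    return True


two_vms = {"vm1": 1, "vm2": 1}
three_vms = {"vm1": 1, "vm2": 1, "vm3": 1}  # vm3 is assumed to run only ZooKeeper

print("2 VMs, lose 1:", survives_all_losses(two_vms, 1))
print("3 VMs, lose 1:", survives_all_losses(three_vms, 1))
print("3 VMs, lose 2:", survives_all_losses(three_vms, 2))
```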

-----Original Message----- 
From: James Dulin
Sent: Friday, May 31, 2013 10:28 PM
To: solr-user@lucene.apache.org
Subject: RE: 2 VM setup for SOLRCLOUD?

Thanks. When you say updates will fail, do you mean document updates will 
fail, or, updates to the cluster, like adding a new node?  If adding new 
data will fail, I will definitely need to figure out a different way to set 
this up.


RE: 2 VM setup for SOLRCLOUD?

Posted by James Dulin <jd...@crelate.com>.

Thanks. When you say updates will fail, do you mean document updates will fail, or, updates to the cluster, like adding a new node?  If adding new data will fail, I will definitely need to figure out a different way to set this up.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, May 31, 2013 4:33 PM
To: solr-user@lucene.apache.org
Subject: Re: 2 VM setup for SOLRCLOUD?

Be really careful here. Zookeeper requires a quorum, which is ((zk
nodes)/2) + 1. So the problem here is that if (zk nodes) is 2, both of them need to be up. If either of them is down, searches will still work, but updates will fail.

Best
Erick


Re: 2 VM setup for SOLRCLOUD?

Posted by Erick Erickson <er...@gmail.com>.

Be really careful here. Zookeeper requires a quorum, which is ((zk
nodes)/2) + 1, using integer division. So the problem here is that if
(zk nodes) is 2, both of them need to be up. If either of them is down,
searches will still work, but updates will fail.
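
As a quick worked example of that formula, the sketch below computes the quorum size and the number of ZK servers that can be down for a few ensemble sizes. The function names are made up for this illustration; the only assumption is that the division in the formula is integer division.

```python
def quorum_size(zk_nodes):
    """Smallest strict majority of an ensemble: (zk_nodes // 2) + 1."""
    return zk_nodes // 2 + 1


def tolerated_failures(zk_nodes):
    """How many ZK servers can be down while a quorum is still available."""
    return zk_nodes - quorum_size(zk_nodes)


for n in (1, 2, 3, 4, 5):
    print(f"{n} ZK nodes: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

Note that going from 1 to 2 servers, or from 3 to 4, does not increase the number of failures tolerated, which is why odd-sized ensembles are the usual choice.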

Best
Erick

On Fri, May 31, 2013 at 11:39 AM, James Dulin <jd...@crelate.com> wrote:
>
> Thanks, I think that the load balancer will be simple enough to set up in Azure.   My only other current concern is having the zookeepers on the same VMs as Solr.  While not ideal, we basically just need simple redundancy, so my theory is that if VM1 goes down, VM 2 will have the shard, node, and zookeeper to keep everything going smoothly.

RE: 2 VM setup for SOLRCLOUD?

Posted by James Dulin <jd...@crelate.com>.

Thanks, I think that the load balancer will be simple enough to set up in Azure.   My only other current concern is having the zookeepers on the same VMs as Solr.  While not ideal, we basically just need simple redundancy, so my theory is that if VM1 goes down, VM 2 will have the shard, node, and zookeeper to keep everything going smoothly.

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, May 31, 2013 8:07 AM
To: solr-user@lucene.apache.org
Subject: Re: 2 VM setup for SOLRCLOUD?

Actually, you don't technically _need_ a load balancer, you could hard code all requests to the same node and internally, everything would "just work". But then you'd be _creating_ a single point of failure if that node went down, so a fronting LB is usually indicated.

Perhaps the thing you're missing is that Zookeeper is there explicitly for the purpose of knowing where all the nodes are and what their state is. Solr communicates with ZK and any incoming requests (update or query) are handled appropriately, thus Jason's comment that once a request gets to any node in the cluster, things are handled automatically.

All that said, if you're using SolrJ and use CloudSolrServer exclusively, then the load balancer isn't necessary. Internally CloudSolrServer (the client) reads the list of accessible nodes from Zookeeper and will be fault tolerant and load balance internally.

Best
Erick


Re: 2 VM setup for SOLRCLOUD?

Posted by Erick Erickson <er...@gmail.com>.

Actually, you don't technically _need_ a load balancer;
you could hard code all requests to the same node and
internally, everything would "just work". But then you'd
be _creating_ a single point of failure if that node went down,
so a fronting LB is usually indicated.

Perhaps the thing you're missing is that Zookeeper is there
explicitly for the purpose of knowing where all the nodes are
and what their state is. Solr communicates with ZK and any
incoming requests (update or query) are handled appropriately,
thus Jason's comment that once a request gets to any node
in the cluster, things are handled automatically.

All that said, if you're using SolrJ and use CloudSolrServer
exclusively, then the load balancer isn't necessary. Internally
CloudSolrServer (the client) reads the list of accessible nodes
from Zookeeper and will be fault tolerant and load balance
internally.
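
The idea of a client that load balances and fails over on its own can be sketched roughly as follows. This is a toy stand-in, not SolrJ and not how CloudSolrServer is actually implemented: the RoundRobinClient class, the send callback and the node addresses are all hypothetical, and the list of live nodes is passed in by hand where the real client would read it from Zookeeper.

```python
from itertools import cycle


class RoundRobinClient:
    """Toy cluster-aware client: rotates over live nodes and skips failed ones."""

    def __init__(self, live_nodes, send):
        self._nodes = list(live_nodes)
        self._order = cycle(range(len(self._nodes)))
        self._send = send  # hypothetical transport: send(node, payload) -> response

    def request(self, payload):
        last_error = None
        # Try each node at most once per request, continuing the rotation.
        for _ in range(len(self._nodes)):
            node = self._nodes[next(self._order)]
            try:
                return self._send(node, payload)
            except ConnectionError as err:
                last_error = err
        raise ConnectionError("no live node answered") from last_error


def fake_send(node, payload):
    """Pretend transport in which vm1 is down and vm2 answers."""
    if node == "vm1:8983":
        raise ConnectionError(f"{node} is down")
    return f"{node} handled {payload}"


client = RoundRobinClient(["vm1:8983", "vm2:8983"], fake_send)
print(client.request("q=*:*"))
```

Because the failover happens inside the client, no external load balancer sits in the request path in this model.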

Best
Erick

On Thu, May 30, 2013 at 3:51 PM, Jason Hellman
<jh...@innoventsolutions.com> wrote:
> Jamey,
>
> You will need a load balancer on the front end to direct traffic into one of your SolrCore entry points.  It doesn't matter, technically, which one, though you will find benefits in narrowing traffic to fewer entry points (for purposes of better cache management).
>
> Internally SolrCloud will round-robin distribute requests to other shards once a query begins execution.  But you do need an entry point externally to be defined through your load balancer.
>
> Hope this is useful!
>
> Jason

Re: 2 VM setup for SOLRCLOUD?

Posted by Jason Hellman <jh...@innoventsolutions.com>.

Jamey,

You will need a load balancer on the front end to direct traffic into one of your SolrCore entry points.  It doesn't matter, technically, which one, though you will find benefits in narrowing traffic to fewer entry points (for purposes of better cache management).

Internally SolrCloud will round-robin distribute requests to other shards once a query begins execution.  But you do need an entry point externally to be defined through your load balancer.

Hope this is useful!

Jason

On May 30, 2013, at 12:48 PM, James Dulin <jd...@crelate.com> wrote:

> Working to set up SolrCloud in Windows Azure.  I have read over the SolrCloud wiki, but am a little confused about some of the deployment options.  I am attaching an image for what I am thinking we want to do.  2 VMs that will have 2 shards spanning across them.  4 Nodes total across the two machines, and a zookeeper on each VM.  I think this is feasible, but, I am a little confused about how each node knows how to respond to requests (do I need a load balancer in front, or can we just reference the “collection” etc.)
>
> Thanks!
>
> Jamey