Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2010/03/09 22:28:40 UTC

Distributed search fault tolerance

I attended the Webinar on March 4th.  Many thanks to Yonik for putting 
that on.  That has led to some questions about the best way to bring 
fault tolerance to our distributed search.  High level question: Should 
I go with SolrCloud, or stick with 1.4 and use load balancing?  I hope 
the rest of this email isn't too disjointed for understanding.

We are using virtual machines on 8-core servers with 32GB of RAM to 
house all this.  For initial deployment, there are two of these, but we 
will have a total of four once we migrate off our current indexing 
solution.  We won't be able to bring fault tolerance into the mix until 
we have all four hosts, but I need to know what direction we are going 
before initial deployment.

One choice is to stick with version 1.4 for stability and use load 
balancing on the shards.  I had already planned to have a pair of load 
balancer VMs to handle redundancy on what I'm calling the broker 
(explained further down), so it would not be a major step to have it do 
the shards as well.
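
Shard-level load balancing of that kind usually means a VIP in front of a 
replica pair per shard; a hypothetical HAProxy fragment for a single shard 
(all names, ports, and addresses made up for illustration):

```
# Hypothetical VIP for one shard: the broker's shards list points here,
# and HAProxy balances across two replicas of the same index, dropping
# a replica that fails the ping health check.
listen shard0 0.0.0.0:8901
    mode http
    balance roundrobin
    option httpchk GET /solr/s0/admin/ping
    server shard0a 10.0.0.11:8983 check
    server shard0b 10.0.0.12:8983 check
```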

I have been looking into SolrCloud.  I tried to just swap out the .war 
file with one compiled from the cloud branch, but that didn't work.  A 
little digging showed that the cloud branch uses a core for the 
collection.  I already have cores defined so I can build indexes and 
swap them into place quickly.  A big question - can I continue to use 
this multi-core approach with SolrCloud, or does it supplant cores with 
its collection logic?
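
The swap-into-place step itself is the standard CoreAdmin SWAP action; 
with hypothetical core names, something like:

```shell
# Rebuild into an offline "build" core, then atomically swap it with
# the "live" core that serves queries (core names are made up).
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=build"
```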

Due to the observed high CPU requirements involved in sorting results 
from multiple shards into a final result, I have so far opted to go with 
an architecture that puts an empty index into a broker core, which lives 
on its own VM host separate from the large static shards.  This core's 
solrconfig.xml has a list of all the shards that get queried.  My 
application has no idea that it's talking to anything other than a 
single SOLR instance.  Once we get the caches warmed, performance is 
quite good.
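
The broker arrangement is just the stock Solr 1.4 shards parameter set as 
a handler default; a minimal sketch, with made-up host and core names:

```xml
<!-- Hypothetical broker-core solrconfig.xml fragment: every query to
     this handler fans out to the listed shards, and the broker merges
     the per-shard results into one response. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">idx1.example.com:8983/solr/s0,idx2.example.com:8983/solr/s1,idx1.example.com:8984/solr/inc</str>
  </lst>
</requestHandler>
```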

The VM host with the broker will also have another VM with the shard 
where all new data goes, a concept we call the incremental.  On a 
nightly basis, some of the documents in the incremental will be 
redistributed to the static shards and everything will get reoptimized.
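
The nightly step amounts to partitioning the incremental's documents by 
age; a toy sketch of the idea (the field names and the date-based rule 
are purely illustrative, not our actual schema or process):

```python
from datetime import date

def redistribute(incremental, static, cutoff):
    """Move documents dated before `cutoff` out of the incremental
    shard's list and onto the static shard's list (toy model only)."""
    keep = [d for d in incremental if d["date"] >= cutoff]
    move = [d for d in incremental if d["date"] < cutoff]
    static.extend(move)
    return keep, static

# Toy data: one old document, one recent one.
incremental = [{"id": 1, "date": date(2010, 1, 5)},
               {"id": 2, "date": date(2010, 3, 8)}]
static = []
incremental, static = redistribute(incremental, static, date(2010, 3, 1))
# The old document has moved to static; the recent one stays put.
```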

How would you recommend I pursue fault tolerance?  I had already planned 
to set up a load balancer VM to handle redundancy for the broker, so it 
would not be a HUGE step to have it load balance the shards too.


Re: Distributed search fault tolerance

Posted by Mark Miller <ma...@gmail.com>.
On 03/09/2010 04:28 PM, Shawn Heisey wrote:
>  I attended the Webinar on March 4th.  Many thanks to Yonik for
>  putting that on.  That has led to some questions about the best way
>  to bring fault tolerance to our distributed search.  High level
>  question: Should I go with SolrCloud, or stick with 1.4 and use load
>  balancing?  I hope the rest of this email isn't too disjointed for
>  understanding.
>
>  We are using virtual machines on 8-core servers with 32GB of RAM to
>  house all this.  For initial deployment, there are two of these, but
>  we will have a total of four once we migrate off our current indexing
>  solution.  We won't be able to bring fault tolerance into the mix
>  until we have all four hosts, but I need to know what direction we
>  are going before initial deployment.
>
>  One choice is to stick with version 1.4 for stability and use load
>  balancing on the shards.  I had already planned to have a pair of
>  load balancer VMs to handle redundancy on what I'm calling the broker
>  (explained further down), so it would not be a major step to have it
>  do the shards as well.
>
>  I have been looking into SolrCloud.  I tried to just swap out the
>  .war file with one compiled from the cloud branch, but that didn't
>  work.  A little digging showed that the cloud branch uses a core for
>  the collection.

Hmm - not sure I completely follow - a collection is currently made up 
of n cores. Unless you override the collection name that a core should 
participate in, it defaults to the core name.
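
If it helps, that override is (as far as I know) a collection attribute 
on the core in solr.xml on the cloud branch; a hypothetical two-core 
sketch (check the branch's example config for the exact attributes):

```xml
<!-- Hypothetical solr.xml fragment: both cores override the default
     (core-name-based) collection and join the same collection. -->
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0" collection="collection1"/>
  <core name="core1" instanceDir="core1" collection="collection1"/>
</cores>
```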

>I already have cores defined so I can build indexes
>  and swap them into place quickly.  A big question - can I continue to
>  use this multi-core approach with SolrCloud, or does it supplant
>  cores with its collection logic?

You should be able to do anything you were doing pre-SolrCloud, I believe.

>
>  Due to the observed high CPU requirements involved in sorting results
>  from multiple shards into a final result, I have so far opted to go
>  with an architecture that puts an empty index into a broker core,
>  which lives on its own VM host separate from the large static shards.
>  This core's solrconfig.xml has a list of all the shards that get
>  queried.  My application has no idea that it's talking to anything
>  other than a single SOLR instance.  Once we get the caches warmed,
>  performance is quite good.
>
>  The VM host with the broker will also have another VM with the shard
>  where all new data goes, a concept we call the incremental.  On a
>  nightly basis, some of the documents in the incremental will be
>  redistributed to the static shards and everything will get
>  reoptimized.
>
>  How would you recommend I pursue fault tolerance?  I had already
>  planned to set up a load balancer VM to handle redundancy for the
>  broker, so it would not be a HUGE step to have it load balance the
>  shards too.
>

The SolrCloud stuff has search-side fault tolerance built in - it should 
get better over time (more features, e.g. partial results).
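
For concreteness, on the cloud branch a distributed query can be sent to 
any core in the collection with the distrib parameter (host and 
collection name hypothetical here); the cluster state in ZooKeeper is 
used to pick live shards:

```shell
# The cloud branch consults ZooKeeper for live shard replicas, so the
# query keeps working when an individual replica is down (assuming
# another replica of that shard is still up).
curl "http://localhost:8983/solr/collection1/select?q=*:*&distrib=true"
```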

That's kind of a cop-out answer to all you have there - but I got to the 
end of this email and I'm feeling a bit tired.

Perhaps follow up with some clarifications/extensions to what you are 
looking for and we can try some more responses.

-- 
- Mark

http://www.lucidimagination.com




Re: Distributed search fault tolerance

Posted by Shawn Heisey <so...@elyograg.org>.
I guess I must be including too much information in my questions, 
running into "tl;dr" with them.  Later today, when I have more time, 
I'll try to make it more bite-sized.

On 3/9/2010 2:28 PM, Shawn Heisey wrote:
> I attended the Webinar on March 4th.  Many thanks to Yonik for putting 
> that on.  That has led to some questions about the best way to bring 
> fault tolerance to our distributed search.  High level question: 
> Should I go with SolrCloud, or stick with 1.4 and use load balancing?  
> I hope the rest of this email isn't too disjointed for understanding.

