Posted to solr-user@lucene.apache.org by Amit Nithian <an...@gmail.com> on 2013/03/01 02:22:33 UTC

Re: Poll: SolrCloud vs. Master-Slave usage

I don't know a ton about SolrCloud, but for our setup, my limited
understanding is that you start to bleed operational and
non-operational aspects together, which I am not comfortable doing (e.g.
software load balancing). Also, adding ZooKeeper to the mix is yet another
thing to install, set up, monitor, maintain, etc., which doesn't add any value
above and beyond what we have set up already.

For example, we have a hardware load balancer that does the actual load
balancing of requests among the slaves and takes slaves in and out of
rotation, either on demand or when one is down. We've placed a virtual IP on top
of our multiple masters so that we have redundancy there. While we have
multiple cores, the data volume is small enough to fit on one node, so we
aren't at the data volume that would require sharding our indices. I suspect
that if we had a dataset too large to fit on one box,
SolrCloud would be perfect, but when you can fit on one box, why add more
complexity?
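
To make the "in and out of rotation" idea concrete, here is a minimal Python sketch of a round-robin pool. It is only a toy model of what a load balancer does, not how any particular hardware device works; the class name, method names, and host names are all hypothetical.

```python
from itertools import cycle


class SlavePool:
    """Toy stand-in for a load balancer's pool of query slaves (hypothetical)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)            # every known slave, in a fixed order
        self.in_rotation = set(self.hosts)  # slaves currently receiving traffic

    def take_out(self, host):
        """Remove a slave from rotation, on demand or because it is down."""
        self.in_rotation.discard(host)

    def put_back(self, host):
        """Return a known slave to rotation."""
        if host in self.hosts:
            self.in_rotation.add(host)

    def next_hosts(self, n):
        """Return n hosts, round-robin over the slaves in rotation.

        Each call starts again from the first slave in rotation.
        """
        active = [h for h in self.hosts if h in self.in_rotation]
        if not active:
            raise RuntimeError("no slaves in rotation")
        picker = cycle(active)
        return [next(picker) for _ in range(n)]


pool = SlavePool(["slave1", "slave2", "slave3"])
pool.take_out("slave2")
print(pool.next_hosts(4))
```

With slave2 out of rotation, requests alternate between the remaining two slaves until it is put back.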

Please correct me if I'm wrong for I'd like to better understand this!




On Thu, Feb 28, 2013 at 12:53 AM, rulinma <ru...@gmail.com> wrote:

> I am doing research on SolrCloud.

Re: Poll: SolrCloud vs. Master-Slave usage

Posted by Shawn Heisey <so...@elyograg.org>.
On 3/1/2013 11:00 AM, Amit Nithian wrote:
> But does that mean that in SolrCloud, slave nodes are busy indexing
> documents?

With SolrCloud, there is no such thing as master or slave.  When you
index documents, every replica of the affected shard indexes the
documents independently.  I think the simple answer to your question is
"yes", with the additional note that they are not "slaves" in the same
sense as what you may be used to with older Solr versions.

With the addition of soft commits in 4.x, near-real-time (NRT) search is
completely achievable in the SolrCloud model.
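
As a rough illustration of asking for a soft commit, the Python sketch below only builds an update URL and sends nothing. The softCommit and commitWithin request parameters are ones the Solr update handler accepts; the helper name, host, and collection name are assumptions made for the example.

```python
from urllib.parse import urlencode


def update_url(base_url, collection, soft_commit=False, commit_within_ms=None):
    """Build a Solr update URL (hypothetical helper; it does not send a request)."""
    params = {}
    if soft_commit:
        # Ask Solr to make the changes searchable without a full hard commit.
        params["softCommit"] = "true"
    if commit_within_ms is not None:
        # Ask Solr to commit the changes within this many milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    url = f"{base_url}/{collection}/update"
    return f"{url}?{urlencode(params)}" if params else url


# Host and collection name are placeholders.
print(update_url("http://localhost:8983/solr", "collection1", soft_commit=True))
```

In practice many setups would configure automatic soft commits on the server rather than pass the parameter on each request; which is better depends on your indexing pattern.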

Thanks,
Shawn


Re: Poll: SolrCloud vs. Master-Slave usage

Posted by Amit Nithian <an...@gmail.com>.
But does that mean that in SolrCloud, slave nodes are busy indexing
documents?


On Fri, Mar 1, 2013 at 5:37 AM, Michael Della Bitta <michael.della.bitta@appinions.com> wrote:

> Amit,
>
> NRT (near-real-time) search is not possible in a master-slave setup
> because it requires a hard commit followed by replication, both of
> which add considerable delay.
>
> SolrCloud sends each document for a given shard to every node hosting
> that shard, so there's no need for a hard commit and replication before
> the document becomes visible.
>
> You could conceivably get NRT on a single node without SolrCloud, but
> there would be no redundancy.
>
> Michael Della Bitta

Re: Poll: SolrCloud vs. Master-Slave usage

Posted by Michael Della Bitta <mi...@appinions.com>.
Amit,

NRT (near-real-time) search is not possible in a master-slave setup
because it requires a hard commit followed by replication, both of
which add considerable delay.

SolrCloud sends each document for a given shard to every node hosting
that shard, so there's no need for a hard commit and replication before
the document becomes visible.

You could conceivably get NRT on a single node without SolrCloud, but
there would be no redundancy.
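
A toy Python model of that flow may help. It is only a sketch under simplifying assumptions: the cluster is a list of shards, each shard is a list of in-memory replica dicts, and an md5 hash stands in for whatever hashing Solr really uses to route documents. All names here are hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id (md5 is a stand-in)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def index_document(doc, cluster):
    """Add doc to every replica of the shard that owns it.

    Each replica indexes its own copy, so nothing needs to be
    replicated from a master afterwards.
    """
    shard = cluster[shard_for(doc["id"], len(cluster))]
    for replica in shard:
        replica[doc["id"]] = doc
    return shard


# Two shards with two replicas each; every replica is a dict keyed by doc id.
cluster = [[{}, {}], [{}, {}]]
touched = index_document({"id": "doc-1", "title": "hello"}, cluster)
print(len(touched), "replicas received doc-1")
```

The document lands on both replicas of exactly one shard, and the other shard never sees it.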

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Fri, Mar 1, 2013 at 1:22 AM, Amit Nithian <an...@gmail.com> wrote:
> Erick,
>
> Well put and thanks for the clarification. One question:
> "And if you need NRT, you just can't get it with traditional M/S setups."
> ==> Can you explain how that works with SolrCloud?
>
> I agree with what you said too: an article or discussion I
> read said that high-availability masters require some fairly
> complicated setups, and I guess I am underestimating how
> expensive/complicated our setup is relative to what you can get out of the
> box with SolrCloud.
>
> Thanks!
> Amit

Re: Poll: SolrCloud vs. Master-Slave usage

Posted by Amit Nithian <an...@gmail.com>.
Erick,

Well put and thanks for the clarification. One question:
"And if you need NRT, you just can't get it with traditional M/S setups."
==> Can you explain how that works with SolrCloud?

I agree with what you said too: an article or discussion I
read said that high-availability masters require some fairly
complicated setups, and I guess I am underestimating how
expensive/complicated our setup is relative to what you can get out of the
box with SolrCloud.

Thanks!
Amit


On Thu, Feb 28, 2013 at 6:29 PM, Erick Erickson <er...@gmail.com> wrote:

> Amit:
>
> It's a balancing act. If I were starting fresh, even with one shard, I'd
> probably use SolrCloud rather than deal with the issues around the "how do
> I recover if my master goes down" question. Additionally, SolrCloud lets
> you monitor the health of the entire system by watching the state
> information kept in ZooKeeper, rather than building a monitoring system that
> understands the changing topology of your network.
>
> And if you need NRT, you just can't get it with traditional M/S setups.
>
> In a mature production system where all the operational issues are figured
> out and you don't need NRT, it's easier just to plop 4.x in traditional M/S
> setups and not go to SolrCloud. And you're right, you have to understand
> ZooKeeper, which isn't all that difficult but is another moving part, and
> I'm a big fan of keeping the number of moving parts down if possible.
>
> It's not a one-size-fits-all situation. From what you've described, I can't
> say there's a compelling reason to do the SolrCloud thing. If you find
> yourself spending lots of time building monitoring or High
> Availability/Disaster Recovery tools, then you might find the cost/benefit
> analysis changing.
>
> Personally, I think it's ironic that the memory improvements that came
> along _with_ SolrCloud make it less necessary to shard. Which means that
> traditional M/S setups will suit more people longer <G>....
>
> Best
> Erick

Re: Poll: SolrCloud vs. Master-Slave usage

Posted by Erick Erickson <er...@gmail.com>.
Amit:

It's a balancing act. If I were starting fresh, even with one shard, I'd
probably use SolrCloud rather than deal with the issues around the "how do
I recover if my master goes down" question. Additionally, SolrCloud lets
you monitor the health of the entire system by watching the state
information kept in ZooKeeper, rather than building a monitoring system that
understands the changing topology of your network.
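
As a sketch of that kind of health check, the Python below walks a dict shaped roughly like the cluster state SolrCloud keeps in ZooKeeper and lists every replica that is not active. The exact layout, the function name, and the sample data are assumptions for illustration; check what your Solr version actually stores before relying on it.

```python
def unhealthy_replicas(cluster_state):
    """Return (collection, shard, replica, state) for every non-active replica.

    cluster_state is assumed to be nested as
    collection -> "shards" -> shard -> "replicas" -> replica -> "state".
    """
    problems = []
    for collection, info in cluster_state.items():
        for shard, shard_info in info.get("shards", {}).items():
            for name, replica in shard_info.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                if state != "active":
                    problems.append((collection, shard, name, state))
    return sorted(problems)


# Hypothetical sample; real data would be read from ZooKeeper.
sample_state = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "node1": {"state": "active"},
                    "node2": {"state": "down"},
                }
            },
            "shard2": {
                "replicas": {
                    "node3": {"state": "recovering"},
                    "node4": {"state": "active"},
                }
            },
        }
    }
}

for problem in unhealthy_replicas(sample_state):
    print(problem)
```

The point is that one document describes the whole topology, so the monitor does not need its own list of hosts.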

And if you need NRT, you just can't get it with traditional M/S setups.

In a mature production system where all the operational issues are figured
out and you don't need NRT, it's easier just to plop 4.x in traditional M/S
setups and not go to SolrCloud. And you're right, you have to understand
ZooKeeper, which isn't all that difficult but is another moving part, and
I'm a big fan of keeping the number of moving parts down if possible.

It's not a one-size-fits-all situation. From what you've described, I can't
say there's a compelling reason to do the SolrCloud thing. If you find
yourself spending lots of time building monitoring or High
Availability/Disaster Recovery tools, then you might find the cost/benefit
analysis changing.

Personally, I think it's ironic that the memory improvements that came
along _with_ SolrCloud make it less necessary to shard. Which means that
traditional M/S setups will suit more people longer <G>....

Best
Erick


On Thu, Feb 28, 2013 at 8:22 PM, Amit Nithian <an...@gmail.com> wrote:

> I don't know a ton about SolrCloud, but for our setup, my limited
> understanding is that you start to bleed operational and
> non-operational aspects together, which I am not comfortable doing (e.g.
> software load balancing). Also, adding ZooKeeper to the mix is yet another
> thing to install, set up, monitor, maintain, etc., which doesn't add any value
> above and beyond what we have set up already.
>
> For example, we have a hardware load balancer that does the actual load
> balancing of requests among the slaves and takes slaves in and out of
> rotation, either on demand or when one is down. We've placed a virtual IP on top
> of our multiple masters so that we have redundancy there. While we have
> multiple cores, the data volume is small enough to fit on one node, so we
> aren't at the data volume that would require sharding our indices. I suspect
> that if we had a dataset too large to fit on one box,
> SolrCloud would be perfect, but when you can fit on one box, why add more
> complexity?
>
> Please correct me if I'm wrong for I'd like to better understand this!