Posted to solr-user@lucene.apache.org by Zap Org <za...@gmail.com> on 2016/01/08 06:55:11 UTC

Solr search and index rate optimization

I wanted to ask: I need to index every 15 minutes with a hard commit (real-time
records), and I currently have 5 ZooKeeper instances and 2 Solr instances on one
machine with 32GB RAM, serving 200 users. I want to serve more than 10,000
users, so what should my machine specs be, and what should my architecture be,
for that query rate along with the index rate? My index size is 30GB and growing
gradually. Thank you in advance; I am a newbie hoping to get a response.
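[Editor's note: the 15-minute hard-commit requirement above would typically be
expressed in solrconfig.xml roughly as follows. This is a sketch, not the
poster's actual config; the soft-commit interval is an illustrative assumption.]

```xml
<!-- solrconfig.xml sketch: hard commit every 15 minutes (900000 ms).
     openSearcher=false keeps the hard commit cheap (durability only);
     pair it with a soft commit if new docs must be searchable sooner. -->
<autoCommit>
  <maxTime>900000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime> <!-- illustrative: visibility within 1 minute -->
</autoSoftCommit>
```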

Re: Solr search and index rate optimization

Posted by Zap Org <za...@gmail.com>.
Hello, thanks for replying. So 3 ZK instances are more than enough
in my case.

On Fri, Jan 8, 2016 at 10:07 PM, Erick Erickson <er...@gmail.com>
wrote:

> [...]

Re: Solr search and index rate optimization

Posted by Erick Erickson <er...@gmail.com>.
bq: Well, a good reason would be if you want your system to
continue to operate if 2 ZK nodes lose communication with
the rest of the cluster or go down completely

My argument is usually that if you are losing 2 of 3 ZK nodes
at the same time with any regularity, you probably have
problems that won't be solved by adding more ZK nodes ;)

So I agree that if you want to guard against 2 nodes dropping
ZK below quorum going to 5 is an option. I've just seen very
few situations where that makes any practical difference and
it does add to maintenance...

BTW, let's say you have a running cluster and _all_ the ZK
nodes die. You'll still be able to run queries, but you won't
be able to update any docs. And Solr nodes coming online
won't be able to make themselves known to the rest of the
cluster etc, but at least you aren't totally dead in the water.

Not really disagreeing, just expressing solidarity with the ops
folks who don't want to maintain hardware that has really
marginal benefit ;)

Best,
Erick

On Sat, Jan 9, 2016 at 8:52 PM, Steve Davids <sd...@gmail.com> wrote:
> [...]

Re: Solr search and index rate optimization

Posted by Steve Davids <sd...@gmail.com>.
bq. There's no good reason to have 5 with a small cluster and by "small" I
mean < 100s of nodes.

Well, a good reason would be if you want your system to continue to operate
if 2 ZK nodes lose communication with the rest of the cluster or go down
completely. Just to be clear though, the ZK nodes definitely don't need to
be beefy machines compared to your Solr data nodes, since they are just
doing lightweight orchestration. But yeah, for a 2-data-node system one
might be willing to go with a 3-node ensemble to tolerate a single ZK
node dying; it just depends on how much cash you are willing to spend and
the availability level you are looking for.

-Steve
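[Editor's note: the quorum arithmetic behind the 3-node vs. 5-node discussion
can be made concrete. ZooKeeper requires a strict majority of the ensemble to
stay up, so fault tolerance follows directly from the ensemble size:]

```python
def zk_fault_tolerance(ensemble_size: int) -> int:
    """ZooKeeper keeps a quorum only while a strict majority of the
    ensemble is alive, so n nodes tolerate the loss of floor((n-1)/2)."""
    return (ensemble_size - 1) // 2

for n in (1, 3, 5, 7):
    print(f"a {n}-node ensemble survives {zk_fault_tolerance(n)} failure(s)")
```

This is why 3 nodes tolerate one failure (as Steve notes) and 5 tolerate two,
while an even-sized ensemble buys nothing over the next odd size down.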


On Fri, Jan 8, 2016 at 12:07 PM, Erick Erickson <er...@gmail.com>
wrote:

> [...]

Re: Solr search and index rate optimization

Posted by Erick Erickson <er...@gmail.com>.
Here's a longer form of Toke's answer:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

BTW, on the surface, having 5 ZK nodes isn't doing you any real good.
ZooKeeper isn't really involved in serving queries or handling
updates; its purpose is to hold the state of the cluster (nodes up,
recovering, down, etc.) and notify Solr listeners when that state
changes. There's no good reason to have 5 with a small cluster, and by
"small" I mean < 100s of nodes.

Best,
Erick
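[Editor's note: a minimal 3-node ensemble along the lines Erick recommends
would look roughly like this. Hostnames, paths, and timing values are
placeholders, not taken from the thread.]

```
# zoo.cfg -- identical on all three ZK hosts (values are illustrative)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# server.N=host:peerPort:leaderElectionPort; each host also needs a
# myid file under dataDir containing its own N
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```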

On Fri, Jan 8, 2016 at 2:40 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> [...]

Re: Solr search and index rate optimization

Posted by Zap Org <za...@gmail.com>.
Thanks for replying. Currently my machine specs are:
32 GB RAM
4-core processor
Windows Server 2008 64-bit
500 GB HD
16 GB swap memory

The running machine, with CPU usage never above 10%, has already consumed
all the RAM and has now started to use swap; my guess is the server will
choke when the swap runs out. I am only running the Solr and ZK instances
there. Any wild idea what is happening and why memory consumption is so
high? All the field caches and query caches are set to 1GB in solrconfig,
and along with serving queries I am running a delta import every 15 minutes.
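[Editor's note: several 1GB caches plus the JVM heap can easily starve a
32GB box whose OS page cache also has to hold a 30GB index. A more
conservative solrconfig.xml sketch is below; the class names are standard
Solr cache implementations of that era, but all sizes are illustrative
assumptions, not tuned values.]

```xml
<!-- solrconfig.xml sketch: cap caches by entry count / RAM instead of
     letting each grow toward 1GB, leaving most memory to the OS page
     cache for the index files. -->
<filterCache      class="solr.LRUCache" size="512" maxRamMB="256"/>
<queryResultCache class="solr.LRUCache" size="512" maxRamMB="256"/>
<documentCache    class="solr.LRUCache" size="512"/>
```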

On Fri, Jan 8, 2016 at 3:40 PM, Toke Eskildsen <te...@statsbiblioteket.dk>
wrote:

> [...]

Re: Solr search and index rate optimization

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2016-01-08 at 10:55 +0500, Zap Org wrote:
> [...]

It depends on your system and if we were forced to guess, our guess
would be very loose.


Fortunately you do have a running system with real queries: Make a copy
on two similar machines (you will probably need more hardware anyway)
and simulate growing traffic, measuring response times at appropriate
points: 200 users, 500, 1000, 2000 etc.

If you are very lucky, your current system scales all the way. If not,
you should have enough data to make an educated guess at the number of
machines you need. You should have at least 3 measuring points to
extrapolate from, as scaling is not always linear.

- Toke Eskildsen, State and University Library, Denmark
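[Editor's note: Toke's ramp-up procedure — simulate growing traffic and measure
response times at several user counts — could be scripted roughly like this.
The Solr URL in the comment and the dummy workload are placeholders; a real run
would issue actual queries and extend the ramp to 500, 1000, 2000 users.]

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def measure(request_fn, concurrent_users, requests_per_user=10):
    """Fire requests_per_user calls of request_fn from each simulated
    user in parallel; return (median, p95) latency in seconds."""
    def one_user(_):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            request_fn()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        times = sorted(t for user in pool.map(one_user, range(concurrent_users))
                       for t in user)
    return statistics.median(times), times[int(0.95 * (len(times) - 1))]

# Placeholder workload: swap the lambda for a real query, e.g.
#   urllib.request.urlopen("http://localhost:8983/solr/mycore/select?q=*:*").read()
for users in (50, 100, 200):
    median, p95 = measure(lambda: time.sleep(0.001), users)
    print(f"{users:>4} users: median {median * 1000:.1f} ms, p95 {p95 * 1000:.1f} ms")
```

Plotting median and p95 against the user count shows where response times stop
scaling linearly, which is the extrapolation Toke describes.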