Posted to solr-user@lucene.apache.org by Susheel Kumar <su...@thedigitalgroup.net> on 2014/02/11 23:18:14 UTC

RE: Solr server requirements for 100+ million documents

Hi Otis,

Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1 for Zookeeper. Is that correct?

Thanks,
Susheel

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodnetic@gmail.com] 
Sent: Friday, January 24, 2014 5:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr server requirements for 100+ million documents

Hi Susheel,

As Erick said, it's impossible to give precise recommendations, but making a few assumptions and combining them with experience (plus a licked finger in the air), I'd start with:
* 3 servers
* 32 GB RAM
* 2+ CPU cores
* Linux

Assuming the docs are no bigger than a few KB, that they are not being reindexed over and over, that your search rate is no higher than a few dozen QPS, that your queries are not a page long, and that best practices are followed, the above should be sufficient.

I hope this helps.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar < susheel.kumar@thedigitalgroup.net> wrote:

> Hi,
>
> Currently we are indexing 10 million documents from a database (10 DB
> data entities), and the index size is around 8 GB on a Windows virtual box.
> Indexing in one shot takes 12+ hours, while indexing in parallel in separate
> cores and merging them together takes 4+ hours.
>
> We are looking to scale to 100+ million documents and are looking for
> recommendations on server requirements for a production environment, on the
> parameters below. There can be 200+ users searching at the same time.
>
> * Number of physical servers (considering SolrCloud)
> * Memory requirement
> * Processor requirement (# cores)
> * Linux as the OS, as opposed to Windows
>
> Thanks in advance.
> Susheel
>
>

Re: Solr server requirements for 100+ million documents

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/11/2014 3:28 PM, Susheel Kumar wrote:
> Thanks, Otis, for the quick reply. For ZK, do you recommend separate servers, and if so, how many for an initial SolrCloud cluster setup?

In a minimal 3-server setup, all servers would run zookeeper and two of 
them would also run Solr.  With this setup, you can survive the failure of 
any of those three machines, even if it dies completely.

If the third machine is only running zookeeper, two fast CPU cores and 
2GB of RAM would be plenty.  For 100 million documents, I would 
personally recommend at least 8 CPU cores on the machines running Solr, 
ideally provided by at least two separate physical CPUs.  Otis 
recommended 32GB of RAM as a starting point.  You would very likely want 
more.

One copy of my 90 million document index uses two servers to run all the 
shards.  Because I have two copies of the index, I have four servers.  
Each server has 64GB of RAM.  This is **NOT** running SolrCloud, but if 
it were, I would have zookeeper running on three of those servers.
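
A back-of-the-envelope sketch of why 32GB may be tight, using only the numbers quoted in this thread (10 million docs in an 8 GB index, a 100 million doc target, two Solr servers). The linear scaling of index size, the 8 GB Java heap, and the rule of thumb that each server's share of the index should fit in the RAM left over for the OS disk cache are all assumptions for illustration, not measurements; the function names are made up for this example.

```python
def projected_index_gb(current_docs: int, current_index_gb: float, target_docs: int) -> float:
    """Scale the index size linearly with document count (an assumption)."""
    return current_index_gb * target_docs / current_docs


def fits_in_page_cache(index_gb_on_server: float, ram_gb: float, heap_gb: float) -> bool:
    """True if the server's share of the index fits in RAM not used by the Java heap."""
    return index_gb_on_server <= ram_gb - heap_gb


# Numbers from the thread: 10M docs -> 8 GB index, target of 100M docs.
total_gb = projected_index_gb(10_000_000, 8.0, 100_000_000)

# Two Solr servers share one copy of the index.
per_server_gb = total_gb / 2

# The 8 GB heap is a hypothetical value; tune it for your own workload.
for ram_gb in (32, 64):
    ok = fits_in_page_cache(per_server_gb, ram_gb, heap_gb=8)
    print(f"{ram_gb} GB RAM: {per_server_gb:.0f} GB of index per server, fits in cache: {ok}")
```

Real index sizes depend heavily on the schema and stored fields, so treat the result as a starting point for testing and not as a sizing guarantee.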

Thanks,
Shawn


Re: Solr server requirements for 100+ million documents

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Whether you use the same machines as Solr or separate machines is a matter of taste.

If you are the CTO, then you should make this decision.  If not, inform management that risk conditions are greater when you share function and control on a single piece of hardware.  A single failure of a replica + zookeeper node will be more impactful than a single failure of a replica *or* a zookeeper node.  Let them earn the big bucks to make the risk decision.

The good news is, zookeeper hardware can be extremely lightweight for Solr Cloud.  Commodity hardware should work just fine…and thus scaling to 5 nodes for zookeeper is not that hard at all.

Jason


On Feb 11, 2014, at 3:00 PM, svante karlsson <sa...@csi.se> wrote:

> ZK needs a quorum to stay functional, so 3 servers handle one failure and 5
> handle 2 node failures. If you run Solr with 1 replica per shard, then stick to
> 3 ZK. If you use 2 replicas, use 5 ZK.


Re: Solr server requirements for 100+ million documents

Posted by svante karlsson <sa...@csi.se>.
ZK needs a quorum to stay functional, so 3 servers handle one failure and 5
handle 2 node failures. If you run Solr with 1 replica per shard, then stick to
3 ZK. If you use 2 replicas, use 5 ZK.
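
The quorum arithmetic can be sketched in a few lines. This assumes ZooKeeper's default majority quorum (more than half of the ensemble must be up); the function names are made up for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many ZK nodes can be lost while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print(f"{n} ZK nodes: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Note that 4 nodes tolerate no more failures than 3, which is why odd ensemble sizes are the usual choice.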






RE: Solr server requirements for 100+ million documents

Posted by Susheel Kumar <su...@thedigitalgroup.net>.
Thanks, Otis, for the quick reply. For ZK, do you recommend separate servers, and if so, how many for an initial SolrCloud cluster setup?

Re: Solr server requirements for 100+ million documents

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Susheel,

No, we wouldn't want to go with just 1 ZK. :)

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

