Posted to solr-user@lucene.apache.org by Wael Kader <wa...@softech-lb.com> on 2018/02/14 16:41:55 UTC

Solr Recommended setup

Hi,

I would like to get a recommendation for the SOLR setup I have.

I have an index getting around 2 Million records per day. The index used is
in Cloudera Search (Solr).
I am running everything on one node. Every 5 minutes I run a SOLR commit
for whatever data has come into the index.
The whole Cloudera VM has 64 GB of Ram.

It's working fine so far with around 80 Million records, but Solr gets
slow once a week, so I restart the VM for things to work.
I would like to get a recommendation on the setup. Note that I can add VMs
to my setup if needed.
I read somewhere that it's wrong to index and read data from the same place.
I am doing this now and I do know I am doing things wrong.
How can I do a setup on Cloudera for SOLR to do indexing in one VM and do
the reading on another, and what would you recommend for my setup?


-- 
Regards,
Wael

Re: Solr Recommended setup

Posted by Emir Arnautović <em...@sematext.com>.
Hi Wael,
It is hard to give a recommendation on what to do, since every data set and access pattern differs. There are some guidelines that can be followed, but you will need to test to see which setup suits you.
I am guessing that you are running Solr in standalone mode. The problem with such an approach is that you have to scale it vertically, and eventually you will reach limits. Most likely it will be query latency that forces you to split your index. At that point it is usually better to switch to SolrCloud and let Solr handle the shards than to do it on your own.
Re splitting reading/writing - I guess you are talking about master-slave mode, where you index on the master and query the slaves. When/if you switch to SolrCloud you will no longer need/be able to do that (even though there is some work going on to support such a scenario).
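
To make the master-slave split concrete, here is a minimal client-side sketch in Python. It only builds the HTTP requests (it does not send them), so that writes go to one host and reads to another. The host names and the core name "records" are hypothetical placeholders, and the sketch assumes replication from master to slave is already configured on the Solr side; it is an illustration, not a tested Cloudera setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical hosts and core name; replace with your own.
MASTER = "http://solr-master.example.com:8983/solr"
SLAVE = "http://solr-slave.example.com:8983/solr"
CORE = "records"


def build_update_request(docs, base=MASTER, core=CORE):
    """Build (but do not send) a JSON update request aimed at the master."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        f"{base}/{core}/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(query, rows=10, base=SLAVE, core=CORE):
    """Build (but do not send) a select request aimed at a slave."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(f"{base}/{core}/select?{params}", method="GET")


if __name__ == "__main__":
    print(build_update_request([{"id": "1"}]).full_url)
    print(build_query_request("id:42").full_url)
```

Passing either request to urllib.request.urlopen would actually send it. The point is only that the indexing code and the query code use different base URLs.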

Here are links to some blogposts that explain how you can estimate the right setup for you:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html
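
As a back-of-the-envelope starting point before reading those, the figures from the original message (about 80 million records now, about 2 million new records per day) can be projected forward. The sketch below assumes the ingest rate stays constant and that nothing is deleted or expired, which may not hold for your data. It also compares the projection against Lucene's hard limit on documents per index (per shard), which is only one of several limits and usually not the first one you will hit.

```python
# Lucene's hard cap on documents in a single index (IndexWriter.MAX_DOCS).
LUCENE_MAX_DOCS = 2_147_483_519


def projected_docs(current, per_day, days):
    """Document count after `days`, assuming a constant ingest rate."""
    return current + per_day * days


def days_until(limit, current, per_day):
    """Whole days of ingest left before `current` would exceed `limit`."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return max(0, (limit - current) // per_day)


if __name__ == "__main__":
    current, per_day = 80_000_000, 2_000_000  # figures from the question
    print(projected_docs(current, per_day, 365))           # 810000000
    print(days_until(LUCENE_MAX_DOCS, current, per_day))   # 1033
```

On these assumptions a single unsharded index holds roughly 810 million records after one more year. Query latency and memory pressure are likely to become a problem well before the document limit does, which is why testing with your own data matters.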

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


