Posted to solr-user@lucene.apache.org by Norgorn <ls...@mail.ru> on 2014/09/24 08:00:21 UTC

SolrCloud RAM requirements

I have a SolrCloud cluster with 3 nodes and 16 GB RAM on each.
My index is about 1 TB and search speed is awfully bad.
I've read that one needs at least 50% of the index size in RAM, but I surely
can't afford it.
Please tell me, is there any way to improve performance with severely limited
resources?
Yes, I can try to make the index smaller, and I'll do that, but I need to know
how much RAM is enough and whether there are some magic ways to make things
better.

SOLR spec is hs_0.06




Re: SolrCloud RAM requirements

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2014-09-25 at 06:29 +0200, Norgorn wrote:
> I can't say for sure, because the filter caches are off the JVM heap (that's
> Heliosearch), but top shows 5 GB cached and no free RAM.

The cached figure reported by top should be correct, no matter whether one
uses off-heap memory or not: you have 5GB of cache for (I guess) a 300GB
index, so about 1.5% of the index size.
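
For reference, that figure can also be read directly from /proc/meminfo; a
minimal sketch, assuming Linux (the class name is made up):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Prints the OS page (disk) cache size and free RAM on Linux --
    // the same "cached" figure that top reports.
    public class DiskCacheCheck {
        public static void main(String[] args) throws IOException {
            for (String line : Files.readAllLines(Paths.get("/proc/meminfo"))) {
                if (line.startsWith("Cached:") || line.startsWith("MemFree:")) {
                    System.out.println(line.trim());
                }
            }
        }
    }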

I fully agree with Shawn that this will never perform well enough for
interactive use when you're using spinning drives.

> The only question for me now is how to balance the disk cache and the filter
> cache? Do I need to worry about that, or is a big disk cache enough?

Even if you skipped the filters entirely (so just simple queries) and
magically had 15GB out of the 16GB free for disk cache, that would only be
about 5% of the index size. Still not enough for decent performance with
spinning drives, unless your index is very special, e.g. has an enormous
amount of stored fields.


As for the whole "how much will it help with SSDs?" question, might I suggest
simply testing? Buy a 500GB SSD and put it in one of the machines, then test
searches against that shard vs. the shards on the other machines. If you do
not see much difference, move the drive to your developer machine and be
happy about the upgrade. Win-win.

> And does "optimized index" mean the Solr "optimize" command, or something else?

Optimized down to a single segment (which I think the 'optimize' command
will do). But you should only consider that if you know that your shard
will not be updated in the foreseeable future.
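
As a sketch, that force-merge can also be issued from SolrJ (the URL, core
name, and class name below are placeholders, and a recent SolrJ is assumed):

    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    // Force-merge ("optimize") a core down to a single segment.
    // Only worth doing on a shard that will not be updated again soon.
    public class OptimizeShard {
        public static void main(String[] args) throws SolrServerException, IOException {
            HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
            client.optimize(true, true, 1); // waitFlush, waitSearcher, maxSegments=1
            client.close();
        }
    }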

- Toke Eskildsen, State and University Library, Denmark



RE: SolrCloud RAM requirements

Posted by Norgorn <ls...@mail.ru>.
Thanks again.
I'd answered before properly reading your post, my apologies.

I can't say for sure, because the filter caches are off the JVM heap (that's
Heliosearch), but top shows 5 GB cached and no free RAM.
The only question for me now is how to balance the disk cache and the filter
cache? Do I need to worry about that, or is a big disk cache enough?
And does "optimized index" mean the Solr "optimize" command, or something else?

Anyway, your previous answers are really great, so don't spend more time on
this if you don't have much to spare. :)




RE: SolrCloud RAM requirements

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Norgorn [lsunnydayl@mail.ru] wrote:
> The collection contains about a billion documents.

So 300-400M documents per core. That is a challenge with frequent updates and facets, but with your simple queries it should be doable.

> In the end, I want to reach several seconds per search query (for non-cached
> queries =) ), so please give me some reference points.

The frustratingly true answer is
http://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

> How much RAM (roughly) will I need with and without SSDs?
> I know, it depends, but at least something, please.

Okay, something it is: We have a 256GB machine running a SolrCloud with 17 shards, each shard being about 900GB / 300M documents and placed on a dedicated SSD. The machine currently has 160GB free for disk cache, or about 1% of the total index size. For very simple unwarmed searches (just a query of 1-3 terms, edismax over 6 fields with 2 phrase fields), the median response time is < 200ms and nearly all response times are < 1 second. An extremely rough downscale by a factor of 15 to approximate your 1TB index would leave 11GB for disk cache; divide that by 3 for your machines and it's about 4GB of disk cache per machine, plus whatever it takes to run your Solrs and the system itself.

BUT! All the shards are fully optimized and never updated, range filters can be tricky, multiple filters take time, you have more documents/bytes than we have, etc.
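
Spelled out, the downscale arithmetic above looks roughly like this (every
figure is taken from this thread; nothing here is a measurement):

    // Back-of-the-envelope downscale of the numbers above.
    public class SizingSketch {
        public static void main(String[] args) {
            double totalIndexGB = 17 * 900.0;              // ~15.3TB on the reference machine
            double cacheGB = 160.0;                        // RAM left free for disk cache
            double coverage = cacheGB / totalIndexGB;      // ~1% of index size in cache
            double yourIndexGB = 1024.0;                   // the 1TB index in question
            double scaledCacheGB = yourIndexGB * coverage; // ~11GB total across the cluster
            System.out.printf("coverage=%.1f%%, total=%.1fGB, per node=%.1fGB%n",
                    coverage * 100, scaledCacheGB, scaledCacheGB / 3);
        }
    }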

> And HS means Heliosearch,

Ah. Of course. Although it helps with processing performance, it cannot do anything for your IO problem.

How much memory is used for disk caching with your current setup?

- Toke

RE: SolrCloud RAM requirements

Posted by Norgorn <ls...@mail.ru>.
Thanks for your reply.

The collection contains about a billion documents.
I'm mostly using simple queries with date and other filters (5 filters per
query).
Yup, disks are the cheapest and simplest option.
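
For illustration, the query shape described above might look roughly like
this in SolrJ (all field names are invented):

    import org.apache.solr.client.solrj.SolrQuery;

    // Sketch of the query shape described above: one term query plus five
    // filter queries, one of them a date range.
    public class FilteredQuerySketch {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery("text:solr");
            q.addFilterQuery("date:[2014-01-01T00:00:00Z TO 2014-02-01T00:00:00Z]");
            q.addFilterQuery("lang:en", "source:web", "type:post", "region:eu");
            System.out.println(q); // q=text:solr&fq=date:...&fq=lang:en&...
        }
    }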

In the end, I want to reach several seconds per search query (for non-cached
queries =) ), so please give me some reference points.
How much RAM (roughly) will I need with and without SSDs?

I know, it depends, but at least something, please.

And HS means Heliosearch, a Solr fork which stores filter caches off the JVM
heap; for me it helps to avoid OOM exceptions.




Re: SolrCloud RAM requirements

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/24/2014 2:18 AM, Toke Eskildsen wrote:
> Norgorn [lsunnydayl@mail.ru] wrote:
>> I have a SolrCloud cluster with 3 nodes and 16 GB RAM on each.
>> My index is about 1 TB and search speed is awfully bad.
> 
> We all have different standards with regard to search performance. What is "awfully bad" and what is "good enough" for you?
> 
> Related to this: How many documents are in your index, how do you query (faceting, sorting, special searches), and how often is the index updated?
> 
>> I've read that one needs at least 50% of the index size in RAM,
> 
> That is the common advice, yes. The advice is not bad for some use cases. The problem is that it has become gospel.
> 
> I am guessing that you are using spinning drives? Solr needs fast random access reads, and spinning drives are very slow at that. You can either compensate by buying enough RAM or change to a faster underlying storage technology. The obvious choice these days is Solid State Drives (we bought Samsung 840 EVOs last time and would probably buy those again). They will not give you RAM speed, but they give a lot more bang for the buck, and depending on your performance requirements they can be enough.

I am guilty of spreading the "gospel" that you need 50-100% of your
index to fit in the OS disk cache, as Toke mentioned.  This wiki page is
my creation:

http://wiki.apache.org/solr/SolrPerformanceProblems

I've seen decent performance out of systems with standard hard disks
that only had enough RAM to fit about 25% of the index into the disk
cache, but I've also seen systems with 50% that can't complete a simple
query in less than 10 seconds.

With a terabyte of index on the system (assuming that's how much is on
each one), 25% is still at least 256GB of RAM.  With only 16GB, there's
simply no way you'll ever get good performance.

I've heard quite a lot of anecdotal evidence that if you put the index
on SSD, you only need 10% of the index to fit in RAM.  I'm a little bit
skeptical that this would be true as a general rule, but I do not doubt
that it's been done successfully.  For a terabyte index, that's still
100GB of RAM, so 128GB would be the absolute minimum that you'll want to
consider.  The more RAM you can throw at this problem, the better your
performance will be.
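
Written out for a 1TB index, those anecdotal percentages translate to
roughly this (a sketch, not a guarantee):

    // Shawn's rough percentages applied to a 1TB index.
    public class RamRuleOfThumb {
        public static void main(String[] args) {
            double indexGB = 1024.0;
            System.out.printf("common advice (50%%)            : %.0f GB of disk cache%n", indexGB * 0.50);
            System.out.printf("spinning disks, seen work (25%%): %.0f GB%n", indexGB * 0.25);
            System.out.printf("SSD, anecdotal (10%%)           : %.0f GB%n", indexGB * 0.10);
        }
    }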

Thanks,
Shawn


RE: SolrCloud RAM requirements

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Norgorn [lsunnydayl@mail.ru] wrote:
> I have a SolrCloud cluster with 3 nodes and 16 GB RAM on each.
> My index is about 1 TB and search speed is awfully bad.

We all have different standards with regard to search performance. What is "awfully bad" and what is "good enough" for you?

Related to this: How many documents are in your index, how do you query (faceting, sorting, special searches), and how often is the index updated?

> I've read that one needs at least 50% of the index size in RAM,

That is the common advice, yes. The advice is not bad for some use cases. The problem is that it has become gospel.

I am guessing that you are using spinning drives? Solr needs fast random access reads, and spinning drives are very slow at that. You can either compensate by buying enough RAM or change to a faster underlying storage technology. The obvious choice these days is Solid State Drives (we bought Samsung 840 EVOs last time and would probably buy those again). They will not give you RAM speed, but they give a lot more bang for the buck, and depending on your performance requirements they can be enough.

You might want to read http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/ (I am the author)

All that being said, it is not certain that your performance problems are due to slow IO. But 3*16GB of RAM for 1TB of index certainly points that way.

> SOLR spec is hs_0.06

I have no idea what that means.

- Toke Eskildsen