Posted to solr-user@lucene.apache.org by Mohammed Farhan Ejaz <fa...@gmail.com> on 2020/02/03 09:24:56 UTC

How to compute index size

Hello All,

I want to size the RAM for my SolrCloud instance. The rule of thumb is that
total RAM should be JVM size + index size.

Now I have a simple question: how do I know my index size? Is there a simple
method, perhaps from the SolrCloud admin UI or an API?

My assumption so far is that the total segment info size is the same as the
index size.

Thanks & Regards
Farhan

Re: How to compute index size

Posted by Andrzej Białecki <ab...@getopt.org>.
If you’re using Solr 8.2 or newer, there’s a built-in index analysis tool that shows which kinds of data in your index occupy the most disk space, so that you can tweak your schema accordingly: https://lucene.apache.org/solr/guide/8_2/collection-management.html#colstatus

Which is another way of saying that you have to try and see ;)
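
For readers who want to try the COLSTATUS command mentioned above, here is a minimal sketch in Python that only builds the request URL. The base URL `http://localhost:8983/solr` and the collection name `mycollection` are assumptions for illustration, not values from this thread. `rawSize=true` asks for the raw index data analysis described in the linked guide; check the guide for the other options available in your Solr version.

```python
from urllib.parse import urlencode


def colstatus_url(base_url, collection, raw_size=True):
    """Build a Collections API COLSTATUS request URL.

    base_url and collection are placeholders; adjust them to your cluster.
    """
    params = {"action": "COLSTATUS", "collection": collection, "wt": "json"}
    if raw_size:
        # Request the raw index data analysis (Solr 8.2 or newer).
        params["rawSize"] = "true"
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical host and collection name, for illustration only.
url = colstatus_url("http://localhost:8983/solr", "mycollection")
print(url)
```

The printed URL can be opened in a browser or fetched with any HTTP client. The analysis can be expensive on large indexes, so it is probably best tried on a test cluster first.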

> On 3 Feb 2020, at 18:02, David Hastings <ha...@gmail.com> wrote:
> 
> Yup, I find the right calculation to be as much RAM as the server can take
> and as much SSD space as it will hold. When you run out, buy another server
> and repeat. Machines, RAM, and SSDs are cheap. Just get as much as you can.


Re: How to compute index size

Posted by David Hastings <ha...@gmail.com>.
Yup, I find the right calculation to be as much RAM as the server can take
and as much SSD space as it will hold. When you run out, buy another server
and repeat. Machines, RAM, and SSDs are cheap. Just get as much as you can.

On Mon, Feb 3, 2020 at 11:59 AM Walter Underwood <wu...@wunderwood.org>
wrote:

> What he said.
>
> But if you must have a number, assume that the index will be as big as
> your (text) data. It might be 2X bigger or 2X smaller. Or 3X or 4X, but
> that is a starting point. Once you start updating, the index might get as
> much as 2X bigger before merges.
>
> Do NOT try to get by with the smallest possible RAM or disk.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: How to compute index size

Posted by Walter Underwood <wu...@wunderwood.org>.
What he said.

But if you must have a number, assume that the index will be as big as your (text) data. It might be 2X bigger or 2X smaller. Or 3X or 4X, but that is a starting point. Once you start updating, the index might get as much as 2X bigger before merges.
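
As a rough illustration of that starting point, the arithmetic can be sketched in Python. The function name and the default factors are assumptions for illustration only: `spread=2.0` models the "2X bigger or 2X smaller" case, and `merge_headroom=2.0` models the index growing to as much as 2X before merges. Pass `spread=3.0` or `4.0` for the wider cases.

```python
def index_size_range(text_size, spread=2.0, merge_headroom=2.0):
    """Return a rough planning range for index size.

    text_size is the size of the raw text data in any unit (e.g. GB);
    the results are in the same unit. These are starting points for
    planning, not measurements.
    """
    low = text_size / spread          # index turns out smaller than the text
    high = text_size * spread         # index turns out bigger than the text
    peak_disk = high * merge_headroom  # transient growth before merges
    return {
        "low": low,
        "expected": text_size,
        "high": high,
        "peak_disk": peak_disk,
    }


# Example: 100 GB of text data.
estimate = index_size_range(100)
print(estimate)
```

For 100 GB of text this gives a range of 50 to 200 GB, with up to 400 GB of disk needed in the worst case while updates are waiting on merges. Only a test with real data will show where the index actually lands.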

Do NOT try to get by with the smallest possible RAM or disk.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 3, 2020, at 5:28 AM, Erick Erickson <er...@gmail.com> wrote:
> 
> I’ve always had trouble with that advice, that RAM size should be JVM + index size. I’ve seen 300G indexes (as measured by the size of the data/index directory) run in 128G of memory. 
> 
> Here’s the long form: https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> But the short form is “stress test and see”.
> 
> To answer your question, though, when people say “index size” they’re usually referring to the size on disk as I mentioned above.
> 
> Best,
> Erick


Re: How to compute index size

Posted by Erick Erickson <er...@gmail.com>.
I’ve always had trouble with that advice, that RAM size should be JVM + index size. I’ve seen 300G indexes (as measured by the size of the data/index directory) run in 128G of memory. 

Here’s the long form: https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

But the short form is “stress test and see”.

To answer your question, though, when people say “index size” they’re usually referring to the size on disk as I mentioned above.
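
A minimal sketch of measuring that on-disk size in Python, by summing the files under a core's data/index directory. The path in the usage comment is a hypothetical example; the actual location depends on how Solr is installed and on the core name.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip anything that vanished or is not a regular file.
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def to_gib(num_bytes):
    """Convert a byte count to GiB for easier reading."""
    return num_bytes / (1024 ** 3)


# Usage (hypothetical path, adjust to your installation):
#   size = index_size_bytes("/var/solr/data/mycore/data/index")
#   print(f"{to_gib(size):.1f} GiB")
```

In SolrCloud this has to be repeated for each replica on each node, and the number changes as segments are written and merged, so it is a snapshot rather than a fixed value.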

Best,
Erick

> On Feb 3, 2020, at 4:24 AM, Mohammed Farhan Ejaz <fa...@gmail.com> wrote:
> 
> Hello All,
> 
> I want to size the RAM for my SolrCloud instance. The rule of thumb is that
> total RAM should be JVM size + index size.
> 
> Now I have a simple question: how do I know my index size? Is there a simple
> method, perhaps from the SolrCloud admin UI or an API?
> 
> My assumption so far is that the total segment info size is the same as the
> index size.
> 
> Thanks & Regards
> Farhan