Posted to solr-user@lucene.apache.org by SHS SOLR <sh...@gmail.com> on 2010/01/26 21:41:16 UTC

SOLR index file system size estimate

We wanted to estimate the file system size requirements for the index. Although
disk space is very cheap, it's not so here, as we have to go through a process to add
space to the file system. So we don't want to underestimate and end up having
to go through that process again.

Is there an estimation tool for index sizes that can give a number based on the
estimated size of each document? What percentage should we add to the actual
document size, considering we do all kinds of analysis/filtering on text?

We are currently looking at only 70 documents, each about 20K in size, but the number
of documents will soon grow to more than 10K. We would like to request
space with future growth in mind.

Any help is appreciated.

Thanks,
Pavan.

Re: SOLR index file system size estimate

Posted by Erick Erickson <er...@gmail.com>.
10K documents of 20K each is only 200M as a
base, so I don't think you need to worry.
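The back-of-the-envelope arithmetic behind the 200M figure, as a small sketch. It assumes decimal units (1 KB = 1,000 bytes); with 1 KB = 1,024 bytes the result is about 195 MiB, so roughly 200M either way. This is the raw text size only, not the index size.

```python
# Raw (pre-index) size of the corpus described in the question:
# 10,000 documents of about 20 KB each. Decimal units are an assumption.
DOC_COUNT = 10_000
AVG_DOC_BYTES = 20_000

raw_bytes = DOC_COUNT * AVG_DOC_BYTES
print(f"{raw_bytes / 1_000_000:.0f} MB")  # 200 MB
```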

Especially since your question is unanswerable
given the number of variables....

About the only thing you can really do is measure, with the
understanding that the first documents are more expensive
space-wise than later documents. So, assuming your
documents are similar, index the first 5,000, then index
the next 2,000 and use the size delta to calculate the
average index growth per document. That'll give you a
pretty good idea in *your* environment with *your*
index structure......
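A minimal Python sketch of that measure-and-extrapolate approach. The helper names (`dir_size_bytes`, `bytes_per_doc`, `project_index_size`) are hypothetical. It assumes you point `dir_size_bytes` at the core's index directory (its location depends on your Solr setup) and take each measurement after a commit, since the on-disk size can change while segments are being written or merged. The sizes in the usage section are made-up placeholders, not real measurements.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under `path` (e.g. an index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total


def bytes_per_doc(size_before: int, size_after: int, docs_added: int) -> float:
    """Average index growth per document between two measurements."""
    if docs_added <= 0:
        raise ValueError("docs_added must be positive")
    return (size_after - size_before) / docs_added


def project_index_size(size_now: int, docs_now: int, docs_target: int, per_doc: float) -> float:
    """Linear projection from the current index size to a target document count."""
    return size_now + (docs_target - docs_now) * per_doc


if __name__ == "__main__":
    # Hypothetical sizes: in practice, call dir_size_bytes() on the index
    # directory after committing 5,000 docs and again after 7,000 docs.
    size_at_5000 = 60_000_000
    size_at_7000 = 80_000_000

    per_doc = bytes_per_doc(size_at_5000, size_at_7000, 2_000)
    projected = project_index_size(size_at_7000, 7_000, 10_000, per_doc)
    print(f"{per_doc:.0f} bytes/doc, projected {projected / 1_000_000:.0f} MB")
    # prints "10000 bytes/doc, projected 110 MB"
```

Measuring the delta between the two batches, rather than dividing the total size by the total document count, avoids over-weighting the more expensive first documents. The linear projection is only reasonable if later documents resemble the ones you measured.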

But, again, this is not much data to index, so
I really think you'll be fine.

HTH
Erick
