Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/04/08 11:50:23 UTC

Prediction About Index Sizes of Solr

This may not be a very detailed question, but I will try to make it clear.

I am crawling web pages and will index them in SolrCloud 4.2. What I want
to predict is the index size.

I will have approximately 2 billion web pages, and I assume each of them
will be about 100 KB.
I know the result depends on whether documents are stored, on stop words,
and so on. If you need more detail about my setup, I can explain further.
Still, there should be some kind of analysis that can help, because I need
a rough prediction of what the index size will be.
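
As a rough illustration of the arithmetic behind the figures above, here is
a minimal Python sketch. It assumes "100 KB" means 100,000 bytes per page.
The index-to-raw ratio is a hypothetical input that would have to be measured
by indexing a sample of real pages with the real schema; the value 0.5 below
is only a placeholder, not a measured or recommended number.

```python
# Back-of-envelope sizing for the crawl described above.
# Assumption: 1 KB = 1000 bytes and 1 TB = 10**12 bytes.

BYTES_PER_TB = 10**12


def raw_corpus_bytes(num_docs: int, avg_doc_bytes: int) -> int:
    """Total size of the raw crawled content, before any indexing."""
    return num_docs * avg_doc_bytes


def estimate_index_bytes(raw_bytes: int, index_to_raw_ratio: float) -> float:
    """Scale the raw size by a ratio measured on a sample index.

    index_to_raw_ratio is hypothetical: it depends on stored fields,
    stop words, analysis chain, and so on.
    """
    return raw_bytes * index_to_raw_ratio


def to_terabytes(num_bytes: float) -> float:
    """Convert bytes to decimal terabytes."""
    return num_bytes / BYTES_PER_TB


# 2 billion pages at 100 KB each.
raw = raw_corpus_bytes(2_000_000_000, 100_000)
print(f"raw corpus: {to_terabytes(raw):.0f} TB")

# Placeholder ratio only; replace with a value measured on sample data.
SAMPLE_RATIO = 0.5
index_bytes = estimate_index_bytes(raw, SAMPLE_RATIO)
print(f"index at ratio {SAMPLE_RATIO}: {to_terabytes(index_bytes):.0f} TB")
```

Indexing a few thousand representative pages and comparing the resulting
index directory size with the raw input size is one way to obtain the ratio.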

My other important question is how SolrCloud creates replicas of an index,
and whether I can change how many replicas there are, because I need to
multiply the total index size by the number of replicas.
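
The multiplication described above can be sketched as follows. This assumes
the replica count is set through the `replicationFactor` parameter of the
Collections API CREATE action, counted as the total number of copies of each
shard. The host, collection name, shard count, and the 100 TB index size are
hypothetical values chosen only for illustration.

```python
from urllib.parse import urlencode


def total_cluster_bytes(index_bytes: int, replication_factor: int) -> int:
    """Disk needed across the cluster: one full index copy per replica."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return index_bytes * replication_factor


def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE request URL (not sent anywhere here)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical inputs: a 100 TB index kept in 3 copies.
single_copy = 100 * 10**12
print(total_cluster_bytes(single_copy, 3) // 10**12, "TB in total")

# Hypothetical host and collection name.
print(create_collection_url("http://localhost:8983/solr", "webpages", 8, 3))
```

The shard count does not change the total; it only determines how the same
index is split across nodes.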

I found an article related to this kind of analysis here:
http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

I know this question may lack detail, but any ideas about it are
welcome.

Re: Prediction About Index Sizes of Solr

Posted by Dmitry Kan <so...@gmail.com>.
Interesting bit, thanks Rafał!



On Mon, Apr 8, 2013 at 12:54 PM, Rafał Kuć <r....@solr.pl> wrote:

> Hello!
>
> Let me answer the first part of your question. Please have a look at
>
> https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
> It should help you estimate your index size.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Re: Prediction About Index Sizes of Solr

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

Let me answer the first part of your question. Please have a look at
https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
It should help you estimate your index size.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
